
Understanding an Iceberg snapshot is essential for data engineers working with Apache Iceberg. Each snapshot captures a version of a table, but following how those versions connect can be confusing. This post explains how to interpret linear history in Iceberg metadata and why it matters for managing large, evolving datasets.
For those data engineering practitioners who are mastering their understanding of Apache Iceberg metadata, the is_current_ancestor
flag associated with each snapshot ID can be confusing at first. That flag is marked as true if the snapshot is part of the “linear history” of the current table. Through working examples, let’s do a deep dive into the concept of linear history with Iceberg metadata.
Prerequisites
This post assumes the reader has an existing conceptual understanding of transactional operations and related metadata for the Apache Iceberg modern table format. Please review the initial post in this advanced blog series on Iceberg metadata, which starts with Exploring Iceberg transactions and metadata.
Environmental setup
To perform the steps presented, you will need a running Trino cluster that has been configured with the Iceberg connector set to persist tables on an object store. One approach to meet these requirements would be to register for a Starburst Galaxy free trial and then configure it to access AWS S3 as detailed in the Starburst Galaxy: Getting Started tutorial.
Test it all out
Let’s walk through a variety of operations and visually explore the snapshot hierarchy as we go.
Operation 1; create a table
Use appropriate values formycatalog.myschema
, then create an Iceberg table and query it afterwards.
CREATE mycatalog.myschema; -- or use an existing schema
USE mycatalog.myschema;
CREATE TABLE test_rb (
f1 int, f2 varchar
) WITH (type='iceberg');
SELECT * FROM test_rb;
As expected, there are no rows yet.
As you can see below, I’m joining three different metadata tables to obtain the desired results. To make it easier, I’m also hiding this query behind a view for future ease of use.
CREATE VIEW curr_ver_dets AS
SELECT concat_ws(' > ', r.name, r.type) AS curr_ver,
date_format(s.committed_at, '%Y/%m/%d-%T') AS committed_at,
s.snapshot_id, s.parent_id, h.is_current_ancestor, s.operation
FROM "test_rb$snapshots" AS s
JOIN "test_rb$history" AS h
ON (s.snapshot_id = h.snapshot_id)
LEFT JOIN "test_rb$refs" AS r
ON (h.snapshot_id = r.snapshot_id)
ORDER BY s.committed_at;
Now, and in future steps, you can just read from the new curr_ver_dets
view.
SELECT * FROM curr_ver_dets;
Each change to the structure or content of an Iceberg table creates a new version (aka snapshot). Notice there is only one snapshot at this point, and that it is the initial version (identified by curr_ver
reporting main > BRANCH
), as it has no parent snapshot. And yep, is_current_ancestor
reports true. That means it is part of the linear history of the active snapshot.
Pretty obvious with just one snapshot, and the image below is a bit of overkill at this point. Note that I’m only putting the last 4 numbers of snapshot_id
in the diagram and will continue that strategy with future snapshots.
Operations 2-3; two insert statements
Run two separate INSERT
statements that will create a new snapshot for each operation.
INSERT INTO test_rb (f1, f2)
VALUES
(1, '1st insert (2nd txn)'),
(2, '1st insert (2nd txn)');
INSERT INTO test_rb (f1, f2)
VALUES
(3, '2nd insert (3rd txn)'),
(4, '2nd insert (3rd txn)'),
(5, '2nd insert (3rd txn)');
SELECT * FROM test_rb ORDER BY f1;
We have 5 records now. What does the list of snapshots look like?
SELECT * FROM curr_ver_dets;
Two more snapshots were created, and the final one is the current version. You can trace its parent_id
to the previous row and then again from that middle row to the initial snapshot.
The diagram below shows that all 3 snapshots are part of the “current ancestor tree” (i.e. their is_current_ancestor
flags are set to true).
Operation 4; delete statement
Delete any record whose f1
value is an even number, and verify that only the odd-numbered records are left.
DELETE FROM test_rb
WHERE mod(f1, 2) = 0;
We are down to just 3 records now, and the contents of the curr_ver_dets
view show that we continue to build a very linear set of snapshots whose parent is the snapshot that was created before it. Notice the is_current_ancestor
values are all reporting true.
Operation 5; rollback to operation 2
Rollback to the snapshot_id
that was created on the first INSERT
statement in Operation 2. In my example, the snapshop_id
needed is 5611312366620665402
. Change this to match the snapshot_id
that was created in your environment. Be sure to use appropriate values for mycatalog
and myschema
, too.
CALL mycatalog.system.rollback_to_snapshot(
'myschema', 'test_rb',
5611312366620665402);
Query test_rb
to verify we only have the first 2 records that were inserted in the transaction right after the table creation one.
Query curr_ver_dets
to see that we now have the current version of the table pointing to the snapshot_id
we rolled backed to.
The diagram below illustrates the results presented above. No new snapshot was created, but the current version is 2 versions back, and the last 2 snapshots are no longer part of the linear history of the active version (indicated by the green shapes). These snapshots still exist, and you can roll back to them at any time as well.
Operations 6-7; two more inserts
Run 2 more INSERT
statements that collectively add 3 more rows and create 2 more snapshots.
INSERT INTO test_rb (f1, f2)
VALUES
(61, '3rd insert (oper #6)'),
(62, '3rd insert (oper #6)');
INSERT INTO test_rb (f1, f2)
VALUES
(70, '4th insert (7th operation)');
The most recent 2 snapshots are part of the current ancestor tree, and 2 snapshots before them still have their is_current_ancestor
set to false.
Operation 8; rollback to operation 4
Rollback to the snapshot_id
that was created by Operation 4’s DELETE
statement. In my example, the snapshop_id
needed is 6149468753676344494
. Change it to be the snapshot_id
that was created in your environment. Be sure to use appropriate values for mycatalog
and myschema
, too.
CALL webinar2.system.rollback_to_snapshot(
'myschema', 'test_rb',
6149468753676344494);
You should only see those early odd-numbered records.
As you can see above & below, no new snapshots were created, but the current version has been reset, and appropriate changes have been made to is_current_ancestor
values.
Operations 9-10; insert 2 records and change 1 of them
Let’s see if it is all making sense…
INSERT INTO test_rb (f1, f2)
VALUES
(91, 'OPER #09'),
(92, 'OPER #09');
UPDATE test_rb SET f2 = 'OPER #10'
WHERE f1 = 92;
Make sense? If not, scroll down to the bottom of this article for some ways to ask for more help.
Operations 11-14; drill it home!
This all makes sense, too? Feel free to perform one operation at a time and review it at every step.
CALL webinar2.system.rollback_to_snapshot(
'myschema', 'test_rb',
8421281246956311672);
UPDATE test_rb SET f2 = 'update from Op 12';
INSERT INTO test_rb VALUES (113, 'Op 13');
INSERT INTO test_rb (f1, f2)
VALUES
(114, 'OPER # 14'),
(214, 'OPER # 14');
Leverage the community to learn more
If this makes your head hurt AND you can’t figure it out, reach out in one of these ways for more help.
- Starburst forum
- Trino community chat
- Send an email to devcenter@starburst.io
HAPPY ICEBERG SNAPSHOTTING!
Post originally published by Lester Martin as “iceberg snapshot is_current_ancestor_flag (what does it tell us)” at Lester Martin (l11n) Blog.