
Data lakehouse versioning and branching allow organizations to manage changes safely and efficiently. These features are widely used and are a significant factor in the growing adoption of lakehouse technology.
To understand how Apache Iceberg branching and versioning work, it’s useful to compare them to Git branching in software development. In short, branching and versioning offer a safety net. Just as developers can experiment in a separate branch without affecting the main codebase, data engineers can create branches of datasets to test transformations or fixes without impacting production data.
Specifically, branching can be applied to several data use cases, including:
- Experimentation: Run transformations on a branch before merging changes into the main table.
- Backfill Jobs: Isolate large data rewrites to avoid impacting production.
- Safe Development: Allow multiple teams to work on the same dataset concurrently.
- What-if Analysis: Test queries on branched data without affecting the production dataset.
Branching can also be combined with a Write-Audit-Publish (WAP) pattern: you stage changes in a branch, audit them, and then publish them to the main branch. This provides a robust workflow for managing complex data updates.
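To make the Write-Audit-Publish flow concrete, here is a minimal Python sketch. It is a toy model, not Iceberg or Starburst code: the `write`, `audit`, and `publish` functions, the `staging` branch name, and the rule that amounts must be non-negative are all assumptions made for illustration.

```python
# Toy model: each branch name maps to the list of rows visible on that branch.
# Illustration only; Iceberg tracks snapshots and does not copy rows like this.

def write(branches, source, target, new_rows):
    """Write: stage new rows on `target`, starting from the state of `source`."""
    branches[target] = list(branches[source]) + list(new_rows)


def audit(rows):
    """Audit: return rows that break a hypothetical rule (amount must be >= 0)."""
    return [row for row in rows if row["amount"] < 0]


def publish(branches, source, target):
    """Publish: expose the staged state on `target` only if the audit passes."""
    failures = audit(branches[source])
    if failures:
        raise ValueError(f"audit failed for {len(failures)} row(s); nothing published")
    branches[target] = list(branches[source])


branches = {"main": [{"id": 1, "amount": 10}]}

# A bad batch is staged, fails the audit, and never reaches main.
write(branches, "main", "staging", [{"id": 2, "amount": -5}])
try:
    publish(branches, "staging", "main")
except ValueError as err:
    print(err)

# A corrected batch passes the audit and is published.
write(branches, "main", "staging", [{"id": 2, "amount": 5}])
publish(branches, "staging", "main")
print(branches["main"])
```

The point of the pattern is that readers of `main` never see the rejected batch. The audit runs against the staged branch, and publishing is the only step that changes what production queries return.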
Want to know more? Let’s look at some specific scenarios where Iceberg branching is useful for data workloads.
What is Branching in Apache Iceberg?
In Apache Iceberg, branches are named references to a table’s state, similar to branches in Git. They allow you to isolate changes, experiment safely, and manage multiple versions of a dataset simultaneously. A branch is comparable to a CLONE of a table in Snowflake or Databricks Delta Lake, but it does not produce a copy of the table’s metadata. Because of this, branch operations complete quickly, even on large tables.
Branches, Snapshots, and Tags
Notably, branches differ from snapshots and tags:
- Snapshots capture the table state at a specific point in time and are immutable.
- Tags are fixed pointers to a particular snapshot for reference.
By contrast, branches are movable references that advance as new commits are made, giving you a flexible way to manage table changes over time.
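The difference between the three concepts can be sketched with a small in-memory model. This is a hypothetical Python illustration, not Iceberg's actual metadata layout: the `ToyTable` class, its method names, and the sequential snapshot ids are assumptions chosen for readability.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)  # frozen: a snapshot cannot be modified after creation
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    description: str


class ToyTable:
    """Hypothetical in-memory stand-in for a table's snapshots and references."""

    def __init__(self) -> None:
        self.snapshots: Dict[int, Snapshot] = {}
        self.branches: Dict[str, int] = {}  # branch name -> head snapshot id
        self.tags: Dict[str, int] = {}      # tag name -> fixed snapshot id

    def commit(self, branch: str, description: str) -> Snapshot:
        """Add a snapshot on top of the branch head and move the branch to it."""
        snapshot = Snapshot(len(self.snapshots) + 1, self.branches.get(branch), description)
        self.snapshots[snapshot.snapshot_id] = snapshot
        self.branches[branch] = snapshot.snapshot_id
        return snapshot

    def create_branch(self, name: str, from_branch: str) -> None:
        """A new branch starts at the same snapshot as an existing branch."""
        self.branches[name] = self.branches[from_branch]

    def create_tag(self, name: str, snapshot_id: int) -> None:
        """A tag records one snapshot id and is never moved by commits."""
        self.tags[name] = snapshot_id


table = ToyTable()
first = table.commit("main", "initial load")
table.create_tag("v1", first.snapshot_id)
table.create_branch("dev", "main")
table.commit("dev", "fix bad partition")

print(table.tags["v1"], table.branches["main"], table.branches["dev"])
```

After the commit on `dev`, the tag `v1` and the `main` branch still point to the first snapshot, while `dev` has moved to the new one. Creating the branch only added a name that points to an existing snapshot id.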
How does this work in practice? Let’s check it out.
Working with Branches
Let’s look at an example of Iceberg branching in practice using Starburst. These capabilities are available immediately in Starburst Galaxy and on Starburst Enterprise release 476-e. Notably, the functionality works alongside existing Starburst access controls.
This example demonstrates how to overwrite an older partition using branching, which is particularly useful for backfill scenarios.
Since Starburst does not support the INSERT OVERWRITE syntax for replacing existing data in a table or partition, we previously had to rely on a MERGE statement without branching.
With the new syntax, however, we can now effectively simulate INSERT OVERWRITE in a much cleaner way by using DELETE, INSERT, and FAST FORWARD statements.
Prepare data
Let’s create a simple table with five partitions:
CREATE TABLE branching (
  data INT,
  part DATE
)
WITH (
  partitioning = ARRAY['part']
);
INSERT INTO branching VALUES
  (10, DATE '2025-01-01'),
  (20, DATE '2025-01-02'),
  (-30, DATE '2025-01-03'),
  (40, DATE '2025-01-04'),
  (50, DATE '2025-01-05');
How to create a branch
The data for 2025-01-03 appears to be incorrect. Let’s create a new branch to correct it:
CREATE BRANCH dev IN TABLE branching;
SHOW BRANCHES FROM TABLE branching;
| Branch |
| dev |
| main |
DELETE FROM branching @ dev WHERE part = DATE '2025-01-03';
INSERT INTO branching @ dev VALUES (30, DATE '2025-01-03');
SELECT * FROM branching FOR VERSION AS OF 'dev';
| data | part |
| 10 | 2025-01-01 |
| 20 | 2025-01-02 |
| 30 | 2025-01-03 |
| 40 | 2025-01-04 |
| 50 | 2025-01-05 |
The main branch still returns the results from before the DELETE and INSERT statements were executed:
SELECT * FROM branching;
SELECT * FROM branching FOR VERSION AS OF 'main';
| data | part |
| 10 | 2025-01-01 |
| 20 | 2025-01-02 |
| -30 | 2025-01-03 |
| 40 | 2025-01-04 |
| 50 | 2025-01-05 |
Updating and committing changes to a branch
The changes haven’t been applied to the main branch yet.
To update the main branch, we can use the ALTER BRANCH … FAST FORWARD statement. Note that this statement will fail if the main branch has changed since the dev branch was created and is no longer its ancestor.
ALTER BRANCH main IN TABLE branching FAST FORWARD TO dev;
Now we can check the fix in the main branch:
SELECT * FROM branching;
| data | part |
| 10 | 2025-01-01 |
| 20 | 2025-01-02 |
| 30 | 2025-01-03 |
| 40 | 2025-01-04 |
| 50 | 2025-01-05 |
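The ancestor condition on FAST FORWARD can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not the engine's implementation: the snapshot ids, the `parents` mapping, and the `is_ancestor` and `fast_forward` helpers are hypothetical.

```python
from typing import Dict, Optional


def is_ancestor(parents: Dict[int, Optional[int]], ancestor: int, descendant: int) -> bool:
    """Walk parent links from `descendant`; True if `ancestor` is on that path."""
    current: Optional[int] = descendant
    while current is not None:
        if current == ancestor:
            return True
        current = parents.get(current)
    return False


def fast_forward(branches: Dict[str, int], parents: Dict[int, Optional[int]],
                 target: str, source: str) -> None:
    """Move `target` to the head of `source`, only if no history would be lost."""
    if not is_ancestor(parents, branches[target], branches[source]):
        raise ValueError(f"cannot fast-forward {target}: it is not an ancestor of {source}")
    branches[target] = branches[source]


# Hypothetical ids: snapshot 1 is the initial load, 2 and 3 are commits on dev.
parents = {1: None, 2: 1, 3: 2}
branches = {"main": 1, "dev": 3}
fast_forward(branches, parents, "main", "dev")  # succeeds: main now points to 3

# If main had received its own commit (snapshot 4) in the meantime, it fails.
parents[4] = 1
diverged = {"main": 4, "dev": 3}
try:
    fast_forward(diverged, parents, "main", "dev")
except ValueError as err:
    print(err)
```

In the first case, moving `main` forward loses nothing because every snapshot on `main` is already part of the history of `dev`. In the diverged case, snapshot 4 exists only on `main`, so moving the reference would drop it from the branch's history.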
Branch cleanup
Dropping stale branches is important to prevent retaining unnecessary data. You can remove a branch from a table using the DROP BRANCH statement:
DROP BRANCH dev IN TABLE branching;
SHOW BRANCHES FROM TABLE branching;
| Branch |
| main |
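The sketch below shows one way to see why a stale branch can keep data alive. It assumes a simplified rule in which every snapshot reachable from some branch head must be kept. Real retention also depends on snapshot expiration settings and maintenance jobs, and the `reachable` helper and snapshot ids here are hypothetical.

```python
from typing import Dict, Iterable, Optional, Set


def reachable(parents: Dict[int, Optional[int]], heads: Iterable[int]) -> Set[int]:
    """Collect every snapshot id reachable from the given branch heads."""
    keep: Set[int] = set()
    for head in heads:
        current: Optional[int] = head
        # Stop at the root or at a snapshot already visited via another branch.
        while current is not None and current not in keep:
            keep.add(current)
            current = parents.get(current)
    return keep


# Hypothetical history: main is 1 <- 2, and a never-merged branch is 1 <- 3 <- 4.
parents = {1: None, 2: 1, 3: 1, 4: 3}
branches = {"main": 2, "stale": 4}

before = reachable(parents, branches.values())
del branches["stale"]  # the toy equivalent of dropping the branch
after = reachable(parents, branches.values())

print(sorted(before - after))  # snapshots no longer referenced by any branch
```

In this model, snapshots 3 and 4 are referenced only by the stale branch, so they become candidates for cleanup once that branch is dropped.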
Challenges and future work
While Iceberg branching in Starburst is already powerful, there are a few limitations to consider. Currently, features such as catalog-level branching, tagging, replacing or renaming branches, and cherry-picking commits are not supported. Advanced retention policies, including setting min-snapshots-to-keep, max-snapshot-age-ms, or max-ref-age-ms, are also unavailable at this time.
Why Iceberg branching matters more than ever
Branching in Apache Iceberg makes data lakehouses safer, more flexible, and easier to manage. By isolating changes, it enables experimentation without risk, simplifies large backfill jobs, and supports collaborative development across teams. It also empowers analysts to run what-if queries without touching production data.
Starburst: The best way to use Iceberg
As part of our ongoing commitment to Iceberg, Starburst is here to help. Our best-in-class query engine is designed to make handling Iceberg workloads easy, scalable, and efficient. Iceberg branching is part of this effort, and one more reason to choose Starburst for all Iceberg workloads.




