How Apache Iceberg Branching Transforms Data Management

September 9, 2025

Yuya Ebihara

Software Engineer

Starburst

Data lakehouse versioning and branching allow organizations to manage changes safely and efficiently. They are among the most popular lakehouse features and a significant factor in the rise of lakehouse technology today.

To understand how Apache Iceberg branching and versioning work, it’s useful to compare them to Git branching in software development. In short, branching and versioning offer a safety net. Just as developers can experiment in a separate branch without affecting the main codebase, data engineers can create branches of datasets to test transformations or fixes without impacting production data.

Specifically, branching can be applied to several data use cases, including:

Experimentation: Run transformations on a branch before merging changes into the main table.
Backfill Jobs: Isolate large data rewrites to avoid impacting production.
Safe Development: Allow multiple teams to work on the same dataset concurrently.
What-if Analysis: Test queries on branched data without affecting the production dataset.

Branching can also be combined with a Write-Audit-Publish (WAP) pattern: you stage changes in a branch, audit them, and then publish them to the main branch. This provides a robust workflow for managing complex data updates.
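As a rough illustration, the Write-Audit-Publish flow could be sketched with the branch statements introduced later in this post. The table sales, its columns, the branch name audit, and the audit rule are all hypothetical and chosen only for this sketch.

```sql
-- Hypothetical table used only for this sketch.
CREATE TABLE sales (
   amount INT,
   sale_date DATE)
WITH (
   partitioning = ARRAY['sale_date']
);
INSERT INTO sales VALUES (100, DATE '2025-01-31');

-- Write: stage the new rows on a branch instead of main.
CREATE BRANCH audit IN TABLE sales;
INSERT INTO sales @ audit VALUES
(120, DATE '2025-02-01'),
(75,  DATE '2025-02-01');

-- Audit: check the staged state. The rule (no NULL or negative
-- amounts) is an example; real checks depend on the dataset.
SELECT count(*) AS bad_rows
FROM sales FOR VERSION AS OF 'audit'
WHERE amount IS NULL OR amount < 0;

-- Publish: only if the audit passed, move main to the audited state,
-- then remove the staging branch.
ALTER BRANCH main IN TABLE sales FAST FORWARD TO audit;
DROP BRANCH audit IN TABLE sales;
```

If the audit query reports any bad rows, the branch can be dropped without publishing, and main is never touched.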

Want to know more? Let’s look at some specific scenarios where Iceberg branching is useful for data workloads.

What is Branching in Apache Iceberg?

Apache Iceberg tracks table state through snapshots and named references to them. Branches are named references to a table’s state, similar to branches in Git. They allow you to isolate changes, experiment safely, and manage multiple versions of a dataset simultaneously. A branch can be compared to CLONE in Snowflake or on Databricks Delta Lake tables, but without producing a metadata copy. Because creating a branch only adds a named reference to an existing snapshot, it completes quickly, even for large tables.

Branches, Snapshots, and Tags

Notably, branches differ from snapshots and tags:

Snapshots capture the table state at a specific point in time and are immutable.
Tags are fixed pointers to a particular snapshot for reference.

By contrast, branches are movable references that advance as new commits are made, giving you a flexible way to manage table changes over time.
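To make the distinction concrete, the sketch below inspects snapshots and references through metadata tables. It assumes that Starburst exposes the $snapshots and $refs metadata tables of the Trino Iceberg connector, and it uses a hypothetical table named sales. The snapshot ID is a placeholder, not a real value.

```sql
-- Snapshots: one row per immutable table state.
SELECT snapshot_id, parent_id, operation, committed_at
FROM "sales$snapshots"
ORDER BY committed_at;

-- References: each branch (or tag) is a name pointing at a snapshot_id.
-- A branch's snapshot_id changes as new commits land on that branch.
SELECT name, type, snapshot_id
FROM "sales$refs";

-- Reading a fixed snapshot always returns the same rows
-- (placeholder ID; substitute one returned by the first query).
SELECT * FROM sales FOR VERSION AS OF 1234567890123456789;

-- Reading a branch returns whatever snapshot the branch points at now.
SELECT * FROM sales FOR VERSION AS OF 'main';
```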

How does this work in practice? Let’s check it out.

Working with Branches

Let’s look at an example of Iceberg branching in practice using Starburst. These capabilities are available immediately in Starburst Galaxy and on Starburst Enterprise release 476-e. Notably, the functionality works alongside existing Starburst access controls.

This example demonstrates how to overwrite an older partition using branching, which is particularly useful for backfill scenarios.

Starburst does not support the INSERT OVERWRITE syntax for replacing existing data in a table or partition. Before branching was available, we had to rely on a MERGE statement instead.

With the new syntax, however, we can now effectively simulate INSERT OVERWRITE in a much cleaner way by using DELETE, INSERT, and FAST FORWARD statements.
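For comparison, a MERGE-based correction might look roughly like the sketch below. It is written against the example table defined in the next section and is not part of the walkthrough. The exact statement used before branching is not given here, so treat this as one plausible form.

```sql
-- Assumed shape of a MERGE-based fix for a single partition.
-- It changes the table directly, with no isolated state to review first.
MERGE INTO branching AS t
USING (VALUES (30, DATE '2025-01-03')) AS s (data, part)
    ON t.part = s.part
WHEN MATCHED THEN
    UPDATE SET data = s.data
WHEN NOT MATCHED THEN
    INSERT (data, part) VALUES (s.data, s.part);
```

This only approximates an overwrite: rows already in the partition that have no counterpart in the source would be left in place, which is part of why the DELETE and INSERT on a branch reads more cleanly.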

Prepare data

Let’s create a simple table with five partitions:

CREATE TABLE branching (
   data INT,
   part DATE) 
WITH (
   partitioning = ARRAY['part']
);

INSERT INTO branching VALUES 
(10,  DATE '2025-01-01'), 
(20,  DATE '2025-01-02'),
(-30, DATE '2025-01-03'),
(40,  DATE '2025-01-04'),
(50,  DATE '2025-01-05');

How to create a branch

The value for 2025-01-03 (-30) appears to be incorrect. Let’s create a new branch to correct it:

CREATE BRANCH dev IN TABLE branching;
SHOW BRANCHES FROM TABLE branching;

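Branch
dev
main

Next, replace the incorrect row on the dev branch. The @ dev suffix directs the write to the branch, and FOR VERSION AS OF 'dev' reads the branch back: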

DELETE FROM branching @ dev WHERE part = DATE '2025-01-03';
INSERT INTO branching @ dev VALUES (30, DATE '2025-01-03');
SELECT * FROM branching FOR VERSION AS OF 'dev';

data	part
10	2025-01-01
20	2025-01-02
30	2025-01-03
40	2025-01-04
50	2025-01-05

The main branch still returns the results from before the DELETE and INSERT statements were executed:

SELECT * FROM branching;
SELECT * FROM branching FOR VERSION AS OF 'main';

data	part
10	2025-01-01
20	2025-01-02
-30	2025-01-03
40	2025-01-04
50	2025-01-05
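At this point in the walkthrough, one way to review exactly what the branch would change is to compare the two states with a set operation. This is a sketch that uses only the time travel syntax shown above; the EXCEPT comparison is an illustrative choice, not a dedicated branch-diff feature.

```sql
-- Rows present on dev but not on main.
SELECT data, part FROM branching FOR VERSION AS OF 'dev'
EXCEPT
SELECT data, part FROM branching FOR VERSION AS OF 'main';
-- Given the two result sets shown above, this returns one row:
-- 30, 2025-01-03
```

Swapping the two SELECT statements lists the rows that the branch removes relative to main.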

Updating and committing changes to a branch

The changes haven’t been applied to the main branch yet.

To update the main branch, we can use the ALTER BRANCH … FAST FORWARD statement. Note that this statement fails if the main branch has received new commits since the dev branch was created, because the current state of main is then no longer an ancestor of dev.

ALTER BRANCH main IN TABLE branching FAST FORWARD TO dev;

Now we can check the fix in the main branch:

SELECT * FROM branching;

data	part
10	2025-01-01
20	2025-01-02
30	2025-01-03
40	2025-01-04
50	2025-01-05
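The failure case for fast-forwarding can be sketched on a separate, hypothetical table (ff_demo is not part of the example above). The recovery shown at the end is one possible approach that uses only the statements from this post; it is not an official procedure.

```sql
-- Hypothetical table used only to illustrate divergence.
CREATE TABLE ff_demo (data INT);
INSERT INTO ff_demo VALUES (1);

CREATE BRANCH dev IN TABLE ff_demo;
INSERT INTO ff_demo @ dev VALUES (2);   -- commit on dev

INSERT INTO ff_demo VALUES (3);         -- commit on main: the branches diverge

-- Expected to fail: main is no longer an ancestor of dev.
ALTER BRANCH main IN TABLE ff_demo FAST FORWARD TO dev;

-- One possible recovery: recreate dev from the current main
-- and reapply the change before fast-forwarding.
DROP BRANCH dev IN TABLE ff_demo;
CREATE BRANCH dev IN TABLE ff_demo;
INSERT INTO ff_demo @ dev VALUES (2);
ALTER BRANCH main IN TABLE ff_demo FAST FORWARD TO dev;
```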

Branch cleanup

Dropping stale branches is important because a branch keeps the snapshots it references, and their data, from being cleaned up. You can remove a branch from a table using the DROP BRANCH statement:

DROP BRANCH dev IN TABLE branching;
SHOW BRANCHES FROM TABLE branching;

Branch
main
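Dropping a branch removes only the reference. Snapshots that were reachable solely through that branch are typically reclaimed later by snapshot expiration. The sketch below assumes that the expire_snapshots table procedure of the Trino Iceberg connector is available in your Starburst deployment; the 7d threshold is an example value.

```sql
-- Assumption: expire_snapshots is available as in the Trino Iceberg connector.
-- Removes snapshots older than the threshold that are no longer needed.
ALTER TABLE branching EXECUTE expire_snapshots(retention_threshold => '7d');
```

In the walkthrough above, dev was fast-forwarded into main before being dropped, so its snapshots remain part of main’s history and are kept until they age out.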

Challenges and future work

While branching is already powerful, Starburst’s current support has a few limitations to consider. Catalog-level branching, tagging, replacing or renaming branches, and cherry-picking commits are not yet supported. Advanced retention policies, including setting min-snapshots-to-keep, max-snapshot-age-ms, or max-ref-age-ms, are also unavailable at this time.

Why Iceberg branching matters more than ever

Branching in Apache Iceberg makes data lakehouses safer, more flexible, and easier to manage. By isolating changes, it enables experimentation without risk, simplifies large backfill jobs, and supports collaborative development across teams. It also empowers analysts to run what-if queries without touching production data.

Starburst: The best way to use Iceberg

As part of our ongoing commitment to Iceberg, Starburst is here to help. Our best-in-class query engine is designed to make handling Iceberg workloads easy, scalable, and efficient. Iceberg branching is part of this effort, and one more reason to choose Starburst for all Iceberg workloads.
