
Keeping Apache Iceberg tables performing well requires more than just creating them and loading data. As tables grow, every insert, update, or delete generates a new version, and over time this accumulation can slow queries and increase storage requirements. Regular table maintenance is crucial to prevent these issues, and fortunately, Starburst provides automated data maintenance features to help.
This article focuses on what happens to all of that metadata when you perform compaction, roll off old snapshots, and remove any lingering orphaned files.
Starting point
The examples below build upon the exercises explored in Visualizing Iceberg Snapshot Linear History. Querying the table at the end of that post, we see there are 8 records at this point.
SELECT * FROM test_rb ORDER BY f1;
Querying the view below gives us some details about the current snapshot of the test_rb table.
-- see prior post for the CREATE VIEW statement
SELECT * FROM curr_ver_dets;
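If you don’t have that view handy, here is a minimal sketch of what curr_ver_dets might look like; it assumes the view simply joins the $snapshots and $history metadata tables to expose the current snapshot’s ancestry (the authoritative definition is in the prior post).
-- hypothetical reconstruction of curr_ver_dets; see the prior post for the real one
CREATE OR REPLACE VIEW curr_ver_dets AS
SELECT s.committed_at, s.snapshot_id, s.parent_id, s.operation
  FROM "test_rb$snapshots" s
  JOIN "test_rb$history" h
    ON s.snapshot_id = h.snapshot_id
 WHERE h.is_current_ancestor = true;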
This is more easily visualized with the following graphic.
A peek at the folder in S3 where this table is rooted shows:
./metadata/ — has 51 files
./data/ — has 14 files
Run the following to see how many data files make up the current version.
SELECT file_path, file_format,
record_count, file_size_in_bytes
FROM "test_rb$files";
The output shows just 5 of the 14 files are used in this version (let’s not worry about all those “small files” — that’s a discussion for another time and just a byproduct of our simple tests).
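If you’d rather tally than eyeball, a quick aggregate over the same metadata table does the trick (a simple illustration, not from the original walkthrough):
SELECT COUNT(*) AS file_count,
       SUM(record_count) AS total_records
  FROM "test_rb$files";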
Compact the data
The optimize command is used for rewriting the content of the specified table so that it is merged into fewer but larger files. If the table is partitioned, the data compaction acts separately on each partition selected for optimization. This operation improves read performance.
https://docs.starburst.io/latest/connector/iceberg.html#optimize
ALTER TABLE test_rb EXECUTE optimize;
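For reference, if you only wanted to merge files below a given size rather than rewriting everything, the command also accepts a file_size_threshold parameter; the 128MB value below is purely illustrative.
-- only rewrite files smaller than the given threshold
ALTER TABLE test_rb
EXECUTE optimize(file_size_threshold => '128MB');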
After running the query on "test_rb$files" again, it doesn’t look like much has happened; we only went from 5 files down to 4. To understand this, we have to take into account that these are TINY little files and Trino runs in parallel.
If you look closely, you’ll see the 8 records shown earlier are now spread across these files, down from 13 records previously. This is because the compaction process also performs a merge-on-write operation to eliminate the merge-on-read “delete files”.
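One way to confirm the delete files are gone is to check the content column of the $files metadata table, where 0 indicates a data file and 1 or 2 indicate position and equality delete files (a quick sanity check, not part of the original walkthrough):
SELECT content, COUNT(*) AS file_count
  FROM "test_rb$files"
 GROUP BY content;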
It may not sound like we changed the data or the structure (which is what causes the creation of a new version), but we did. Rewriting the data like this creates yet another version. Running the previous curr_ver_dets query shows this (only displaying the last two rows).
Here is the updated visualization.
And yes, because we created another version, we have even more files on the data lake. The metadata subfolder has 60 files now and data has 18.
Prune old versions
The expire_snapshots command removes all snapshots and all related metadata and data files. Regularly expiring snapshots is recommended to delete data files that are no longer needed, and to keep the size of table metadata small. The procedure affects all snapshots that are older than the time period configured with the retention_threshold parameter.
https://docs.starburst.io/latest/connector/iceberg.html#expire-snapshots
Everything before this last version occurred more than a few days ago, so my command below takes that into account to remove all versions except the current one. Normally, you would likely want to keep some prior versions and only drop those you know you can get rid of.
ALTER TABLE test_rb
EXECUTE expire_snapshots(retention_threshold => '3d');
That command fails because the connector enforces a minimum retention threshold (the expire_snapshots_min_retention catalog session property, which defaults to 7 days). GOOD! The system is suggesting that we are being too aggressive. Well, let’s just be “aggressive” anyway! Run the session override below, and then the expire_snapshots command again.
SET SESSION mycatalog.expire_snapshots_min_retention = '2d';
We only have a single version now!
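You can confirm that by querying the snapshots metadata table directly (just a sanity check):
SELECT snapshot_id, parent_id, operation
  FROM "test_rb$snapshots";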
Let’s verify we cannot roll back to the version it references as its parent_id (notice the operation value of replace above).
CALL mycatalog.system.rollback_to_snapshot(
'myschema', 'test_rb',
4056689309859241417);
Remove orphaned files
There are still a large number of data files (11) on the data lake for this table, even though our file compaction process reduced the count referenced by the current version to just 4. Thankfully, there’s another maintenance activity that can clean up the files that are no longer referenced.
The remove_orphan_files command removes all files from a table’s data directory that are not linked from metadata files and that are older than the value of the retention_threshold parameter. Deleting orphan files from time to time is recommended to keep the size of a table’s data directory under control.
https://docs.starburst.io/latest/connector/iceberg.html#remove-orphan-files
Like before, we need to be a bit “aggressive” and lower the minimum threshold value.
SET SESSION mycatalog.remove_orphan_files_min_retention = '2d';
ALTER TABLE test_rb
EXECUTE remove_orphan_files(retention_threshold => '2d');
Hooray, the data lake only has the 4 referenced files shown earlier!
Wrapping up
These maintenance activities rebuild many small files into fewer larger ones, get rid of those pesky/interesting “delete files”, and reduce the amount of data lake storage required to maintain so many historical versions. Coupling these maintenance activities with the inherent abilities that come from leveraging the metadata on the data lake will allow your Iceberg tables to continue to perform at internet-scale!
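As a recap, a periodic maintenance pass for a table like this might look like the following; the thresholds are illustrative, so pick values that match your own time-travel and recovery requirements.
-- 1. compact small files (and merge away the delete files)
ALTER TABLE test_rb EXECUTE optimize;
-- 2. drop snapshots older than the retention window
ALTER TABLE test_rb EXECUTE expire_snapshots(retention_threshold => '7d');
-- 3. delete files no longer linked from any metadata
ALTER TABLE test_rb EXECUTE remove_orphan_files(retention_threshold => '7d');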
Post originally published by Lester Martin as “Apache Iceberg Table Maintenance (is_current_ancestor part deux)” at Lester Martin (l11n) Blog.