Starburst integration with Amazon S3 Tables

  • Yuya Ebihara, Software Engineer, Starburst

  • Lester Martin, Developer Advocate, Starburst

Starburst, the data platform for analytics, apps, and AI, has once again joined forces with Amazon Web Services (AWS). This time, we are thrilled to announce that Starburst Galaxy now seamlessly supports Amazon S3 Tables. 

This is great news for several reasons. For Starburst users, it adds another option. Starburst believes in choice, and this integration brings more of it to your data architecture. For AWS, it extends access to Amazon S3 Tables beyond native AWS services.

Read on to see why this matters, what it means for Iceberg and Starburst, and how to use this new feature in Starburst Galaxy.

Amazon S3 Tables + Apache Iceberg + Starburst 

Amazon S3 Tables exposes an Iceberg REST catalog endpoint that allows Starburst to integrate with this new bucket type. This integration not only allows you to query and modify tables stored in Amazon S3 Tables using Apache Iceberg, but also federates access to this data alongside data from any other data source. With this, Amazon S3 Tables joins over 50 other data sources accessed via Starburst. Once accessed, this data can be used to power workflows that support analytics, data applications, or AI/ML use cases.

What are Amazon S3 Tables?

Amazon S3 Tables were introduced by AWS at the end of 2024. They operate as a new bucket type for Amazon S3, called a table bucket. You can think of them as a managed hosting offering for Apache Iceberg tables. The AWS documentation page Working with Amazon S3 Tables and table buckets provides more information, as well as instructions on how to create this new bucket type. Additionally, Amazon S3 Tables automatically handle table maintenance activities, including table compaction.

Starburst and Iceberg in production

Starburst and Iceberg have a long history together. Apache Iceberg is the foundation of the Starburst Icehouse architecture, and the table format of choice for our compute engine. In production, Starburst clusters use the Iceberg connector to access Iceberg tables. This includes storing Iceberg metadata and data files in S3 buckets and integrating with a variety of metastores, including Iceberg REST catalogs.

What makes Amazon S3 Tables different? Amazon S3 Tables feature their own mechanisms for controlling catalog access and addressing security. This can impact how organizations implement data governance, enforce data access policies, and integrate with existing security frameworks.

Let’s look at this in more detail. 

The importance of Iceberg table maintenance

Iceberg requires table maintenance activities such as compaction, snapshot expiration, and orphaned file removal. Starburst has already automated this effort with features that assist with data maintenance. What sets Amazon S3 Tables apart is that these maintenance tasks are handled automatically by AWS.

This means users can access Amazon S3 Tables just like any other Iceberg table. This includes executing federated queries against them with any other configured data source from our extensive list of connectors. Starburst’s integration with Iceberg, coupled with a REST catalog interface from AWS, ensures a seamless fit between these two technologies. 

How to connect Starburst to Amazon S3 Tables

Want to get hands-on? One of the best things about this new integration is that you can try it out for yourself. 

The Starburst Enterprise documentation for the Iceberg connector already includes instructions for configuring Amazon S3 Tables. Additionally, a joint article between AWS and Starburst details how to build a managed Apache Iceberg data lake using Starburst and Amazon S3 Tables.

Let’s examine how to integrate Amazon S3 Tables using Starburst Galaxy. These instructions are also included in our Starburst Galaxy documentation.

Note: This is a public preview feature. Contact Starburst support with questions or feedback.

Prerequisites

To complete this configuration, you need access to Starburst Galaxy. Check out our free trial if you are not already set up. You will also need an existing Amazon S3 table bucket. If you do not have one, the AWS tutorial Getting started with S3 Tables provides detailed instructions.

Step 1 – Prepare Amazon S3 bucket 

Go to the Amazon S3 page in your AWS Console and select Table buckets. As shown in the screenshot below, collect the region, account ID, and bucket name.

Step showing users how to prepare their Amazon S3 bucket when connecting to Starburst.

Each S3 table bucket has a unique table bucket ARN that follows this convention.

arn:aws:s3tables:{REGION}:{ACCOUNT_ID}:bucket/{S3_BUCKET_NAME}

Construct this string based on your specific values. You will need this in the next step.
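
If you prefer to assemble the ARN programmatically rather than by hand, the following Python sketch shows one way to do it. The function name build_table_bucket_arn and the example values are hypothetical, chosen only for illustration; the string layout is the convention shown above, and the only validation is that the account ID has the 12 digits AWS account IDs use.

```python
import re


def build_table_bucket_arn(region: str, account_id: str, bucket_name: str) -> str:
    """Assemble a table bucket ARN using the convention
    arn:aws:s3tables:{REGION}:{ACCOUNT_ID}:bucket/{S3_BUCKET_NAME}.

    Hypothetical helper for illustration; not part of any AWS or Starburst SDK.
    """
    # AWS account IDs are 12-digit numbers.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be exactly 12 digits")
    # Region and bucket name are only checked for being non-empty here.
    if not region or not bucket_name:
        raise ValueError("region and bucket_name must be non-empty")
    return f"arn:aws:s3tables:{region}:{account_id}:bucket/{bucket_name}"


# Placeholder values; substitute the ones collected from the AWS Console.
print(build_table_bucket_arn("us-east-1", "123456789012", "my-table-bucket"))
```

The printed value is what you would paste into the Table bucket ARN field in Starburst Galaxy.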

Step 2 – Set the Amazon S3 Tables catalog

After logging in to Starburst Galaxy, navigate to Data and then Catalogs from the menu on the left. Click Create catalog on the page that opens.

Image depicting the 2nd step needed when setting up Amazon S3 Tables on Starburst, setting the Amazon S3 Tables catalog.

Click on the Amazon S3 Tables option.

Image showing a user selecting Amazon S3 Tables in Starburst Galaxy.

As detailed in the Starburst documentation, define the catalog Name and description, and complete the Amazon S3 Tables configuration. Use the string created in the prior step for the Table bucket ARN value. Click Test connection. When the test succeeds, a confirmation message appears along with a new Connect catalog button. Click Connect catalog to continue.

Step depicting the naming of an Amazon S3 Table in Starburst Galaxy

Refer to the documentation for the values on this new configuration screen, then click Set permissions & add to cluster.

Image depicts the Catalog is created screen in Starburst Galaxy after creating Amazon S3 Tables.

Step 3 – Define default schema

S3 Tables do not come with a default schema. In Starburst Galaxy, navigate to the Query editor. Assuming you named your new catalog s3tables, create a schema named example with this SQL statement.

CREATE SCHEMA s3tables.example;

Note: In Starburst, a schema corresponds to a namespace in Amazon S3 Tables.

Step 4 – Create and read a new table

The schema can now be populated with tables. Using CREATE TABLE AS SELECT (CTAS), you can create a new table from any other table in any of the catalogs configured in your cluster. For this scenario, you can use the Starburst Galaxy sample dataset.

CREATE TABLE s3tables.example.account AS
  SELECT * FROM sample.burstbank.account;

Now, verify the data was inserted into the new table. 

SELECT * FROM s3tables.example.account;

Image depicting the Select * (all) statement from the newly created amazon S3 table.
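
If you named your catalog something other than s3tables, the statements from Steps 3 and 4 need the new name in several places. The following Python sketch is one way to generate them consistently. The function name quickstart_statements and the identifier check are illustrative assumptions, not part of Starburst Galaxy. The check is deliberately conservative (a lowercase letter followed by lowercase letters, digits, or underscores) and is not a documented naming rule.

```python
import re
from typing import List

# Conservative identifier check (an assumption for this sketch, not a
# documented Starburst or AWS rule): a lowercase letter first, then
# lowercase letters, digits, or underscores.
_IDENTIFIER = re.compile(r"[a-z][a-z0-9_]*")


def _require_identifier(name: str) -> str:
    """Return name unchanged if it passes the check, else raise ValueError."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def quickstart_statements(
    catalog: str,
    schema: str,
    table: str,
    source: str = "sample.burstbank.account",
) -> List[str]:
    """Return the CREATE SCHEMA, CTAS, and SELECT statements from Steps 3 and 4."""
    catalog, schema, table = (
        _require_identifier(part) for part in (catalog, schema, table)
    )
    target = f"{catalog}.{schema}.{table}"
    return [
        f"CREATE SCHEMA {catalog}.{schema};",
        f"CREATE TABLE {target} AS\n  SELECT * FROM {source};",
        f"SELECT * FROM {target};",
    ]


# Reproduces the statements used in this walkthrough.
for statement in quickstart_statements("s3tables", "example", "account"):
    print(statement)
```

The printed statements can be pasted into the Query editor one at a time.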

Note: The S3 Tables integration supports Iceberg features like time travel queries, schema evolution, and more. However, some maintenance procedures, such as expire_snapshots, are not supported. Instead, these are handled automatically by AWS for your Amazon S3 Tables.

Starburst and Amazon S3 Tables: The perfect match

Starburst and Amazon S3 Tables are a natural fit. Compatibility with Starburst Enterprise and Starburst Galaxy builds on our long-standing focus on Apache Iceberg and benefits from the table maintenance that AWS automates. Starburst is an AWS Data and Analytics and Financial Services Competency Partner and is available via AWS Marketplace.

Take the next step with Starburst

Would you like to automatically maintain Iceberg tables that aren’t persisted with Amazon S3 Tables? This includes other tables backed by normal S3 buckets. Starburst Galaxy offers automated data maintenance across all storage types. 

These maintenance jobs help boost performance and reduce storage usage for Apache Iceberg tables. Supported tasks include data file compaction, statistics collection, and cleanup of outdated snapshots and orphaned files.