Powering On-Premises and Hybrid AI Workloads with Iceberg Data Products

Data products enhancements for on-premises and hybrid data architectures

August 12, 2025

Monica Miller

Senior Product Manager

Starburst

Evan Smith

Technical Content Manager

Starburst Data

At Starburst, we believe that all workloads, including AI workloads, should run on an Icehouse architecture powered by Apache Iceberg and Trino. The Icehouse delivers an end-to-end, frictionless lakehouse experience. For this reason, Icehouse architecture is central to our mission to enhance data access, collaboration, and governance.

Starburst cloud customers are already familiar with Apache Iceberg workflow components, including Iceberg data products, Iceberg data maintenance, and Iceberg materialized view refresh. These features allow them to package data into governed, query-ready products for analytics and AI.

Now, we are bringing those same capabilities (and more) to Starburst Enterprise, unlocking improved and exclusive benefits for on-premises and hybrid architectures.

Why choosing your environment matters for on-premises and hybrid users

Expanding our Icehouse capabilities to include on-premises and hybrid environments marks an important evolution for modern data platforms. Many organizations, particularly those operating in regulated sectors, need the flexibility to harness Iceberg’s performance, governance, and interoperability without moving all workloads to the cloud.

Many high-compliance industries, including financial services, insurance, healthcare, and the public sector, operate with on-premises or hybrid deployments. This isn’t just a regulatory requirement; it’s also a choice. These organizations require modern, open data lakehouse capabilities that allow them to maintain strict control over data storage.

By making Iceberg data products available in Starburst Enterprise, we are extending the most powerful features of our cloud platform to these environments, helping teams move faster, govern more effectively, and deliver analytics and AI workloads at scale without compromising on compliance.

What are data products?

Data products are curated, governed datasets packaged for reuse across analytics and AI workloads. A data product combines different data entities with built-in governance, metadata, and access controls, making it easy for teams to discover, trust, and consume data without manual preparation. This approach turns raw or siloed data into well-defined, query-ready assets that can be securely shared across teams and environments.

Want to know more about data products? Check out this video.

Why data products are particularly useful for AI workflows

Data products are particularly valuable for AI workflows because metadata is critical to retrieving accurate results. By capturing and curating metadata in a structured, consistent way, data products supply the context that AI depends on. Metadata that includes schema details and governance rules helps AI models understand that context, which improves feature engineering, model accuracy, and explainability.
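To make the role of metadata concrete, here is a minimal Python sketch of how schema details and governance rules might be rendered as plain-text context for a model. The `DataProduct` and `Column` classes, their fields, and the sample values are all hypothetical stand-ins for illustration; they are not Starburst types or APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Column:
    name: str
    data_type: str
    description: str = ""


@dataclass
class DataProduct:
    """Hypothetical stand-in for a data product's metadata (not a Starburst type)."""
    name: str
    summary: str
    owner: str
    columns: List[Column] = field(default_factory=list)
    governance_rules: List[str] = field(default_factory=list)

    def to_context(self) -> str:
        """Render the metadata as plain text that could be placed in a model prompt."""
        lines = [
            f"Data product: {self.name}",
            f"Summary: {self.summary}",
            f"Owner: {self.owner}",
            "Columns:",
        ]
        for col in self.columns:
            detail = f" - {col.description}" if col.description else ""
            lines.append(f"  {col.name} ({col.data_type}){detail}")
        if self.governance_rules:
            lines.append("Governance rules:")
            lines.extend(f"  {rule}" for rule in self.governance_rules)
        return "\n".join(lines)


# Illustrative values only.
product = DataProduct(
    name="customer_orders",
    summary="Curated order history for analytics and AI",
    owner="sales-data-team",
    columns=[
        Column("order_id", "bigint", "Unique order identifier"),
        Column("order_total", "decimal(12,2)", "Order value in USD"),
    ],
    governance_rules=["Mask customer email for non-privileged roles"],
)

print(product.to_context())
```

Keeping the description, schema, and rules in one structured object means the same context can be reused for every model call, rather than being reassembled by hand each time.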

Iceberg workload accelerators

Iceberg data products bring together two powerful ideas in modern data architecture: the flexibility of Apache Iceberg tables and the reusability of governed, curated datasets. Workload accelerators, such as data maintenance and Iceberg materialized view refresh, simplify building Iceberg data products that are discoverable, accessible, trusted, and curated.

Why Starburst users will benefit from Iceberg data products

Running Iceberg data products in on-premises and hybrid environments enables users to self-serve and discover the most relevant insights from their Iceberg data entities.

You can now manage these data entities entirely within the Starburst Enterprise platform, which reduces your maintenance overhead. You can also publish data products across clusters so that subscribing clusters can access the data (currently in private preview). This aligns with the Icehouse vision for unified, governed, and high-performance data access.

Importantly, the value is immediate. Iceberg data products require no infrastructure configuration beyond what older approaches, like Hive, already need. So whether you’re working on a new project or an old one, you can get fast, federated access without additional steps.

Overall, this brings the power of Iceberg and the convenience of data products together for on-premises and hybrid users.

Data maintenance, now for on-premises and hybrid environments

With Iceberg data products now available in on-premises and hybrid environments, users can also take advantage of Iceberg’s full suite of data maintenance capabilities. This includes essential operations like compaction, snapshot expiration, orphan file removal, and profiling with statistics.

Together, these features help maintain performance, control storage costs, and keep your lakehouse optimized for analytics and AI.
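For readers who want to see what these operations look like underneath, the sketch below builds the equivalent SQL statements from the open-source Trino Iceberg connector. It assumes you are running maintenance by hand against a Trino-compatible engine; it does not describe how the Starburst Enterprise interface schedules these tasks. The table name, the `maintenance_statements` helper, and the default thresholds are illustrative assumptions.

```python
from typing import List


def maintenance_statements(
    table: str,
    retention: str = "7d",
    file_size_threshold: str = "128MB",
) -> List[str]:
    """Build Trino Iceberg connector statements for routine table maintenance.

    The helper only returns SQL strings; running them requires a Trino client
    and a fully qualified table name (catalog.schema.table).
    """
    return [
        # Compaction: rewrite small files into larger ones.
        f"ALTER TABLE {table} EXECUTE optimize(file_size_threshold => '{file_size_threshold}')",
        # Snapshot expiration: drop snapshots older than the retention window.
        f"ALTER TABLE {table} EXECUTE expire_snapshots(retention_threshold => '{retention}')",
        # Orphan file removal: delete files no longer referenced by table metadata.
        f"ALTER TABLE {table} EXECUTE remove_orphan_files(retention_threshold => '{retention}')",
        # Statistics: collect table and column statistics for the query optimizer.
        f"ANALYZE {table}",
    ]


# Hypothetical table name for illustration.
for statement in maintenance_statements("lake.sales.orders"):
    print(statement + ";")
```

Compaction runs first here so that snapshot expiration and orphan file removal can clean up the files it replaces, and statistics are collected last so they reflect the compacted layout.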

Iceberg materialized view refresh for on-premises and hybrid users

With Iceberg data products now available in on-premises and hybrid environments, materialized views become far easier to keep current. Automated refreshes replace the need for manual upkeep, ensuring that query results reflect the latest data. The updated Data Product interface provides a clear, streamlined way to configure and oversee these refreshes, helping teams maintain accuracy and performance with minimal complexity.
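As a rough illustration of what an automated refresh replaces, the sketch below shows the kind of interval check a hand-rolled scheduler would need before issuing Trino's `REFRESH MATERIALIZED VIEW` statement. The view name, the one-hour interval, and both helper functions are assumptions made for this example; the article does not describe how Starburst schedules refreshes internally.

```python
from datetime import datetime, timedelta
from typing import Optional


def refresh_statement(view: str) -> str:
    """Return the Trino statement that refreshes a materialized view."""
    return f"REFRESH MATERIALIZED VIEW {view}"


def is_refresh_due(
    last_refresh: Optional[datetime],
    interval: timedelta,
    now: datetime,
) -> bool:
    """A view is due if it has never been refreshed or the interval has elapsed."""
    if last_refresh is None:
        return True
    return now - last_refresh >= interval


# Hypothetical view name and timestamps for illustration.
view = "lake.sales.daily_revenue"
now = datetime(2025, 8, 12, 12, 0)
last_refresh = datetime(2025, 8, 12, 10, 30)

if is_refresh_due(last_refresh, timedelta(hours=1), now):
    print(refresh_statement(view))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test.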

Data product sharing (Private Preview)

Data products also make it possible to replicate and share governed datasets across clusters. This means that rather than recreating the same data product in multiple environments, teams can distribute an identical, access-controlled version that is ready for use.

This added capability promotes reusability, safeguards data integrity, and supports the strict infrastructure boundaries common in high-compliance, on-premises, and hybrid deployments. Check out the image below for more information.

Figure: The data architecture used in the Iceberg data products data sharing feature by Starburst.

Interested in trying out data product sharing? Reach out to your account team to learn more about this new and exciting feature.

Why Starburst data products are perfect for on-premises and hybrid workloads

Making Iceberg data products available for on-premises and hybrid environments marks a turning point in how organizations can design and operate governed, high-performance data platforms. The combination of Iceberg’s rich metadata and Starburst’s distributed query capabilities allows these environments to adopt the same scalable, metadata-driven practices that have already proven effective in the cloud, but without moving sensitive workloads.

Supporting AI workflows

For data teams, this means the boundaries between cloud, on-premises, and hybrid architectures matter less when it comes to building a unified, governed analytics and AI platform. This opens up possibilities, particularly for highly regulated industries. Whether the goal is to streamline complex analytics or maintain tight compliance, the same curated and reusable data products can now live anywhere and remain accessible under a consistent governance model.

A data architecture built for the future

In the long term, this flexibility will help high-compliance organizations modernize their architectures at their own pace. They can integrate Iceberg’s maintenance, schema evolution, and view management features into environments they already trust, while leveraging Starburst to unify access and performance across all their data. The result is a more adaptable, future-ready data strategy that works across every deployment model.
