
The footprint of AI is growing day by day. With this rise comes growth in the foundational technologies that make AI possible. These technologies include Large Language Models (LLMs), the centerpiece of AI technology, as well as the components that make LLMs useful in practice, among them RAG architecture and Agentic AI. Collectively, you can think of these as an emerging AI data stack, and that stack is getting bigger.
Let’s explore this in more depth.
Considering the AI data stack
LLMs are central to the AI data stack, but they are far from the only component needed to make AI work in an enterprise setting. Today, a typical GenAI application architecture includes the following three foundations:
- The LLM itself: Providing basic natural language parsing, artifact generation, and general knowledge functions.
- Supplemental contextual data: Extending the value of the LLM by infusing specific data particular to individual use cases.
- Agentic AI workflows: Allowing AI to carry out specific tasks and deliver value in a business context.
All three of these components matter, but context is emerging as one of the most important—and most easily forgotten—factors in AI success.
Why context matters in the AI data stack
Context matters for one simple reason: it’s the part of the AI data stack that goes beyond the AI model’s basic training. It contains specific, nuanced details about the task at hand, rather than a generic task in the abstract.
Without context, AI models and AI agents speak in generalities. These generalities are fine as a starting point, but they’re insufficient for achieving positive business outcomes. To succeed in an enterprise setting, AI needs specific contextual knowledge that lets the AI Agents it powers move beyond generalities and achieve real results.
So if context matters, how do AI models absorb it?
Retrieving and storing contextual data
Supplying context at runtime requires specialized infrastructure. Contextual data is encoded and stored in purpose-built repositories, which are then connected to the rest of the application. Vector stores have become the de facto storage mechanism for supplying this context to LLMs.
In this article, we’ll look at vector storage and why it’s essential for AI workflows. We’ll also see what makes vector data and vector embeddings challenging to implement at scale and how to create an enterprise architecture that activates all of your company’s contextual data for AI workflows.
Why AI value relies on contextual knowledge
LLMs are probabilistic systems, trained on vast amounts of data. That training is what makes LLM-based AI great at doing what traditional computing systems found difficult, namely absorbing and working with unstructured data like text, images, videos, and audio.
That’s the first step, but it’s not the whole picture. To succeed in enterprise settings, LLMs require two fundamental things that are difficult to obtain:
The context of your business
Typically, LLMs don’t have access to all relevant information about your business. They certainly have no access to proprietary internal data such as customer support logs, internal knowledge bases, sales figures, and employee benefits information.
Without that data, the AI Agents they power have gaps in their knowledge. In the absence of context, LLMs may attempt to infer meaning, which can lead to inaccurate or incomplete results. Accessing that data in an appropriate, governed way is one of the biggest challenges facing AI adoption.
Immediate knowledge
An LLM’s training data has a fixed cutoff, often many months or even years in the past. Because of this, models don’t know about recent events, let alone anything happening in real time. Meanwhile, many use cases require immediate knowledge updated with streaming data. Without contextual data, this knowledge is missing from your LLM and any AI Agents built on top of it.
How AI systems access and store contextual data
How do GenAI systems access the contextual data they need? Developers typically supply it in one of two ways: through API integrations or by appending additional information to a user’s prompt.
The second approach, appending additional information to the prompt, requires storing and retrieving data in a way that satisfies three key technical requirements:
1. Support for high-dimensional data
Contextual data is often high-dimensional data, meaning a large number of attributes or features describe each data point. In many cases, the number of features exceeds the number of actual records. This is especially common with unstructured data—such as text, audio, video, or images—where each item is transformed into a high-dimensional vector representation. A suitable storage system must efficiently manage this complexity and support fast, approximate similarity searches.
2. Low latency
Latency can be a problem for AI applications. LLM queries already introduce some latency, especially when served over the public internet. Adding another layer of delay when querying a secondary data store can degrade the user experience. Therefore, any supporting data infrastructure must deliver sub-second responses to be viable in production-grade GenAI applications.
3. Data collaboration across your organization
To build robust AI solutions, organizations need to access accurate data from across the business. That means the system must work with data in multiple formats and locations, from various departments or platforms.
You can’t meet these requirements with traditional relational (SQL) storage or even NoSQL storage. This is where vector embeddings come in.
What are vector embeddings?
Vector embeddings are essential for integrating contextual data into your LLM-based system. They convert high-dimensional data—such as text, images, audio, or other complex inputs—into dense numeric vectors that can be efficiently stored, compared, and retrieved. In this sense, vector embeddings abstract unstructured data into a numeric format to make it easier to compare.
Making unstructured, contextual data more accessible
Vector embeddings are mathematical representations of unstructured data, optimized for search. This transformation captures the semantic meaning of the input, enabling systems to identify similar data points based on meaning rather than exact matches. These embeddings are then stored in vector databases, which specialize in fast, approximate similarity search across high-dimensional spaces.
Let’s break that down further and explore how vector embeddings enable GenAI applications to store and retrieve contextual information effectively.
How vector embeddings assist GenAI models
To generate vector embeddings from a document, the text is first broken into smaller chunks. These might be sentences, paragraphs, or tokenized word groups.
Each chunk is then passed through an embedding model, which has been trained on vast amounts of data. The model converts the chunk into a numerical vector, where each number (or feature) represents a dimension in vector space. These dimensions capture the semantic characteristics of the original text, allowing similar chunks to be located near each other in that space.
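As a rough sketch of that process, the snippet below chunks a document and converts each chunk into an embedding. It assumes the open-source sentence-transformers library and its all-MiniLM-L6-v2 model purely for illustration; any embedding model follows the same pattern, and the file name is hypothetical.

```python
# A minimal chunk-and-embed sketch, assuming the open-source sentence-transformers
# library and the all-MiniLM-L6-v2 model (384-dimensional vectors).
from sentence_transformers import SentenceTransformer

def chunk_text(text: str, max_words: int = 100) -> list[str]:
    """Naive chunker: split the document into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

document = open("handbook.txt").read()           # hypothetical source document
chunks = chunk_text(document)

model = SentenceTransformer("all-MiniLM-L6-v2")  # maps each chunk to a 384-dimensional vector
embeddings = model.encode(chunks)                # numpy array of shape (num_chunks, 384)
print(embeddings.shape)
```

In production you would typically overlap chunks and respect sentence or paragraph boundaries, but the flow is the same: split, embed, store.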
Example: Comparing “bicycles” and “unicycles”
Let’s look at an example. After embedding, related concepts end up near one another in vector space. To keep things simple, imagine a three-dimensional space.
In that space, “bicycle” and “unicycle” sit close to one another because they’re forms of the same mode of transportation. “Car” is a little further off but still in the vicinity of other forms of transport. By contrast, a cluster of “foods in bread” occupies a different area of vector space entirely.
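To make that concrete, here is a toy sketch using hand-picked three-dimensional vectors. The numbers are invented purely for illustration (real embeddings have hundreds or thousands of learned dimensions), but the cosine-similarity comparison is the same kind a vector store performs.

```python
import numpy as np

# Invented 3-D "embeddings" purely for illustration; real embeddings have
# hundreds or thousands of dimensions learned by an embedding model.
vectors = {
    "bicycle":  np.array([0.9, 0.8, 0.1]),
    "unicycle": np.array([0.8, 0.9, 0.1]),
    "car":      np.array([0.6, 0.3, 0.2]),
    "sandwich": np.array([0.1, 0.1, 0.9]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of direction: close to 1.0 means semantically similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(vectors["bicycle"], vectors["unicycle"]))  # high: same transport cluster
print(cosine_similarity(vectors["bicycle"], vectors["car"]))       # lower, but still related
print(cosine_similarity(vectors["bicycle"], vectors["sandwich"]))  # lowest: different cluster
```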
Understanding how vector embeddings optimize search
Searching a vector store means finding the items closest to a query in vector space. Computed naively, that comparison is expensive: every query would have to be measured against every stored vector. Vector embeddings help because they capture semantic meaning in a fixed-length numeric format that similarity metrics and specialized indexes can work with efficiently.
The algorithms that help vector embeddings add contextual data to LLMs
To make this fast, vector embeddings are typically organized and indexed by similarity. A query can then operate against semantically relevant clusters, so only a subset of all embeddings needs to be searched. Different algorithms accomplish this, such as k-nearest neighbors (KNN) and hierarchical navigable small worlds (HNSW).
For example, consider how a search for the question “Can British shorthair cats be green?” would run using KNN.
Examples: KNN, HNSW, and ANN search
Imagine that data about different cat breeds, and stories about cats, occupies different sectors of vector space. The k in k-nearest neighbors is the number of closest matches you ask for. k=1 gives you the strictest result set, whereas k=5 may give you a larger result set containing information about British shorthairs, American shorthairs, and stories related to the breed.
An algorithm like KNN can be complex to work with because it’s challenging to determine the optimal value for k, and an exact KNN scan becomes expensive on extremely high-dimensional data at scale. That’s why approximate nearest neighbor (ANN) search, of which HNSW is a popular implementation, is the approach most often used in vector search today. It returns results that are very close to, though not always exactly, the true nearest neighbors, striking a strong balance between accuracy and speed.
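As an illustration of the basic interface, the sketch below implements a brute-force k-nearest-neighbors search with NumPy over invented data. Production vector stores replace this exhaustive scan with approximate indexes such as HNSW (via libraries like hnswlib or FAISS), but the contract is the same: given a query vector and a value of k, return the closest stored vectors.

```python
import numpy as np

def knn_search(query: np.ndarray, index: np.ndarray, k: int = 3) -> np.ndarray:
    """Brute-force KNN: rank every stored vector by cosine similarity to the query."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = index_norm @ query_norm      # cosine similarity to every stored vector
    return np.argsort(-scores)[:k]        # positions of the k most similar vectors

# Invented example: 10,000 stored embeddings of 384 dimensions each.
stored = np.random.rand(10_000, 384)
query = np.random.rand(384)
print(knn_search(query, stored, k=5))
```

The exhaustive scan above touches every stored vector on every query. Approximate indexes such as HNSW avoid that by navigating a precomputed proximity graph, which is what makes sub-second retrieval possible at large scale.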
Vector stores and the RAG pipeline
Vector stores have been around for a while. They power recommendation engines, search engines, patient similarity search, and other applications involving high-dimensional data. In the age of AI, vector stores have found yet another application for providing context to LLMs using a technique known as retrieval-augmented generation, or RAG.
In RAG, instead of providing a user’s prompt raw to the LLM, you can first use it to perform a similarity search on a vector store containing your proprietary data, production information, customer service logs, etc. You can then filter and supply the data most relevant to the user’s query as context in the prompt. This results in more accurate, appropriate, and up-to-date results from the LLM.
There are other ways to supply context to an LLM. For example, fine-tuning retrains some of an LLM’s parameters on proprietary data. However, in most use cases RAG delivers better results than fine-tuning for injecting specific, frequently changing knowledge, and it does so with far less engineering effort.
How RAG works
RAG consists of several components. Let’s look at each of them one by one.
Loading
Documents are parsed, chunked, and converted to embeddings in an Extract, Transform, and Load (ETL) process. These are then saved to a vector store.
Retrieval
The user asks a question, which the GenAI app embeds and uses to perform a similarity search over the vector store. This returns the top K semantically relevant document chunks.
Reranking
After vector search retrieves the top K results, a reranking step uses a cross-encoder or LLM to reorder them by contextual relevance. This improves accuracy but is compute-intensive, so it’s only applied post-search.
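As an illustration of that step, here is a minimal reranking sketch using a cross-encoder from the open-source sentence-transformers library. The model name and the candidate chunks are assumptions chosen purely for the example.

```python
from sentence_transformers import CrossEncoder

# Hypothetical chunks returned by the vector search step.
query = "Can British shorthair cats be green?"
candidates = [
    "British shorthairs come in blue, cream, golden, and other coat colors.",
    "American shorthairs are a distinct breed with different origins.",
    "A short story about a cat who lived in a lighthouse.",
]

# A cross-encoder scores each (query, chunk) pair jointly. It is slower than
# vector search, which is why it is applied only to the retrieved top K results.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, chunk) for chunk in candidates])

# Reorder the candidates by relevance score, highest first.
reranked = [chunk for _, chunk in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```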
Augmentation
The app prepares the selected documents as part of a final prompt. This might involve refining the user’s question or organizing retrieved data using Agentic AI workflows to support the LLM’s generation better.
Generation
The LLM generates an answer based on this supplementary data, which the app then returns to the user.
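Putting the steps together, here is a condensed, self-contained sketch of the pipeline (reranking omitted for brevity). The embed_fn and llm_fn arguments stand in for whatever embedding model and LLM client you actually use; only the shape of the flow is the point.

```python
import numpy as np

class InMemoryVectorStore:
    """A toy vector store: keeps normalized vectors alongside their source chunks."""
    def __init__(self):
        self.vectors, self.chunks = [], []

    def add(self, vector: np.ndarray, chunk: str) -> None:
        # Loading: store the embedding alongside the original text chunk.
        self.vectors.append(vector / np.linalg.norm(vector))
        self.chunks.append(chunk)

    def search(self, query: np.ndarray, k: int = 3) -> list[str]:
        # Retrieval: return the k chunks whose embeddings are closest to the query.
        scores = np.stack(self.vectors) @ (query / np.linalg.norm(query))
        return [self.chunks[i] for i in np.argsort(-scores)[:k]]

def rag_answer(question: str, store: InMemoryVectorStore, embed_fn, llm_fn, k: int = 3) -> str:
    retrieved = store.search(embed_fn(question), k)
    # Augmentation: place the retrieved chunks into the prompt as grounding context.
    context = "\n\n".join(retrieved)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    # Generation: the LLM produces an answer grounded in the supplied context.
    return llm_fn(prompt)
```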
Challenges with vector stores
The key challenge with vector stores is the data itself: finding and accessing relevant, high-quality data across the organization to convert and store as vector embeddings.
Data access
The higher the quality of data that you feed to GenAI solutions, the better they work. However, most companies struggle to access and collaborate on data in a secure, well-governed manner. Critical data often lingers in data silos, making it difficult to find and convert into a usable format.
Data collaboration
Some companies have addressed this problem by undertaking extensive and costly data centralization initiatives, consolidating all data into a single data lakehouse where it can be explored, refined, and put to use. However, such all-encompassing projects often collapse under their own weight before delivering real value.
Data governance
Without data governance, your AI projects cannot proceed. For this reason, AI data governance is fast emerging as one of the most important challenges facing enterprise AI adoption. AI is meant to unlock speed to insight and catapult your organization into a new era, but that transition has to be governed and accountable from the outset.
How to turn all your data into AI data
Adopting AI is no longer optional; it’s required. The best way to do this is to bring AI to your data with a Lakeside AI architecture.
With Lakeside AI, you can leverage your data architecture in the service of AI. Using federated access, you can explore your entire data ecosystem, bringing the lakehouse experience to your data wherever it lives.
The Starburst data platform integrates easily within your existing architecture to enable Lakeside AI. Built on the power of Trino and Apache Iceberg, Starburst lets you access, collaborate on, and govern your data for any AI application. You can then leverage Starburst’s built-in AI functions to generate vector embeddings that point back to your Iceberg data.
AI is rapidly changing the way we build applications and run companies. With Lakeside AI, you can start building the enterprise of tomorrow today.