SAP BTP AI Best Practices #6: Vector-based RAG Query Pipeline


For more information: https://sap.to/60594DP2z
Description
Retrieval-Augmented Generation (RAG) is a powerful technique that combines the strengths of retrieval-based methods and generative Large Language Models (LLMs) to enhance the quality and relevance of generated content. Implementing vector-based RAG is a two-step approach:

Embedding Creation – Embeddings are numerical vector representations of knowledge segments. In this step, you convert the knowledge base into vectors and store them in a vector database. For a better understanding of this topic and related best practices, it is recommended to go through the Vector Embedding(1/1) part of this series.
Query Pipeline – In this step, we leverage the vector representation of the knowledge base to augment the LLM's context. First, we convert the question into a vector embedding, then perform a similarity search in the vector database to fetch the best matches. We then pass these best matches as augmented context, along with the prompt, to the LLM to generate a response to the user's question.
In this best practice, we focus on the querying aspect of the RAG technique.
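The query pipeline above can be sketched in plain Python. This is a minimal illustration, not production code: a toy in-memory list stands in for the vector database, and a hard-coded query vector stands in for the output of a real embedding model.

```python
import math

# Toy in-memory "vector store": (text chunk, embedding) pairs.
# In a real pipeline, the embeddings come from an embedding model
# and live in a vector database such as SAP HANA Cloud.
KNOWLEDGE_BASE = [
    ("Invoices are approved by the finance team.", [0.9, 0.1, 0.0]),
    ("Travel expenses require a manager's sign-off.", [0.1, 0.9, 0.0]),
    ("The cafeteria opens at 8 a.m.", [0.0, 0.1, 0.9]),
]

def cosine_similarity(a, b):
    """Closeness of two vectors in vector space (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_embedding, top_k=2):
    """Similarity search: return the top_k best-matching chunks."""
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

def build_prompt(question, query_embedding):
    """Augment the LLM prompt with the retrieved context."""
    context = "\n".join(retrieve(query_embedding))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# A real system would embed the question with the same model used for
# the knowledge base; here the query vector is hard-coded for clarity.
prompt = build_prompt("Who approves invoices?", [0.95, 0.05, 0.0])
print(prompt)
```

The augmented prompt, not the raw question alone, is what is finally sent to the LLM; the irrelevant chunk (the cafeteria hours) is filtered out by the similarity ranking.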

Example Process Flow for a Vector Engine-powered RAG approach

SAP’s Generative AI Hub serves as a central platform for orchestrating AI processes, allowing developers to leverage pre-trained models and integrate external, domain-specific data to improve AI outputs. This approach is particularly beneficial in business scenarios where contextual relevance and data privacy are paramount. By utilizing SAP HANA Cloud’s Vector Engine and the Generative AI Hub SDK, developers can create sophisticated RAG applications that not only generate high-quality content but also maintain data privacy through techniques like data masking.
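As a sketch of the retrieval step against SAP HANA Cloud's Vector Engine, the snippet below builds a top-k similarity query using the vector functions COSINE_SIMILARITY and TO_REAL_VECTOR. The table and column names (DOC_CHUNKS, EMBEDDING, TEXT_CHUNK) are illustrative assumptions, not a standard schema, and the commented-out connection details would need real credentials.

```python
def build_similarity_query(top_k: int) -> str:
    """SQL for a top-k cosine-similarity search over stored embeddings.

    HANA Cloud's Vector Engine provides COSINE_SIMILARITY to rank rows
    and TO_REAL_VECTOR to convert the bound parameter into a vector.
    Table/column names here are assumptions for illustration.
    """
    return (
        f"SELECT TOP {top_k} TEXT_CHUNK "
        "FROM DOC_CHUNKS "
        "ORDER BY COSINE_SIMILARITY(EMBEDDING, TO_REAL_VECTOR(?)) DESC"
    )

def fetch_best_matches(conn, query_embedding, top_k=3):
    """Run the similarity search; query_embedding is a list of floats."""
    cursor = conn.cursor()
    # HANA expects the vector literal as a string such as '[0.1, 0.2, 0.3]'.
    cursor.execute(build_similarity_query(top_k), (str(query_embedding),))
    return [row[0] for row in cursor.fetchall()]

# Usage (requires a live HANA Cloud instance and credentials):
# from hdbcli import dbapi
# conn = dbapi.connect(address="<host>", port=443, user="<user>", password="<pw>")
# matches = fetch_best_matches(conn, [0.1, 0.2, 0.3])

print(build_similarity_query(3))
```

Keeping the ranking inside the database avoids transferring every embedding to the application: the Vector Engine returns only the top-k chunks, which are then passed to the LLM as context.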

The RAG technique is applicable and beneficial in a variety of GenAI scenarios. Some of the use cases are as follows:

Question Answering Systems: RAG can power Q&A systems that deliver accurate and up-to-date responses, especially in domains with frequently changing information (e.g., finance, healthcare).
Chatbots and Virtual Assistants: RAG-powered chatbots can provide more helpful and informative responses, drawing on relevant information from knowledge bases.
Code Completion and Suggestion Systems: RAG can enhance code suggestion systems by incorporating context-specific knowledge from code repositories or documentation.
Expected Outcome
The RAG technique helps generate higher-quality content while keeping it relevant to the business context (also called grounding in the LLM space). Some of the key outcomes of the RAG technique are:

More Accurate and Relevant Answers: Because the system retrieves context that aligns with the meaning of the query, the LLM is more likely to produce useful, correct, and directly related responses
Better User Experience: Users can phrase queries naturally (like how they’d talk to a human) and still get meaningful answers. No need to “speak the system’s language.” This lowers the learning curve and boosts adoption.
Improved Trust & Reliability: When answers are grounded in semantically relevant source material, it’s easier to show where the answer came from (source documents, paragraphs, etc.). This increases transparency and user trust, especially in enterprise or high-stakes settings.
Benefits
Contextual Relevance: Responses contain passages that match the intent and meaning of the query, not just its literal words.
Dynamic Adaptability: You can ask open-ended or unstructured questions and still get good results.
Reduced Hallucinations: High-quality, semantically matched context improves the factual accuracy of the final answer.
Key Concepts
Vector Similarity Search: The process of finding the most relevant pieces of content based on closeness in vector space (using similarity metrics like cosine similarity).
Data Masking: A technique used to protect sensitive information by anonymizing or pseudonymizing data inputs, ensuring privacy and compliance. This feature is available in the Generative AI Hub orchestration layer to help you protect your sensitive information.
Templating: A feature of the orchestration layer in Generative AI Hub that allows for the creation of structured prompts and the integration of domain-specific data to enhance AI model outputs.
Context Window: The amount of retrieved content the LLM can “see” when generating a response.
Hybrid Search: Combining vector search with keyword search for even better accuracy.
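To make the templating and data-masking concepts concrete, here is a minimal plain-Python sketch of both. This is a generic illustration, not the Generative AI Hub orchestration API; the template text, placeholder names, and e-mail regex are all assumptions.

```python
import re

# Structured prompt with named placeholders, filled at query time
# (the orchestration layer's templating feature works on this principle).
PROMPT_TEMPLATE = (
    "You are a support assistant.\n"
    "Context:\n{context}\n\n"
    "Question: {question}"
)

# Simple e-mail pattern; a managed masking service covers far more
# entity types (names, IDs, phone numbers, ...).
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def mask_emails(text: str) -> str:
    """Pseudonymize e-mail addresses before the text reaches the LLM."""
    return EMAIL_PATTERN.sub("<MASKED_EMAIL>", text)

def render_prompt(context: str, question: str) -> str:
    """Fill the template with masked, retrieved context."""
    return PROMPT_TEMPLATE.format(context=mask_emails(context), question=question)

prompt = render_prompt(
    "Contact jane.doe@example.com for invoice questions.",
    "Who handles invoices?",
)
print(prompt)
```

The key point is the ordering: retrieved context is masked first, then slotted into the template, so sensitive values never appear in the prompt that is sent to the model.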
