🔵 Want better RAG results? Optimize your Data


In the evolving landscape of AI, improving Retrieval-Augmented Generation (RAG) results is crucial. A key challenge in LLM training is the dwindling availability of high-quality, human-generated data. While more data is often assumed to be beneficial, in practice irrelevant and noisy data can actually hurt performance. Recent research highlights the advantages of Selective Language Modeling (SLM) in pretraining LLMs: one study found that selecting specific tokens during pretraining substantially reduces downstream loss and improves model performance.

As part of the Enterprise AI Search team at SAP, we have access to extensive indexed data from internal and external sources, such as SAP Help and Community documents. By leveraging these insights, we aim to refine token selection strategies, improving RAG efficiency and model effectiveness — in short, getting the most out of our existing data.
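As a rough illustration of the token-selection idea, the sketch below shows one way SLM-style selection can work: score each token by its excess loss (training loss minus a reference model's loss) and keep only the highest-scoring fraction for the training objective. The function name, loss values, and keep ratio are all illustrative assumptions, not details from the post or the cited study.

```python
# Hypothetical sketch of SLM-style token selection (not SAP's implementation).
# Tokens whose loss is high under the training model but low under a clean
# reference model are treated as the most informative and kept in the loss.

def select_tokens(token_losses, ref_losses, keep_ratio=0.6):
    """Return indices of the keep_ratio fraction of tokens with the
    highest excess loss (training loss minus reference-model loss)."""
    excess = [(tl - rl, i) for i, (tl, rl) in enumerate(zip(token_losses, ref_losses))]
    excess.sort(reverse=True)
    k = max(1, int(len(excess) * keep_ratio))
    return sorted(i for _, i in excess[:k])

# Illustrative per-token cross-entropy values for 5 tokens.
train = [2.1, 0.4, 3.0, 0.5, 1.8]
ref   = [1.0, 0.5, 1.2, 0.6, 1.7]

kept = select_tokens(train, ref, keep_ratio=0.6)
print(kept)  # → [0, 2, 4]

# The selective loss averages only over the kept tokens.
selective_loss = sum(train[i] for i in kept) / len(kept)
print(round(selective_loss, 2))  # → 2.3
```

In a real pretraining pipeline the per-token losses would come from the model's forward pass rather than hand-written lists; the selection step itself is the part this sketch captures.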

Speaker
Ceylin Ozdemir

#SAP
