SAP BTP AI Best Practices #3: Content Filtering Intro

For more information: https://sap.to/60534D3cz

Description
Data filtering during inference in Large Language Models (LLMs) is a critical process that ensures the quality and relevance of input data. It involves evaluating and selecting data to prevent low-quality or irrelevant inputs from affecting model performance. This process is essential for maintaining the accuracy and reliability of LLMs, especially when they are used for real-time applications or sensitive tasks.

Techniques such as tokenization, stemming, and lowercasing preprocess the data, removing unnecessary information and improving its quality. The choice of filter model depends on both dataset size and available computational resources: smaller models can filter efficiently because they evaluate instruction difficulty consistently across datasets. Perplexity scores from smaller language models can also help detect anomalies or outliers in outgoing data, helping to prevent potential security breaches such as weight exfiltration.
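As a minimal illustration of the two ideas above, the sketch below normalizes text (lowercasing and tokenization; stemming omitted for brevity) and computes perplexity from per-token log-probabilities. This is a toy in plain Python, not SAP tooling: a high perplexity simply signals that a scoring model found the text unusual.

```python
import math
import re

def preprocess(text: str) -> list[str]:
    """Lowercase and tokenize, dropping punctuation -- the kind of
    lightweight normalization used before filtering."""
    return re.findall(r"[a-z0-9]+", text.lower())

def perplexity(token_log_probs: list[float]) -> float:
    """Perplexity from per-token natural-log probabilities:
    exp(-mean(log p)). Higher values suggest anomalous or
    outlier text relative to the scoring model."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# A uniform 1-in-4 guess per token yields a perplexity of exactly 4.
tokens = preprocess("Hello, World!")          # ["hello", "world"]
score = perplexity([math.log(0.25)] * 4)      # 4.0
```

A filtering pipeline would compare such scores against a threshold and drop or flag the texts that exceed it.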

SAP AI Core provides content filtering for hateful, violent, sexual, and vulgar content through both input and output filtering.
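In SAP AI Core these filters are configured declaratively rather than coded by hand. Purely as a sketch of the input/output filtering pattern (the function names and blocked-term list below are illustrative placeholders, not the SAP API), a model call can be wrapped on both sides like this:

```python
# Placeholder keyword list standing in for a real content-safety classifier.
BLOCKED_TERMS = {"examplehate", "exampleslur"}

def violates_policy(text: str) -> bool:
    """Toy policy check: flag text containing any blocked term."""
    return bool(set(text.lower().split()) & BLOCKED_TERMS)

def filtered_completion(prompt, llm_call):
    # Input filtering: reject unsafe prompts before they reach the model.
    if violates_policy(prompt):
        return "[input blocked by content filter]"
    response = llm_call(prompt)
    # Output filtering: screen the generated text before returning it.
    if violates_policy(response):
        return "[output blocked by content filter]"
    return response
```

The same two checkpoints exist in the real service; a production setup would replace the keyword set with a trained content-safety classifier and per-category severity thresholds.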

Expected Outcome
By implementing these content filters, you ensure that the generative AI models in SAP AI Core produce safe and appropriate content that adheres to the specified content-safety criteria, keeping generated text compliant and aligned.

Benefits
Improved Relevance of Responses: Filtering content before or after querying helps the LLM focus on the most contextually appropriate and useful information, leading to more accurate and relevant answers.

Reduced Harmful Content: Filtering puts guardrails on models to prevent the spread of unethical, harmful, biased, and hateful information.
