Some parts of the content below were enhanced and reviewed with the help of AI tools.
This blog post is part of "Unleashing the Power of Redis on SAP BTP for Modern Applications: A Blog Series".
Using AI at business scale needs smarter control
AI capabilities are increasingly embedded in everyday business applications, from smart assistants to intelligent document workflows. As a result, enterprises are seeing usage patterns that are both high-volume and highly variable. Fixed, one-size-fits-all rate limiting is not the best solution; businesses need dynamic, adaptive controls that can react to real-time context and protect valuable AI resources without degrading the user experience.
What is dynamic rate limiting?
Dynamic rate limiting is a real-time method of controlling traffic or usage based on changing conditions. Unlike static approaches that set fixed limits regardless of context, dynamic rate limiting adjusts its limits based on conditions such as:
- User identity, type, or subscription tier
- System load or response latency
- Tokens used and inference time (directly linked to AI service cost)
- Business SLAs
- Other relevant business factors
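As an illustration, a per-request token budget could be computed from signals like these. The sketch below is a hypothetical helper, not code from this post: the tier quotas and the load threshold are assumed values for demonstration only.

```python
# Hypothetical dynamic limit: scale a tier's base token quota down
# when the system is under heavy load.

TIER_QUOTAS = {"free": 50, "standard": 200, "premium": 1000}  # tokens per window (assumed)

def dynamic_max_tokens(tier, system_load, load_threshold=0.8):
    """Return the token budget for this window, halved under heavy load."""
    base = TIER_QUOTAS.get(tier, TIER_QUOTAS["free"])
    if system_load > load_threshold:
        return base // 2  # protect shared AI capacity when the platform is busy
    return base

# A premium user keeps the full quota at normal load...
print(dynamic_max_tokens("premium", system_load=0.4))  # 1000
# ...but gets a reduced budget when load is high.
print(dynamic_max_tokens("premium", system_load=0.9))  # 500
```

In a real deployment the tier and load inputs would come from the identity provider and platform monitoring, and the resulting budget would be stored in Redis so the gateway can read it on every request.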
In AI systems, the computational and financial cost of running a model can vary widely, which makes dynamic rate limiting critical. It also protects the most critical business processes from being disrupted by slow AI responses, which would impact company revenue and customer satisfaction.
This is where a cache service like Redis on SAP BTP can be highly effective. With its in-memory, high-speed data structures and real-time processing, it provides a powerful foundation for dynamic rate limiting, one that aligns with business value, user behavior, and AI compute cost.
Redis on SAP BTP gives complete control over the rate-limiting logic; it is possible to implement algorithms such as:
- Token bucket: track available tokens using a key-value counter and timestamps
- Leaky bucket: use a queue/list with timestamps to simulate water leaking at a fixed rate
- Sliding window: use sorted sets to store request timestamps
- Rolling logs: keep a list of request timestamps per user or API key, then check the list length or timestamps to decide whether a request is allowed
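To make the sliding-window variant concrete, here is a pure-Python sketch of the algorithm's logic. In Redis, each step would map to a sorted-set command (evicting old entries with ZREMRANGEBYSCORE, counting with ZCARD, recording with ZADD); this is an illustration of the idea, not production Redis code.

```python
import time

class SlidingWindowLimiter:
    """Sliding-window log: allow at most `max_requests` per `window_sec`."""

    def __init__(self, max_requests, window_sec):
        self.max_requests = max_requests
        self.window_sec = window_sec
        self.log = {}  # per-key list of request timestamps

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        entries = self.log.setdefault(key, [])
        # Evict timestamps that fell out of the window (Redis: ZREMRANGEBYSCORE).
        cutoff = now - self.window_sec
        entries[:] = [t for t in entries if t > cutoff]
        # Count what remains (Redis: ZCARD) and record if allowed (Redis: ZADD).
        if len(entries) >= self.max_requests:
            return False
        entries.append(now)
        return True

limiter = SlidingWindowLimiter(max_requests=2, window_sec=60)
print(limiter.allow("user_1", now=0.0))   # True
print(limiter.allow("user_1", now=1.0))   # True
print(limiter.allow("user_1", now=2.0))   # False (window is full)
print(limiter.allow("user_1", now=61.5))  # True (first request aged out)
```

Unlike a fixed window, the sliding window has no reset boundary a user could exploit by bursting at the edge of two windows.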
Redis on SAP BTP can also be integrated directly into application code, making it easier to tie rate limiting to business logic such as charging tokens, quota handling, or user session scoring.
Typical API management solutions usually offer limited, one-size-fits-all rate limiting. Dynamically adjusting limits based on user tiers, past behavior, or even AI model feedback loops can be hard to do with rigid API gateway rules.
API gateways struggle to implement such responsive, context-aware mechanisms without significant complexity or external orchestration.
Redis on SAP BTP operates in-memory with sub-millisecond latency. This makes it ideal for high-performance, real-time applications (e.g., generative AI APIs, financial platforms, gaming) where even slight delays from external API management layers could hurt the user experience or the AI response time.
Architecture: cache-service as the dynamic control layer
Below is a simplified high-level architecture showing the cache service in a dynamic rate-limiting setup:
The API gateway or middleware queries Redis to decide whether a request should proceed, be throttled, or be rerouted. Redis stores the usage counters and dynamic thresholds in memory, with high-performance access, keyed by time, plan, cost, and other business factors.
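That three-way decision (proceed, throttle, or reroute) can be sketched as a small routing function. The thresholds and outcome names below are illustrative assumptions; in practice the usage counter and limits would be read from Redis.

```python
# Hypothetical gateway decision: given current usage and dynamic thresholds
# (which would come from Redis), return one of three outcomes.

def route_request(usage, cost, soft_limit, hard_limit):
    """Decide whether a request proceeds, is rerouted, or is throttled."""
    projected = usage + cost
    if projected <= soft_limit:
        return "proceed"   # within budget: call the primary AI model
    if projected <= hard_limit:
        return "reroute"   # over soft limit: send to a cheaper model or a queue
    return "throttle"      # over hard limit: reject, e.g. with HTTP 429

print(route_request(usage=80, cost=10, soft_limit=100, hard_limit=150))   # proceed
print(route_request(usage=95, cost=10, soft_limit=100, hard_limit=150))   # reroute
print(route_request(usage=145, cost=10, soft_limit=100, hard_limit=150))  # throttle
```

The two-tier threshold is one possible design: a soft limit degrades service gracefully before the hard limit rejects requests outright.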
Below is sample Python code implementing a rate-limiting mechanism with Redis using a fixed-window counter approach. This method tracks the number of tokens a user consumes within a defined time window and restricts access if the count would exceed a specified limit.
import redis
import time

r = redis.Redis(host='localhost', port=6379, db=0)

def can_access(user_id, token_cost, window_sec=60, max_tokens=100):
    # One counter key per user per fixed window; the window number is part
    # of the key, so each new window starts with a fresh counter.
    key = f"rate:{user_id}:{int(time.time()) // window_sec}"
    current = r.get(key)
    usage = int(current) if current else 0
    if usage + token_cost > max_tokens:
        return False
    # Note: this read-then-increment is not atomic; under heavy concurrency
    # a Lua script or an INCRBY-first check would be safer.
    pipe = r.pipeline()
    pipe.incrby(key, token_cost)
    pipe.expire(key, window_sec)
    pipe.execute()
    return True

# Example usage:
user_id = "premium_user_123"
if can_access(user_id, token_cost=12):
    print("Access granted: calling AI model")
else:
    print("Rate limit exceeded")
I hope you enjoyed reading this. If you liked it, you might also want to check out the other blog posts in the same series.
Regards, Antonio
#SAP
#SAPTechnologyblog