Introduction
Welcome back to our series "SAP AI Core is All You Need"!
In this blog, we’ll keep moving toward the serving components so that anyone can consume our models. This time, we’ll cover the serving template, the Docker builder, and the deployment. It’s very exciting to be so close to seeing our model really up and running, isn’t it?
What to Expect
In this blog, you will gain practical insights into the following:
KServe Serving Template: Discover how to design and implement a Serving Template in KServe to deploy machine learning models on Kubernetes.
Deploying with SAP AI Core: Follow a detailed guide on deploying your model using SAP AI Core and AI Launchpad, ensuring seamless and scalable inference.
By the end of this blog, you’ll have a comprehensive understanding of how to deploy and serve advanced AI models, leveraging SAP AI Core and KServe to make the Shakespeare Language Model available for everyone.
Designing and Understanding the Serving Template
KServe is a platform for serving machine learning models on Kubernetes, providing scalable and flexible infrastructure for inference tasks. The Serving Template API defines a blueprint for deploying and managing models as microservices within the KServe framework. It specifies metadata, input requirements, and configurations for deploying the model-serving container.
Breakdown of the ServingTemplate Manifest
Now, let’s explain each section of your specific ServingTemplate manifest (the yaml file):
apiVersion: ai.sap.com/v1alpha1
kind: ServingTemplate
metadata:
  name: shakespeare-model-generator-api
  annotations:
    scenarios.ai.sap.com/name: "shakespeare-language-model"
    scenarios.ai.sap.com/description: "Shakespeare Language Model"
    executables.ai.sap.com/name: "Shakespeare-language-model-generator"
    executables.ai.sap.com/description: "Shakespeare Language Model Text Generator"
    artifacts.ai.sap.com/logs.kind: "other"
    artifacts.ai.sap.com/logs.description: "Model Training Logs"
    artifacts.ai.sap.com/logs.labels: |
      {"ext.ai.sap.com/step":"train", "ext.ai.sap.com/version":"0.0.1"}
  labels:
    scenarios.ai.sap.com/id: "shakespeare-language-model"
    executables.ai.sap.com/id: "shakespeare-generator"
    ai.sap.com/version: "0.0.1"
Metadata
The metadata section of the ServingTemplate provides descriptive information about the template and its associated components:
Name: The name of the serving template is shakespeare-model-generator-api. This name is used to uniquely identify the template within the KServe environment.
Annotations: Additional metadata annotations provide context and description for the serving template:
scenarios.ai.sap.com/name: Describes the scenario related to the model as “shakespeare-language-model”. This annotation specifies the purpose or context of the model being served.
scenarios.ai.sap.com/description: Provides a detailed description of the scenario as “Shakespeare Language Model”. This annotation explains the nature of the model and its use case.
executables.ai.sap.com/name: Specifies the name of the executable associated with the serving template as “Shakespeare-language-model-generator”. This annotation identifies the executable component responsible for generating text using the model.
executables.ai.sap.com/description: Describes the executable in detail as “Shakespeare Language Model Text Generator”. This annotation clarifies the role and functionality of the executable within the serving context.
artifacts.ai.sap.com/logs.kind: Defines the type of model training logs generated as “other”. This annotation categorizes the logs produced during model training.
artifacts.ai.sap.com/logs.description: Describes the model training logs as “Model Training Logs”. This annotation provides a brief explanation of the purpose and content of the training logs.
artifacts.ai.sap.com/logs.labels: Specifies labels for the model training logs in JSON format ({"ext.ai.sap.com/step":"train", "ext.ai.sap.com/version":"0.0.1"}). These labels provide additional context or metadata for the logs.
Labels: Labels associated with the serving template provide semantic keywords or identifiers for better categorization and understanding within SAP AI Core:
scenarios.ai.sap.com/id: Identifies the scenario ID as “shakespeare-language-model”. This label helps categorize and organize templates based on scenario types.
executables.ai.sap.com/id: Specifies the executable ID as “shakespeare-generator”. This label categorizes templates based on associated executables.
ai.sap.com/version: Indicates the version of the serving template as “0.0.1”. This label manages template versions and compatibility.
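Before committing the template to your Git repository, it can help to sanity-check it locally. Below is a minimal sketch, assuming PyYAML is installed and the manifest is saved under a hypothetical filename (shakespeare-model-serving.yaml); it simply verifies the kind and the labels we just discussed are present:
import yaml  # pip install pyyaml

REQUIRED_LABELS = ["scenarios.ai.sap.com/id", "executables.ai.sap.com/id", "ai.sap.com/version"]

# Hypothetical filename; use the path of your ServingTemplate manifest
with open("shakespeare-model-serving.yaml") as f:
    template = yaml.safe_load(f)

# The kind must be ServingTemplate for SAP AI Core to pick it up as a serving executable
assert template["kind"] == "ServingTemplate"

# Check that the labels SAP AI Core uses for scenario/executable mapping exist
labels = template["metadata"].get("labels", {})
missing = [label for label in REQUIRED_LABELS if label not in labels]
print("Missing labels:", missing or "none")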
Spec
spec:
  inputs:
    artifacts:
      - name: model
  template:
    apiVersion: "serving.kserve.io/v1beta1"
    metadata:
      annotations: |
        autoscaling.knative.dev/metric: concurrency
        autoscaling.knative.dev/target: 1
        autoscaling.knative.dev/targetBurstCapacity: 0
      labels: |
        ai.sap.com/resourcePlan: infer.m
The spec section of the ServingTemplate defines the configuration for deploying and serving the model.
Artifacts: Specifies the input artifacts required for serving the model.
model: Defines an artifact named model that will be used as input for the serving process. This artifact represents the trained machine learning model that will be loaded and utilized during inference.
apiVersion: Specifies the version of the serving template’s resource (serving.kserve.io/v1beta1), indicating compatibility and feature support.
Metadata:
Annotations:
autoscaling.knative.dev/metric: Defines the autoscaling metric as concurrency, indicating that the number of replicas will be adjusted based on the concurrency of incoming requests.
autoscaling.knative.dev/target: Sets the target concurrency per replica to 1, i.e. the number of in-flight requests each instance should handle before the autoscaler adds another replica.
autoscaling.knative.dev/targetBurstCapacity: Specifies the burst capacity for scaling as 0, which means no additional burst capacity is allocated beyond the target.
Labels:
ai.sap.com/resourcePlan: Assigns a label to specify the resource plan as infer.m. This label helps categorize and manage resource allocation for the serving template, indicating the intended resource usage pattern (inferencing medium).
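To build intuition for these annotations, here is a rough, back-of-the-envelope sketch of how concurrency-based autoscaling behaves: replicas grow roughly with observed concurrency divided by the target, bounded by the min/max replicas we set in the predictor spec below. This is only an illustration of the idea, not Knative’s actual algorithm:
import math

def desired_replicas(observed_concurrency: float, target: float = 1,
                     min_replicas: int = 1, max_replicas: int = 5) -> int:
    # Rough illustration: replicas ~ ceil(concurrency / target), clamped to [min, max]
    wanted = math.ceil(observed_concurrency / target) if observed_concurrency > 0 else min_replicas
    return max(min_replicas, min(max_replicas, wanted))

for load in [0, 1, 3, 10]:
    print(load, "concurrent request(s) ->", desired_replicas(load), "replica(s)")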
The spec section of the ServingTemplate includes configurations for the serving process, particularly focusing on the predictor component responsible for deploying and managing the model-serving container.
Predictor
spec: |
  predictor:
    imagePullSecrets:
      - name: shakespeare-docker-repo
    minReplicas: 1
    maxReplicas: 5
    containers:
      - name: kserve-container
        image: docker.io/carlosbasto/shakespeare-server-generator:0.0.1
        ports:
          - containerPort: 9001
            protocol: TCP
        command: ["/bin/sh", "-c"]
        args:
          - "python /app/src/main.py"
        env:
          - name: STORAGE_URI # Required
            value: "{{inputs.artifacts.model}}"
          - name: BUCKET_NAME
            valueFrom:
              secretKeyRef:
                name: object-store-credentials
                key: bucket
          - name: PREFIX_NAME
            valueFrom:
              secretKeyRef:
                name: object-store-credentials
                key: path_prefix
          - name: ACCESS_KEY_ID
            valueFrom:
              secretKeyRef:
                name: object-store-credentials
                key: access_key_id
          - name: SECRET_ACCESS_KEY
            valueFrom:
              secretKeyRef:
                name: object-store-credentials
                key: secret_access_key
Predictor:
imagePullSecrets: name: shakespeare-docker-repo specifies the Docker image pull secret used to authenticate and access the container image repository (shakespeare-docker-repo).
minReplicas: Sets the minimum number of replicas for the predictor to 1, ensuring that at least one instance is always available for serving.
maxReplicas: Sets the maximum number of replicas for the predictor to 5, defining the upper limit of scaling based on demand.
containers:
name: Specifies the name of the container as kserve-container.
image: Specifies the Docker image URL (docker.io/carlosbasto/shakespeare-server-generator:0.0.1) used for the container, which hosts the model serving application.
ports: containerPort: 9001 specifies the port (9001) exposed by the container for incoming requests, and protocol: TCP defines the protocol used for communication over that port.
command: Specifies the command to run in the container (["/bin/sh", "-c"]), which is the entry point for executing the container.
args: ["python /app/src/main.py"] specifies the arguments passed to the command, indicating the Python script (/app/src/main.py) to execute within the container.
env: Defines environment variables required by the container for configuration and runtime settings:
STORAGE_URI: Specifies the storage URI ({{inputs.artifacts.model}}) for accessing the trained model artifact. This is mandatory: you must set it so that SAP AI Core copies all files for that input from S3 into the container under /mnt/models/.
BUCKET_NAME, PREFIX_NAME, ACCESS_KEY_ID, SECRET_ACCESS_KEY: Environment variables sourced from secrets (object-store-credentials) for accessing object storage, including the bucket name, path prefix, access key ID, and secret access key. A sketch of how the serving code can consume these values follows below.
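To make the connection between the template and the serving code concrete, here is a hypothetical sketch of how the application inside the container (our main.py) could pick up the injected environment variables and the model files that SAP AI Core materializes under /mnt/models. The variable names match the template above; the loading logic itself is purely illustrative:
import os

# Injected from the object-store-credentials secret (see the env section of the predictor spec)
bucket = os.environ.get("BUCKET_NAME")
prefix = os.environ.get("PREFIX_NAME")
access_key = os.environ.get("ACCESS_KEY_ID")
secret_key = os.environ.get("SECRET_ACCESS_KEY")

# STORAGE_URI tells SAP AI Core which artifact to copy; the files end up in this directory
MODEL_DIR = "/mnt/models"

files = os.listdir(MODEL_DIR) if os.path.isdir(MODEL_DIR) else []
print("Serving from:", MODEL_DIR, "->", files or "not mounted")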
Generating the Dockerfile
When deploying machine learning models using SAP AI Core, the Dockerfile plays an important role in defining the environment in which your application will run. The Dockerfile is a script that contains a series of instructions on how to build a Docker image. This image packages everything your model needs, including the operating system, dependencies, and your application code, ensuring consistency across different environments. Here’s an explanation of what the following Dockerfile does:
# Use the PyTorch image with CUDA 12.1 and cuDNN 8 runtime
FROM pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime

# Set the locale to avoid issues with encoding
ENV LANG C.UTF-8

# Install necessary system dependencies
RUN apt-get update && apt-get install -y \
    python3-pip \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Create the application directory within the Docker image
RUN mkdir -p /app/src

# Copy application files from local system to the Docker image
COPY main.py /app/src/
COPY requirements.txt /app/src/
COPY ShakespeareanGenerator/*.py /app/src/ShakespeareanGenerator/
COPY ShakespeareanGenerator/model/*.py /app/src/ShakespeareanGenerator/model/

# Install Python dependencies within the Docker image
RUN pip3 install --no-cache-dir -r /app/src/requirements.txt

# Set appropriate permissions for the application directory
RUN chgrp -R nogroup /app && \
    chmod -R 770 /app
The Dockerfile begins by specifying a base image, pytorch/pytorch:2.2.2-cuda12.1-cudnn8-runtime, which includes PyTorch with CUDA 12.1 and cuDNN 8 runtime support. This is essential for leveraging GPU acceleration during model inference. The locale is then set to C.UTF-8 to avoid encoding issues that can arise during runtime.
Next, the Dockerfile updates the package list and installs necessary system dependencies, specifically python3-pip, which is needed to install Python packages. To keep the image clean and reduce its size, it removes the package lists after the installation.
A directory /app/src is created within the Docker image to house the application files. The subsequent COPY commands transfer the main.py script, the requirements.txt file, and all necessary Python files from the local system to the /app/src/ directory in the Docker image. This includes the main application logic and the Shakespearean text generation module.
After copying the files, the Dockerfile installs the required Python dependencies listed in requirements.txt using pip3. The --no-cache-dir option ensures that the package manager does not cache the downloaded files, which helps in keeping the image size smaller.
Finally, the Dockerfile sets the appropriate permissions for the /app directory. It changes the group ownership to nogroup and sets the permissions to allow read, write, and execute access for the owner and the group, ensuring that the application can run smoothly in different environments.
And this completes the final step in our setup for deploying language models using SAP AI Core. Now, let’s proceed with the deployment – exciting times ahead!
Deploying with SAP AI Core and AI Launchpad
Alright, now that we’ve got everything set up locally and our model is ready to shine, let’s take the next step and deploy it using SAP AI Core and AI Launchpad. Make sure you’ve checked off these essential steps before diving in:
Serving Template and Image Registry: Ensure that your Serving Template is available in your GitHub repository. This template is mandatory for deploying your model with SAP AI Core. Additionally, confirm that your Docker image containing the model is pushed to your image registry. These steps are essential for deployment to work.
Syncing Repository with SAP AI Core: Head over to the SAP AI Launchpad. Start by selecting your resource group and navigate to the applications section. Here, you’ll sync your GitHub repository with SAP AI Core. This step allows you to access and deploy your model directly from the synced repository.
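If you prefer to verify the repository sync from code rather than in AI Launchpad, the AI API also exposes admin endpoints for registered applications. The sketch below assumes the admin endpoint /v2/admin/applications, a placeholder host, and an already obtained OAuth token (see the authentication snippet later in this post); treat the exact path and response fields as assumptions to confirm against the AI API reference:
import requests

AI_API_URL = "https://<your-ai-api-host>"   # from AICORE_BASE_URL in your service key
token = "<oauth-access-token>"              # obtained via client credentials, as shown later

# Assumed admin endpoint listing the Git applications registered with SAP AI Core
resp = requests.get(
    f"{AI_API_URL}/v2/admin/applications",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Field names are assumptions based on the AI API's usual response shape
for app in resp.json().get("resources", []):
    print(app.get("applicationName"), "->", app.get("repositoryUrl"))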
Time to Deploy
Once your repository is synced and ready to roll, you’re all set to deploy your model using SAP AI Core and leverage the power of AI Launchpad for scalable and productive inference.
Assuming everything went smoothly during the deployment process, you should see an outcome similar to the following scenario:
Now, the critical artifact we require for execution is the “INPUT MODEL,” correct? Since we’ve already established the input within the serving template and configured the environment variable STORAGE_URI, our primary task is to ensure that the corresponding artifact exists within SAP AI Core.
There are multiple methods to achieve this. One straightforward approach is to automatically retrieve files from S3 (similar to how we handled the dataset during training and fine-tuning) and copy them to the designated folder for deployment. Alternatively, we can manually upload the file by copying and pasting it. For simplicity, let’s upload the file to the S3 folder ai://shakespeare/deployments, which translates to s3://hcp-f4249aeb-db74-47b2-b5f0-41a00f48224b/shakespeare/repository/deployments/ in my case.
Magically, the models will appear in the designated location. However, you might wonder why we don’t simply assign the trained model output (artifact) to our Configuration and call it done?
Well, assigning it that way would imply that SAP AI Core copies everything from that folder, right? That’s correct, but remember how we structured our development process: we saved the models (language model and tokenizer) in different folders (model and bpe_model, respectively). Consequently, we would need two different paths for the same environment variable STORAGE_URI in the serving template. However, based on the KServe documentation, only a single URI is supported for the model artifact, as indicated here.
No need to dwell on this too much. Let’s proceed and create the artifact:
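As an alternative to clicking through AI Launchpad, the artifact could also be registered via the AI API. Here is a minimal sketch; the endpoint /v2/lm/artifacts and the payload fields follow the AI API convention, the artifact name matches the source_models example used below, and the ai:// URL mirrors the S3 folder mentioned above (adapt it to your own object store secret and prefix):
import requests

AI_API_URL = "https://<your-ai-api-host>"
token = "<oauth-access-token>"
RESOURCE_GROUP = "language-models"

payload = {
    "name": "source_models",                 # the artifact we later select in the Configuration
    "kind": "model",
    "url": "ai://shakespeare/deployments",   # S3 folder holding the model and bpe_model files
    "scenarioId": "shakespeare-language-model",
    "description": "Shakespeare language model and tokenizer files",
}

resp = requests.post(
    f"{AI_API_URL}/v2/lm/artifacts",
    headers={"AI-Resource-Group": RESOURCE_GROUP, "Authorization": f"Bearer {token}"},
    json=payload,
)
print(resp.status_code, resp.json())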
Now that everything required is in place, let’s create a Configuration to deploy our model API:
Select Input Artifacts, choose the recently created source models (e.g., source_models).
Then click on “Create Deployment” to initiate the deployment process.
Choose the duration for which your model will be served.
Just out of curiosity, if you see a log like this:
In this context, a “revision” refers to a unique version or iteration of your Deployment configuration. When updates are made to the Deployment (e.g., modifying container images or environment variables), Kubernetes generates a new revision to manage a new set of Pods based on the updated configuration. Anyway, coming back to the deployment…
Upon completion, you can expect to encounter Kubernetes-specific statuses such as “Target Status” (RUNNING), “Current Status” (UNKNOWN), PENDING, or DEAD. This is due to the declarative model in action, where Kubernetes ensures that the desired state (RUNNING) is achieved and maintained.
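While Kubernetes converges to the desired state, you can also poll the deployment status programmatically instead of refreshing AI Launchpad. A small sketch using the AI API deployment endpoint; the host, token, and deployment ID are placeholders you need to fill in:
import time
import requests

AI_API_URL = "https://<your-ai-api-host>"
token = "<oauth-access-token>"
RESOURCE_GROUP = "language-models"
DEPLOYMENT_ID = "<deployment_id>"   # shown in AI Launchpad after you create the deployment

headers = {"AI-Resource-Group": RESOURCE_GROUP, "Authorization": f"Bearer {token}"}

# Poll until the current status matches the target status (RUNNING)
while True:
    d = requests.get(f"{AI_API_URL}/v2/lm/deployments/{DEPLOYMENT_ID}", headers=headers).json()
    print("target:", d.get("targetStatus"), "| current:", d.get("status"))
    if d.get("status") == d.get("targetStatus"):
        break
    time.sleep(30)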
Using Shakespeare Language Model API for Text Generation
We’ve made it through! Thank you, and good luck with using your API. Eager to see it in action? Great! Let’s open a Python environment (or Postman, etc.) and start using the API. Let’s keep it simple to get started.
First, let’s create an env.json file that is going to hold our credentials:
{
  "AICORE_AUTH_URL": "",
  "AICORE_CLIENT_ID": "",
  "AICORE_CLIENT_SECRET": "",
  "AICORE_BASE_URL": "",
  "AICORE_RESOURCE_GROUP": "language-models"
}
Something like this, ok? You can take all these values from the SAP AI Core key on BTP Cockpit. Alright, let’s break down and explain each part of this code step by step in a friendly and informal manner:
Loading Environment Configuration
import os
import json

# Load the configuration from the JSON file
with open('env.json') as f:
    config = json.load(f)

# Set each key-value pair as an environment variable
for key, value in config.items():
    os.environ[key] = value
We’re starting by loading some configuration settings from a JSON file named env.json. This file likely contains various settings or credentials needed for our application. We’re using Python’s built-in json library to read this file and load its contents into a dictionary called config. Then, we iterate over each key-value pair in config and set them as environment variables using os.environ[key] = value. This allows us to access these values as environment variables later in the script.
Retrieving Authentication Credentials
uua_url = os.environ["AICORE_AUTH_URL"]
clientid = os.environ["AICORE_CLIENT_ID"]
clientsecret = os.environ["AICORE_CLIENT_SECRET"]
Here, we’re retrieving specific values from the environment variables that we just loaded from env.json. We’re assigning these values to variables like uua_url, clientid, and clientsecret, which we’ll use later to authenticate our requests.
Authenticating and Getting Access Token
import requests

params = {"grant_type": "client_credentials"}
resp = requests.post(f"{uua_url}/oauth/token",
                     auth=(clientid, clientsecret),
                     params=params)
token = resp.json()["access_token"]
We’re using the requests library to make an HTTP POST request to a specified URL (uua_url) to obtain an access token for authentication. We’re sending along our client ID (clientid) and client secret (clientsecret) as basic authentication credentials (auth=(clientid, clientsecret)). The response (resp) contains JSON data, and we extract the access_token from it, which we’ll use in subsequent requests.
Making a Model Inference Request
deployment_url = 'https://<server>/v2/inference/deployments/<deployment_id>'
inference_url = deployment_url + '/v2/generate'

# Set resource group and request headers
RESOURCE_GROUP = 'language-models'
headers = {
    'Content-Type': 'application/json',
    'AI-Resource-Group': RESOURCE_GROUP,
    'Authorization': f'Bearer {token}'
}

# Define parameters for model inference
max_tokens = 200
temperature = 0.5
top_k = 0
top_p = 0.9

# Create payload for model inference
payload = {
    'max_tokens': max_tokens,
    'temperature': temperature,
    'top_k': top_k,
    'top_p': top_p
}

# Make POST request to model inference endpoint
response = requests.post(inference_url, headers=headers, json=payload)
Here, we’re setting up the URL (inference_url) for making a model inference request to a specific deployment (deployment_url). We then define some request headers (headers), including the access token obtained earlier (token). We also specify the parameters (max_tokens, temperature, top_k, top_p) for our model inference, which we encapsulate in a payload dictionary. Finally, we use requests.post to send a POST request to the inference_url with the specified headers and payload.
Processing the Model Inference Response
# Process response
if response.status_code == 200:
    generated_text = response.json().get('generated_text')
    model_details = response.json().get('model_details')
    print("Generated Text:")
    for line in generated_text:
        print(line)
    print("\nModel Details:")
    print(model_details)
else:
    print(f"Error. Status code: {response.status_code}")
    print(response.text)
This last section checks the response from our model inference request. If the status code (response.status_code) is 200 (which indicates a successful response), we extract and print the generated text (generated_text) and model details (model_details) from the JSON response. Otherwise, if the status code is not 200, we print an error message along with the status code and the response text for debugging purposes.
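Before we look at the output, note that the request and response handling above can be folded into one small reusable helper; here is a sketch that simply reuses the same endpoint, headers, and payload defined earlier in this post:
def generate(max_tokens=200, temperature=0.5, top_k=0, top_p=0.9):
    # Same inference endpoint and headers as defined above
    payload = {
        'max_tokens': max_tokens,
        'temperature': temperature,
        'top_k': top_k,
        'top_p': top_p
    }
    response = requests.post(inference_url, headers=headers, json=payload)
    response.raise_for_status()
    body = response.json()
    return body.get('generated_text'), body.get('model_details')

# Example call with a slightly higher temperature
generated_text, model_details = generate(temperature=0.7)
print(model_details)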
And here’s the result:
Of course, this is just a test to show you how to consume the API we just deployed. In the next blog post, when we compare the Shakespeare language model and the fine-tuned one, we’ll use Streamlit, which will allow us to make things even nicer. Anyway, I think we have enough for now!
See you in the next blog post!
Wrapping Up and Next Steps
Congratulations on taking the first step into deploying AI models with SAP AI Core! In this blog, we explored how to deploy the Shakespeare Language Model using SAP AI Core and KServe.
Let’s recap what we’ve covered:
KServe Serving Template: We discovered how to design and implement a Serving Template in KServe to deploy machine learning models on Kubernetes.
Deploying with SAP AI Core: We followed a detailed guide on deploying your model using SAP AI Core and AI Launchpad, ensuring seamless and scalable inference.
Next Steps
Now that we’ve deployed the Shakespearean Language Model, stay tuned for the final blog in this series, where we’ll deploy the fine-tuned Text Style Transfer model and evaluate the performance of both.
Sampling and Consuming Language Models: Discover methods for sampling from trained language models and integrating them into applications, and develop a language-model-based app to gain insights into building an application powered by your trained language model.
[SAP AI Core is All You Need | 8. Consuming and Sampling from Shakespeare Language Models]
Further References
Source Code: GitHub repository
SAP AI Core Help
SAP AI Launchpad
Kubernetes
KServe