Introduction
Welcome back to our series “SAP AI Core is All You Need”.
In this blog, we’ll continue the journey with you into the world of AI and Language Models using SAP AI Core and SAP AI Launchpad. In this installment, “Setting the Stage for a Shakespeare-Language Model”, we’re diving into the essential steps to ensure everything is operational for our Shakespearean Language Model deployment.
What to Expect
In this blog, you will gain hands-on experience with the following key concepts:
- Setting Up SAP AI Core and Launchpad: Learn how to configure the environment to support our language model deployment.
- Defining Resource Groups: Understand how to create and manage resource groups in SAP AI Core to isolate resources and workloads.
- Configuring GitHub Repositories for YAML Files: Set up a GitHub repository to store and manage workflow definitions.
- Creating Docker Registry Secrets: Configure SAP AI Core to access Docker registries for storing images.
- Setting Up Object Store Secrets: Learn to create object store secrets for managing execution outputs and input files.
- Defining Input Artifacts: Specify where SAP AI Core can find the training dataset in your object store.
- Creating Generic Secrets: Store credentials for accessing AWS and preparing the training dataset.
Get ready to dive into the MLOps part of our project, making sure everything is operable and ready for deployment. By the end of this blog, you’ll have a solid foundation to support your language model’s lifecycle on SAP AI Core. Let’s get started!
Define Resource Group to Isolate Resources and Workloads
As you may know, SAP AI Core tenants use resource groups to isolate related ML resources and workloads. Scenarios, executables, and Docker registry secrets are shared across all resource groups. So, let’s begin by creating this isolated environment in our SAP AI Core Tenant. If you’re not sure how to get started, you can follow the steps outlined in “Use Boosters for Free Tier Use of SAP AI Core and SAP AI Launchpad” to set it up.
Assuming you’ve already done that, let’s open the SAP AI Launchpad.
Let’s head over to the SAP AI Core Administration tab and create a new resource group. Choose a name that suits your needs – it’s a crucial step for subsequent operations. For more details, check out the guide on how to Create a Resource Group.
You’ve got a bunch of tools at your disposal to work with the SAP AI APIs, like the SAP AI Core SDK, SAP AI Launchpad, Postman, or curl. But, hey, for this example, we’re keeping it nice and simple with SAP AI Launchpad. For those of you who’d rather skip the coding part, take a deep breath and relax!
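That said, if you’re curious what the API route looks like, here’s a minimal sketch of creating a resource group with plain HTTP calls. The URLs, client credentials, and the “language-models” ID below are placeholders you’d swap for the values in your own SAP AI Core service key; treat it as an illustration, not a drop-in script.

import requests

# Placeholders: take these values from your SAP AI Core service key
AUTH_URL = "https://<subdomain>.authentication.<region>.hana.ondemand.com/oauth/token"
AI_API_URL = "https://api.ai.<region>.ml.hana.ondemand.com"
CLIENT_ID = "<clientid>"
CLIENT_SECRET = "<clientsecret>"

# Get an OAuth token via the client-credentials flow
token = requests.post(
    AUTH_URL,
    data={"grant_type": "client_credentials"},
    auth=(CLIENT_ID, CLIENT_SECRET),
).json()["access_token"]

# Create the resource group against the AI API admin endpoint
response = requests.post(
    f"{AI_API_URL}/v2/admin/resourceGroups",
    headers={"Authorization": f"Bearer {token}"},
    json={"resourceGroupId": "language-models"},
)
print(response.status_code, response.json())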
Now that we’ve got our own little space in the tenant, let’s select it:
Setting Up a GitHub Repository for Your YAML Files
Next up, let’s sort out a GitHub Repository where we’ll store our YAML files. These files define how our workflows behave within SAP AI Core. Head over to “Git Repositories.”
To get started, you’ll need:
- The Repository URL: Just copy it from GitHub, like https://github.com/<user>/<repository_name>
- A cool name for your repo
- Your GitHub username
- A Personal Access Token (PAT). Wanna know how to get one? Check out “Managing Your Personal Access Tokens”.
Hit “Create,” fill in the details, and you’re good to go!
Wait for the status to change from ‘PROGRESS’ to ‘COMPLETED’. If you’re eager and can’t wait, hit the refresh button. After that, you should see something like this:
Now, let’s give SAP AI Core a heads-up about our repo and set up watch loops to keep an eye on changes in those YAML files. To set this up, you’ll need:
- Application Name: A catchy name for your application
- Repository: The repo you created earlier
- Path in Repository: The path within your repo where your template files will live, like “shakespeare/templates”
- Revision: The revision (I know, the name might sound a bit odd) is set to HEAD. Why HEAD? It simply refers to the latest commit, ensuring we’re always working with the most up-to-date file version.
If you’d rather register the application through the API instead of the Launchpad, there’s a short sketch right after this list.
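For completeness, here’s what that registration could look like via the AI API. This is a hedged sketch: the payload field names reflect the admin endpoints as I understand them, and “shakespeare-templates” is just a made-up application name, so double-check against the current API reference before relying on it.

import requests

# Reuses `token` and AI_API_URL from the resource-group sketch earlier
payload = {
    "applicationName": "shakespeare-templates",  # hypothetical name
    "repositoryUrl": "https://github.com/<user>/<repository_name>",
    "revision": "HEAD",
    "path": "shakespeare/templates",
}
response = requests.post(
    f"{AI_API_URL}/v2/admin/applications",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
print(response.status_code, response.json())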
Want to see if it worked? Click on the application you just created.
Not to get ahead of ourselves, but let’s take a quick look at the scenarios. We’ll find a new one based on our training_template.yaml file.
There we have it! As we only have the training executable available, there are no serving ones present. Nevertheless, here we can see the executable ID, description, version, labels, and the parameters and artifacts that this scenario/executable uses. Here’s another point to note: we haven’t actually created anything yet, like Docker images or the code itself. However, we’re already specifying what we want the Argo agent in the Kubernetes cluster to do in the YAML file. Don’t worry, we’ll dive deeper into all of this in the upcoming blogs.
We’ll dig deeper into this later too, but for now, let’s keep the ball rolling!
Setting Up Docker Registry Secrets in SAP AI Core
The next step is to inform SAP AI Core about the Docker Registry we’ll use to store our images for execution and deployment. To do this, navigate to “Docker Registry Secrets”.
You’ll only need two pieces of information:
- A name
- A secret
It’s as simple as that. Now, moving forward. What? Need help finding the secret? If you’re using Docker Hub like me, you can obtain it by going to Settings -> Security and then selecting “New Access Token”.
Once you have it, fill in the following JSON template with your own values:
{
  ".dockerconfigjson": "{\"auths\":{\"YOUR_DOCKER_REGISTRY_URL\":{\"username\":\"YOUR_DOCKER_USERNAME\",\"password\":\"YOUR_DOCKER_ACCESS_TOKEN\"}}}"
}
The next step is to put this information in, and you’re all set! However, if you’re still having a hard time with it, check this out. There you’ll find everything you need to complete this task.
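If hand-escaping that nested JSON feels error-prone, a few lines of Python can build it for you. The registry URL, username, and token below are placeholders, and the structure simply mirrors the template above:

import json

registry_url = "YOUR_DOCKER_REGISTRY_URL"   # e.g. https://index.docker.io for Docker Hub
username = "YOUR_DOCKER_USERNAME"
access_token = "YOUR_DOCKER_ACCESS_TOKEN"

# Inner document: Docker's usual config.json "auths" structure
auths = {"auths": {registry_url: {"username": username, "password": access_token}}}

# Outer document: what goes into the Docker Registry Secret;
# json.dumps takes care of escaping the nested JSON string
secret = {".dockerconfigjson": json.dumps(auths)}
print(json.dumps(secret, indent=2))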
Setting Up Object Store Secrets
We’re just about done with the admin stuff; just a couple more things to wrap up. Let’s set up the object store secrets we need. We’ll have to create two secrets for this. The first one is mandatory and must be named ‘default’; the second one you can name however you like. The ‘default’ secret tells SAP AI Core where to put its execution outputs, and the other one stores the inputs. Easy enough, huh?
Let’s walk through it.
To create the default secret, you’ll need the following information:
- Resource Group: ‘language-models’ in my case
- Name: ‘default’
- Type: ‘S3’
- Path Prefix: This is the path in the S3 bucket where SAP AI Core will store the execution outputs. In my case, I’ve used ‘shakespeare/executions’.
To really get a handle on this path-prefix business, it helps to look at the bucket itself. So, we’ll need an S3 bucket, which you can set up using the SAP Object Store service on BTP.
Once you generate a key for this service, you’ll need some of that info to inspect the bucket (and to work with it later in our code).
{
  "access_key_id": "",
  "bucket": "hcp-XXXXX",
  "host": "s3-eu-central-1.amazonaws.com",
  "region": "eu-central-1",
  "secret_access_key": "",
  "uri": "s3://XXXX:XXXX@s3-eu-central-1.amazonaws.com/hcp-XXXXX",
  "username": "hcp-s3-XXXXX"
}
Alright, using your favorite command-line shell, make sure you Install or update to the latest version of the AWS CLI and Configure the AWS CLI using the “access_key_id” and “secret_access_key”. Like this:
With this command, you can list your bucket’s contents on AWS:
aws s3 ls s3://<YOUR_BUCKET_NAME>/
In my case, the output is:
Looking at the folder structure, you’ll notice a folder named “shakespeare” with two subfolders: “executions” (where SAP AI Core stores execution outputs) and “repository” (for input files like datasets, models, etc.). So, it’s clearer now that for the path prefix, we should use “shakespeare/executions”, right? At least, I hope that makes sense!
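If you’d rather stay in Python than use the AWS CLI, the same check works with boto3. The bucket name and credentials below are placeholders taken from the service key format shown earlier:

import boto3

# Placeholders: use the values from your Object Store service key
s3 = boto3.client(
    "s3",
    aws_access_key_id="<access_key_id>",
    aws_secret_access_key="<secret_access_key>",
    region_name="eu-central-1",
)

# List everything under the "shakespeare/" prefix to confirm the folder layout
response = s3.list_objects_v2(Bucket="hcp-XXXXX", Prefix="shakespeare/")
for obj in response.get("Contents", []):
    print(obj["Key"])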
Now, let’s dive back into the settings:
- Bucket Name, Endpoint, and Region: You can find these details in the key file I just showed you.
- Verify SSL: Make sure to check this option so SAP AI Core validates the SSL/TLS certificates from the S3 server for authenticity.
- Use HTTPS: Also, ensure this option is checked. Why? Well, it means the connection to the S3 object store will be made over HTTPS, which encrypts data transferred between SAP AI Core and the S3 server—secure stuff, right?
- The Secret: It should be formatted as follows. And guess what? Once again, you already have this data from the key file (there’s a small helper sketch right after the JSON below).
{
  "AWS_ACCESS_KEY_ID": "<AWS access key ID>",
  "AWS_SECRET_ACCESS_KEY": "<AWS secret access key>"
}
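If you saved the service key locally (I’m assuming a file named object-store-key.json here, so adjust the name to whatever you used), a couple of lines of Python will lift exactly the two fields you need into that format:

import json

# Hypothetical filename for the Object Store service key downloaded from BTP
with open("object-store-key.json") as f:
    key = json.load(f)

secret = {
    "AWS_ACCESS_KEY_ID": key["access_key_id"],
    "AWS_SECRET_ACCESS_KEY": key["secret_access_key"],
}
print(json.dumps(secret, indent=2))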
Wow, you made it through! Great job!
Remember we needed to create (at least) two object store secrets, right? Well, let’s create the second one by following the same steps. This time, we’ll make a few changes:
- Name: Change the name to “shakespeare” instead of “default”.
- Path Prefix: Set the path prefix in S3 to “shakespeare/repository”. Remember, it’s just a prefix — nothing more (hint: it might start making sense when we create the artifacts).
Well done! If you got it right, you might see something like this:
Setting Up Input Artifacts for Model Training
To train our model, we’ll definitely need the input data, right? Whether it’s called the “training set”, “dataset”, or something else, the model requires this data to learn. But how does SAP AI Core know where those files are? That’s a great question. The answer lies in Artifacts (these are data or files produced or consumed by executions or deployments within SAP AI Core, managed through your connected object store).
Let’s create an input artifact to specify the path in S3 where the file will be located. First, choose the scenario:
Give it a cool name and description:
Let’s fill in the “Dataset URL.” This part can be a bit tricky if you’re not sure what you’re doing.
When you create an object store secret (we made two: default and shakespeare), SAP AI Core sets up a placeholder that you can use to refer to the path prefix defined in that secret. For example:
- Object store secret name: “shakespeare”
- Path prefix used in the object store secret settings: “shakespeare/repository”
So, “ai://shakespeare/” resolves to the “shakespeare/repository/” prefix inside your bucket.
In other words, “ai://shakespeare” is just a placeholder for your bucket URL plus the secret’s path prefix. Now, let’s set it up properly:
This points to the following S3 path:
The “data” subfolder is where we’ll stage the tinyshakespeare.txt dataset later.
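As a side note, if you ever want to stage that file by hand instead of through the automated preparation step we’ll build later, a boto3 upload to that exact prefix would do it. The download URL and bucket name below are assumptions on my part, so point them at wherever your copy of the dataset and your bucket actually live:

import boto3
import requests

# Assumed public source for the Tiny Shakespeare text; replace with your own copy if needed
DATASET_URL = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
text = requests.get(DATASET_URL).text

# Placeholders: credentials and bucket name from the Object Store service key
s3 = boto3.client(
    "s3",
    aws_access_key_id="<access_key_id>",
    aws_secret_access_key="<secret_access_key>",
    region_name="eu-central-1",
)

# Lands under the "shakespeare/repository" prefix, i.e. where ai://shakespeare/data points
s3.put_object(
    Bucket="hcp-XXXXX",
    Key="shakespeare/repository/data/tinyshakespeare.txt",
    Body=text.encode("utf-8"),
)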
Setting Up Generic Secrets for Model Training
For many implementations, if you’ve completed the steps we’ve covered, you’ll be well-prepared to deploy AI models on BTP Kubernetes Cluster using AI Core and AI Launchpad. However, in our case, we have two specific steps for our training process:
The first step involves fetching the tiny Shakespeare dataset from a URL repository and writing it to the corresponding path in S3, which matches the input artifact we created earlier. Sure, you could manually copy it from your machine to S3, but we’ll automate this preparation step to boost productivity.
The second step is where the real training happens. It requires one input artifact and will output four new artifacts: model.pkl, vocab.json, merges.txt, and logs.log.
Now, for the first step, we’ll need to “Create a Generic Secret” to hold the credentials for logging into AWS and inserting the tiny Shakespeare dataset into the repository path we just created. However, you can’t just copy and paste the JSON there: the SAP AI API expects sensitive data to be Base64-encoded. To produce that, you might want to run the following code:
import base64
import json


class Base64Encoder:
    @staticmethod
    def encode(value: str) -> str:
        """Encodes a string value using Base64 encoding."""
        value_bytes = value.encode('utf-8')
        encoded_bytes = base64.b64encode(value_bytes)
        return encoded_bytes.decode('utf-8')

    @staticmethod
    def encode_values(**kwargs) -> dict:
        """Encodes multiple values using Base64 encoding."""
        encoded_values = {}
        for key, value in kwargs.items():
            if value is not None:
                encoded_values[key] = Base64Encoder.encode(value)
        return encoded_values

    @staticmethod
    def encode_values_to_json(**kwargs) -> str:
        """Encodes values using Base64 and returns JSON output."""
        encoded_values = Base64Encoder.encode_values(**kwargs)
        return json.dumps(encoded_values)
Once you’ve implemented the above class, simply use it to obtain the JSON required for the Generic Secret configuration.
# Example usage:
access_key_id = "your_access_key_id"
bucket = "your_bucket_name"
path_prefix = "/path/prefix"
host = "optional_host"
region = "optional_region"
secret_access_key = "your_secret_access_key"
uri = "optional_uri"
username = "optional_username"

# Get encoded values as JSON string
encoded_json = Base64Encoder.encode_values_to_json(
    access_key_id=access_key_id,
    bucket=bucket,
    path_prefix=path_prefix,
    host=host,
    region=region,
    secret_access_key=secret_access_key,
    uri=uri,
    username=username
)

print("Encoded Values (JSON):")
print(encoded_json)
Just to clarify, you don’t need to include this code in the Shakespeare language model implementation. It’s specifically used to properly configure the Generic Secret, and that’s all there is to it, okay?
The output should look like this (note that I encoded random text for each value for educational purposes only; they’re not the real values for my S3 connection):
{
  "access_key_id": "eW91cl9hY2Nlc3Nfa2V5X2lk",
  "bucket": "eW91cl9idWNrZXRfbmFtZQ==",
  "path_prefix": "L3BhdGgvcHJlZml4",
  "host": "b3B0aW9uYWxfaG9zdA==",
  "region": "b3B0aW9uYWxfcmVnaW9u",
  "secret_access_key": "eW91cl9zZWNyZXRfYWNjZXNzX2tleQ==",
  "uri": "b3B0aW9uYWxfdXJp",
  "username": "b3B0aW9uYWxfdXNlcm5hbWU="
}
Next, to create the generic secret, you’ll need:
- Resource Group: “language-models” in our case.
- Name: “object-store-credentials”, for example.
- Secret: the JSON content we’ve just generated.
… and that’s all! You’ve made it to the end of all the administrative tasks we need to perform to implement our Shakespeare Language Model.
One last thing: you might remember that we needed two steps to perform the training, right? Well, how do we handle multi-step workflows then? That’s a great question. One option is to use Metaflow (though it’s beyond the scope of this series – check out how it can be done in Train your model in SAP AI Core using the Metaflow-Argo plugin by @KarimM). Another approach is to use Argo Workflow templates, which is the one we’re going to use.
And that’s it for now! Let’s wrap this up, take a break, and see what comes next.
Wrapping Up and Next Steps
Congratulations on setting the stage for deploying your Shakespearean Language Model! In this blog, we’ve tackled the foundational steps to ensure everything is ready for deployment using SAP AI Core and SAP AI Launchpad.
Let’s recap what we’ve covered:
- Setting Up SAP AI Core and Launchpad: We configured the environment to support our language model deployment.
- Defining Resource Groups: We created and managed resource groups in SAP AI Core to isolate resources and workloads.
- Configuring GitHub Repositories for YAML Files: We set up a GitHub repository to store and manage workflow definitions.
- Creating Docker Registry Secrets: We configured SAP AI Core to access Docker registries for storing images.
- Setting Up Object Store Secrets: We created object store secrets for managing execution outputs and input files.
- Defining Input Artifacts: We specified where SAP AI Core can find the training dataset in your object store.
- Creating Generic Secrets: We stored credentials for accessing AWS and preparing the training dataset.
Next Steps
Now that we’ve laid the groundwork for a successful language model deployment, stay tuned for the upcoming blogs in this series, where we’ll explore further steps and functionalities:
- Deploying the Training Pipeline: Learn how to deploy the training pipeline using Argo multi-step workflows with SAP AI Core. We’ll cover setting up and orchestrating training jobs efficiently. [SAP AI Core is All You Need | 3. Workflow, Configuration, and Shakespeare Language Model Training]
- Improving Model Training Efficiency: Understand how to use checkpointing and resuming to make model training more efficient. [SAP AI Core is All You Need | 4. Improving Model Training Efficiency with Checkpointing/Resuming]
- Fine-Tuning with Low-Rank Adaptation (LoRA): Learn how to use LoRA to fine-tune models with fewer parameters, making the process more efficient and effective. [SAP AI Core is All You Need | 5. Fine Tuning with Low-Rank Adaptation (LoRA)]
- Fine-Tuning Pipeline: Dive into fine-tuning techniques to enhance model performance on specific datasets or tasks. We’ll cover deploying fine-tuning pipelines with SAP AI Core, along with model deployment and serving using KServe, and learn how to efficiently serve fine-tuned models for real-world applications. [SAP AI Core is All You Need | 6. Serving Shakespeare Model using SAP AI Core and KServe]
- Sampling and Consuming Language Models: Discover methods for sampling from trained language models and integrating them into applications. [SAP AI Core is All You Need | 7. Deploying Language Models for Text Generation]
- Developing a Language-Model-Based App: Gain insights into building an application powered by your trained language model. [SAP AI Core is All You Need | 8. Consuming and Sampling from Shakespeare Language Models]
Further References
- Source Code: GitHub repository
- SAP AI Core Help
- SAP AI Launchpad Help
- Docker Hub
- Argo Workflows
- Metaflow library for SAP AI Core