Hands-on Tutorial (SAP) Databricks triggering ML in SAP Datasphere

Estimated read time: 26 minutes

Did you know that SAP Databricks and native Databricks can trigger Machine Learning in SAP Datasphere? Get a Databricks Notebook to trigger SAP’s Predictive Analysis Library. No data needs to be moved around.

 

 

SAP Databricks is a Data Science environment within SAP Business Data Cloud. It is primarily intended to access data products from SAP Business Data Cloud for you to build your own custom AI/ML models with its built-in Databricks serverless compute. However, you can also use it to instruct SAP Datasphere to process its data and to create Machine Learning forecasts with SAP HANA Cloud’s built-in Machine Learning algorithms. Let’s try it out together. 

Note that the data and code can be downloaded from this repository:
Hands-on Tutorial (SAP) Databricks triggering ML in SAP Datasphere

Use Case

The use case is inspired by a requirement a logistics customer brought up recently. They keep granular data in SAP Datasphere about the items that shipped in the past. One row of data per shipment, about 80 million records. This data should now be used to create a forecast of how many items are expected in the upcoming days. These estimates will feed into their workforce planning, to have the appropriate number of staff at hand.

Well, for implementing this scenario I have no data from the real world of millions of parcels being processed. The next best data that I could find for this hands-on example is a detailed count of vehicles in Zurich, Switzerland. The city has a number of measurement stations across the town that provide an hourly count of how many vehicles drove by. So instead of predicting the number of expected parcels, we will predict the number of vehicles. The concept is technically the same. The detailed data (3 million records in the Zurich data) needs to be explored, prepared, aggregated and then used as basis for a prediction.
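The aggregation itself will later be pushed down to SAP HANA Cloud via hana_ml. Purely to illustrate the concept, here is the hourly-to-daily collapse sketched in plain pandas, with made-up numbers and the tutorial's column names:

```python
import pandas as pd

# Made-up hourly counts over two days (stand-in for the Zurich data)
hourly = pd.DataFrame({
    "MEASUREDATETIME": pd.to_datetime(
        ["2026-01-17 08:00", "2026-01-17 09:00",
         "2026-01-18 08:00", "2026-01-18 09:00"]),
    "VEHICLECOUNT": [120, 180, 90, 150],
})

# Cut off the time portion and sum the counts per day
hourly["MEASUREDATE"] = hourly["MEASUREDATETIME"].dt.date
daily = hourly.groupby("MEASUREDATE", as_index=False)["VEHICLECOUNT"].sum()
```

In the tutorial the same logic runs inside SAP HANA Cloud, so the 3 million rows never leave the database.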

 

Architecture

Let’s keep things simple for this example and create the forecast with a time-series algorithm. The daily vehicle count is used to predict how many vehicles (or parcels) are expected in the next few days. The data is stored in a table in SAP Datasphere.

SAP Databricks is used as the development environment to script the Python code that instructs SAP HANA Cloud within SAP Datasphere to create that prediction and to save it to a table in SAP Datasphere. Instead of SAP Databricks you can also follow this tutorial with a native instance of Databricks.

You will see in the steps below, how these instructions are created. But to give you the big picture right away: SAP’s Python package hana_ml is used in SAP Databricks to create the SQL syntax that is sent to SAP HANA Cloud to trigger its embedded Predictive Analysis Library.

These links might help to find out more about the Predictive Analysis Library. It contains 100+ different algorithms, all implemented by SAP to make best use of SAP HANA Cloud’s architecture.

Course: Developing Regression Models with the Python Machine Learning Client for SAP HANA
Documentation: Python Machine Learning Client for SAP HANA (hana_ml)
Documentation: Predictive Analysis Library

 

Prerequisites

To follow this tutorial hands-on you need an SAP Business Data Cloud environment with SAP Datasphere and SAP Databricks. Some familiarity with SAP Databricks would help, for instance having gone through this tutorial, which can be implemented with the free trial of SAP Business Data Cloud.

 

SAP Datasphere Script Server

SAP Datasphere must have the Script Server enabled so that the Predictive Analysis Library can be used; see SAP Note 3216010. The Script Server is only available if your system has at least 3 virtual CPUs. Only the very smallest systems run with 2 vCPUs. If this applies to your environment, increasing the memory will get you there.

 

SAP Datasphere Database User

The Python code in SAP Databricks will connect to SAP Datasphere with a Database User. To create this user, go to “Space Management” in SAP Datasphere, edit your space and create the user in the “Database Users” section. Give it a name and tick the options in the screenshot. These settings ensure that the user can access the data and write predictions to a table.

 

Click the “info” icon on the right and the credentials show up. You will need:

- The Database User name (the name of the space followed by ‘#’ and the user name)
- The host name
- The port (always 443)
- The password (the password is only shown once; if you lose it, you need to request a new one)

 

Data upload

To keep things focussed, I have created a CSV file with detailed vehicle counts that you can upload as a single file. In this example, we ignore some important facts that you would need to consider in a real project. For instance, the number of measurement stations has been changing over time. We will just take the data as it is, so that we can focus on the end-to-end tutorial without getting lost in the nitty-gritty details. After all, the vehicle data is just an example. In reality we should have a clean and full history of all parcels.

Download the historic data and upload the unzipped file in the Data Builder with “Import CSV file”. 

 

Deploy the table with the suggested name VEHICLECOUNT. And we have data. Each location might have one or more sensors. For each sensor we have hourly measurements. 

 

Currently, though, the Database User cannot yet access the table. You just need to put a view on top of the table with the option “Expose for Consumption” activated. In the little screencam you see that some records have NULL values for the vehicle count. This is a data quality issue in the raw data provided by the city; the sensors might not have been working at that time. We will ignore that in this tutorial. In the real world you may want to look into this as part of the data preparation, maybe by imputing the missing data.

 

 

Demand forecast  

All code that is described in this tutorial is part of this repository. Just download and import the DBC archive file into SAP Databricks.

 

You now have a project called “Logistics forecast”, which contains two notebooks:

- Store Database User Credentials
- Logistics forecast with HANA ML

 

Store Database User Credentials

Continue by putting your Database User’s credentials in a safe place. Instead of hardcoding them into each notebook, store them centrally as a Databricks secret. Open the notebook “Store Database User Credentials”. Enter your own Database User’s credentials in the code and run the cells.

dbuser_address = "yourdbuseraddress"
dbuser_user = "yourdbuser"
dbuser_password = "yourdbuserpassword"

from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
scope = "sapdatasphere_dbuser"
w.secrets.create_scope(scope)
w.secrets.put_secret(scope, "address", string_value=dbuser_address)
w.secrets.put_secret(scope, "user", string_value=dbuser_user)
w.secrets.put_secret(scope, "password", string_value=dbuser_password)

Each notebook can now retrieve the credentials from that secret. And to the best of my knowledge, SAP Databricks keeps the values so safe that you cannot even see them in clear text yourself. You can try to print them, for instance, but it will only show “[REDACTED]”.

 

Connection to SAP Datasphere

Now open the notebook “Logistics forecast with HANA ML”. All code that we need for the forecasting is already in that file. In the ‘Environment’ tab on the right, you see that the Python package hana_ml has been added. Clear the notebook’s output to be sure that any future output you see was created by yourself.

 

Run the first cells to retrieve the Database User credentials from the Databricks secret. Use these to establish a connection to SAP Datasphere.

dbuser_address = dbutils.secrets.get(scope="sapdatasphere_dbuser", key="address")
dbuser_user = dbutils.secrets.get(scope="sapdatasphere_dbuser", key="user")
dbuser_password = dbutils.secrets.get(scope="sapdatasphere_dbuser", key="password")

import hana_ml.dataframe as dataframe
conn = dataframe.ConnectionContext(address=dbuser_address,
                                   port=443,
                                   user=dbuser_user,
                                   password=dbuser_password)
conn.connection.isconnected()

The isconnected() function should return “True”.

- If you are getting an error that the hana_ml package is missing, go back to the ‘Environment’ tab and hit “Apply”. This will install the package.
- If you are getting a connection failed error “Socket closed by peer”, then most likely the notebook’s external IP address hasn’t been added yet to SAP Datasphere’s allow list. You can see the IP address with this command:

from requests import get
ip = get('https://api.ipify.org').text
print(f'The public IP address is: {ip}')

Then add this address to the allow list as described in the documentation. Currently (February ’26) Databricks has a feature in Preview mode to get a static IP address. You can reach out to your Databricks account manager to request having this feature activated in your environment. In case you don’t know that person, feel free to ping me and I will try to make a connection.

 

Data Exploration

Once you are connected, point a hana_ml DataFrame to the Datasphere view you created earlier. This does not extract any data from SAP Datasphere; the 3 million records remain in place. But you can download rows if you like. Just select 5 rows to verify the connection does indeed work as expected. Then call a canned report for further data understanding. The details are also calculated within SAP HANA Cloud.

current_schema = conn.get_current_schema().split('#')[0]
data_hdf = conn.table('V_VEHICLECOUNT', schema=current_schema)
data_hdf.head(5).collect()

from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(data_hdf).build().display()

 

Data Preparation

We would like a daily forecast, but our data is more detailed, with hourly timestamps. Hence create a new column that contains only the date, cutting off the hourly information. This column is not persisted and does not change the underlying view; it is calculated on the fly when the data is accessed.

data_hdf = data_hdf.select('*', ('TO_DATE(MEASUREDATETIME)', 'MEASUREDATE'))
data_hdf.head(5).collect()

 

Having the MEASUREDATE column makes it easy to summarise the measurements by date.

data_hdf = data_hdf.agg([('sum', 'VEHICLECOUNT', 'VEHICLECOUNT')], group_by='MEASUREDATE')
data_hdf = data_hdf.sort_values('MEASUREDATE')
data_hdf.head(5).collect()

 

All aggregations were done in SAP Datasphere / SAP HANA Cloud. Under the hood the aggregation request was translated by hana_ml into the corresponding SQL syntax, which was sent to SAP HANA Cloud. You can see the SELECT statement that was executed.

data_hdf.select_statement

 

Plot the daily values. The collect() function downloads the aggregated data as a Pandas DataFrame. There seems to be a trend towards more vehicles being counted. This could be because of increased traffic, or an increased number of sensors. For this tutorial we won’t look any deeper into this. But a weekly pattern seems to be there, and a drop over Christmas. And almost no vehicles were counted for a few days in July 2024. That’s the data quality issue that was mentioned earlier; the sensors might not have been working. We will ignore that too for this tutorial.

 

Time series forecasting

Split the data into training and testing. We want a 7-day forecast. Hence put the most recent 7 days aside, to get a feel for the forecasting accuracy.

train_hdf = data_hdf.head(data_hdf.count()-7)
test_hdf = data_hdf.tail(7)

 

Train a time-series model on the training data. The AdditiveModelForecast algorithm is an implementation of Prophet. For simplicity no hyperparameter tuning is done here.

from hana_ml.algorithms.pal.tsa.additive_model_forecast import AdditiveModelForecast
amf = AdditiveModelForecast()
amf.fit(data=train_hdf, key='MEASUREDATE')

 

Investigate the model in a canned report.

from hana_ml.visualizers.unified_report import UnifiedReport
UnifiedReport(amf).build().display()

 

Apply the model to the test data that was put aside earlier.

predicted_hdf = amf.predict(data=test_hdf)
predicted_hdf.head(5).collect()

 

Join the true values from the past with the prediction to assess the forecast accuracy.

predicted_hdf = predicted_hdf.set_index('MEASUREDATE').join(test_hdf.set_index('MEASUREDATE'))
predicted_hdf.head(5).collect()

 

Compare the actuals and prediction in a plot. The forecast is more accurate than it might appear at first. Note how the y-axis does not start at zero, but at 1.2 million. Hence the actuals and predictions are quite close.

import pandas as pd
import plotly.express as px  # needed for px.line below

predicted_df = predicted_hdf.collect()
predicted_df["VEHICLECOUNT"] = pd.to_numeric(predicted_df["VEHICLECOUNT"])
fig = px.line(predicted_df, x="MEASUREDATE", y=["VEHICLECOUNT", "YHAT"],
              title='Vehicle Count Forecast: Accuracy test on known past')
fig.show()

 

Calculate the accuracy of the prediction. A Mean Absolute Percentage Error of 3.3% seems quite all right.

from hana_ml.algorithms.pal.tsa.accuracy_measure import accuracy_measure
metric_names = ['mape', 'rmse']
metrics = accuracy_measure(predicted_hdf.select(['VEHICLECOUNT', 'YHAT']), evaluation_metric=metric_names).collect()
metrics

 

Databricks contains a very comprehensive repository where such model metrics (or hyperparameters, images, etc.) can be saved via MLflow. Here we store just the absolute basics. For a much more advanced approach see the blog “Developing HANA ML models with SAP Databricks” by @nidhi_sawhney.

import mlflow
with mlflow.start_run():
    for index, single_metric in metrics.iterrows():
        mlflow.log_metric(single_metric.STAT_NAME, single_metric.STAT_VALUE)

 

Now that we have a feeling for how well the model worked on the recent past, train a new model on the full history of the data.

from hana_ml.algorithms.pal.tsa.additive_model_forecast import AdditiveModelForecast
amf = AdditiveModelForecast()
amf.fit(data=data_hdf, key='MEASUREDATE')

 

Obtain a list of the 7 future days that we want to predict. The historic data finishes on 18 January 2026.

futuredatestopredict_hdf = amf.make_future_dataframe(data_hdf, key='MEASUREDATE', periods=7)
futuredatestopredict_hdf.head(10).collect()

 

 

Apply the model to predict these future dates.

predictedfuture_hdf = amf.predict(data=futuredatestopredict_hdf, key='MEASUREDATE')
predictedfuture_hdf.head(10).collect()

 

Turn any potentially negative predictions to zero. For this use case negative values would not make sense. Even though this particular forecast doesn’t have any negative values, you might need this step in your own projects.

predictedfuture_hdf = predictedfuture_hdf.select('MEASUREDATE',
                                                 ('GREATEST(YHAT, 0)', 'YHAT'),
                                                 ('GREATEST(YHAT_LOWER, 0)', 'YHAT_LOWER'),
                                                 ('GREATEST(YHAT_UPPER, 0)', 'YHAT_UPPER'))
predictedfuture_hdf.head(10).collect()

 

You can rename the columns, if you prefer, before writing the forecast to a table.

# Note: the column order is MEASUREDATE, YHAT, YHAT_LOWER, YHAT_UPPER,
# so the lower bound must be renamed before the upper bound
predictedfuture_hdf = predictedfuture_hdf.rename_columns(['MEASUREDATE', 'VEHICLECOUNT_PRED', 'VEHICLECOUNT_PREDLOWER', 'VEHICLECOUNT_PREDUPPER'])
predictedfuture_hdf.head(10).collect()

 

And with that, save the predictions as a table in SAP Datasphere / SAP HANA Cloud.

predictedfuture_hdf.save("LOGISTICS_FORECAST", force=True)

 

Using the forecast in SAP Datasphere

The forecast is created and saved to a table. You can now integrate the information into your modelling and share it with your users through SAP Analytics Cloud, or any other way you might prefer.

 

Summary

You now know in detail how SAP Databricks and native Databricks can trigger Machine Learning in SAP Datasphere. It’s an option to consider to avoid data movement. SAP HANA Cloud has a long list of built-in algorithms. If there is one specific algorithm you are missing, just get in touch; maybe it is there under a different name. If it really isn’t there, we can check with the Product Group whether it can be implemented.

Happy predicting!

 
