SAP Databricks is now GA. Get the most out of it by skilling yourself with Mosaic AI

Words are cheap. I will be crisp.

SAP is a goldmine of data. The classic Data Engineering challenge has been unifying that complex, scattered business data – especially from SAP applications like ECC and S/4HANA – with other enterprise data, and then truly leveraging it for advanced analytics and, for some use cases (like MRP or forecasting), machine learning.

The new challenge is to do the same, but for cutting-edge generative AI. To tackle this, SAP introduced the SAP Business Data Cloud (BDC) early this year.

BDC has 3 main components and 3 main SAP products. The 3 main components are the Data Products, the Insight Apps and the BDC foundation itself. And the products that make it possible are the classics SAP Datasphere and SAP Analytics Cloud (which have been around for a while), plus SAP Databricks, which is an optional component of BDC and was in preview… until today.

Now that it's GA, better understand what it is, and what it's not.

What SAP Databricks IS: The union of SAP business data with generative AI

Do not confuse Databricks with SAP Databricks. Very similar, but different. What SAP Databricks is:

A Native, Fully Managed Service within SAP Business Data Cloud: This isn't an external solution bolted on. SAP Databricks is a version of the Databricks platform included natively as a service within SAP BDC. It is a fully managed service, meaning SAP handles the underlying infrastructure, allowing us to focus on our data and AI initiatives.

Unifying Semantically Rich Business Data with Leading AI/ML: The core power lies in unifying our valuable, semantically rich data from SAP applications with the current industry-leading AI, machine learning, data science, and data engineering platform. SAP achieves the goal of integrating SAP data with the rest of the business data for the very reasons Databricks is booming: its advanced analytics and AI use cases.

Zero-Copy Data Sharing Powered by Delta Sharing: SAP moves away from data replication. BDC is an end destination, and SAP Databricks leverages zero-copy Delta Sharing. This enables seamless, bidirectional data sharing between SAP BDC and Databricks, so data does not need to leave the SAP platform to be used for generative AI use cases. This is crucial for integrating data products from SAP applications with semi-structured and unstructured data from any source, without needing to move the data physically.

Custom Data Products: Within SAP Databricks, users can easily access SAP data products and create custom data products out of their AI/ML and analytics workloads. Data products, which I explained in this blog post, are fundamental building blocks within BDC, serving as packaged and governed data assets.
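To make the zero-copy idea concrete from the consumer side: a Delta Sharing recipient only needs a small JSON credential profile, which the open-source `delta-sharing` client consumes. A minimal sketch follows; the endpoint, token, and share/schema/table names are placeholders, not real BDC values.

```python
import json
import tempfile

# A Delta Sharing profile: a small JSON credential file issued by the
# data provider (hypothetically, SAP BDC). All values are placeholders.
profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://example-bdc-host/delta-sharing/",  # placeholder
    "bearerToken": "<token-issued-by-provider>",            # placeholder
}

with tempfile.NamedTemporaryFile("w", suffix=".share", delete=False) as f:
    json.dump(profile, f)
    profile_path = f.name

# With the open-source client installed (pip install delta-sharing),
# reading a shared table copies no data into your own storage:
#
#   import delta_sharing
#   url = f"{profile_path}#my_share.my_schema.my_table"  # placeholder names
#   df = delta_sharing.load_as_pandas(url)
```

The point of the sketch is that the consumer holds only credentials and a table coordinate, never a replicated copy of the data.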

What Databricks tools are available in SAP Databricks

Databricks Notebooks: Ideal for pro-code data engineering tasks and building custom AI/ML models.

Databricks SQL: Allows us to analyze datasets at scale using standard SQL queries.

Unity Catalog: Provides centralized governance for all our data assets, including structured and unstructured data, ML models, notebooks, and files. It enables governance for data products exposed from SAP Databricks via the SAP BDC catalog.

Mosaic AI: Our gateway to building secure, governed, and custom AI/ML solutions, including advanced generative AI applications and large language models. I'll discuss this below.

Serverless Spark: Provides scalable compute resources for our data processing tasks.
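One practical detail worth internalizing early: Unity Catalog addresses every asset by a three-level name, catalog.schema.table, and BDC-exposed data products surface under such names. A tiny illustrative helper (the example name is made up) to split and validate one:

```python
def split_uc_name(fqn: str) -> tuple[str, str, str]:
    """Split a Unity Catalog three-level name into (catalog, schema, table)."""
    parts = fqn.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.table, got {fqn!r}")
    return parts[0], parts[1], parts[2]

# Hypothetical name for a BDC-exposed data product:
catalog, schema, table = split_uc_name("sap_bdc.sales.orders")
```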

SAP Databricks and Databricks Feature Parity

The following table summarizes the availability status of key Databricks components in standard Databricks versus SAP Databricks, based on the knowledge I could gather. Subject to correction.

 

| Feature/Component Category | Feature/Component Name | Standard Databricks | SAP Databricks |
|---|---|---|---|
| Core Platform | Workspace UI | Yes | Yes (through BDC) |
| | REST APIs | Yes | Likely yes (potentially subset or different endpoints/authentication) |
| | CLI | Yes | Unconfirmed |
| Compute | All-Purpose Clusters (Classic) | Yes | Likely excluded (focus on serverless) |
| | Jobs Compute (Classic) | Yes | Likely excluded (focus on serverless) |
| | Instance Pools | Yes | Likely excluded |
| | Serverless SQL Warehouses | Yes | Yes |
| | Serverless Notebooks/Jobs Compute | Yes | Yes |
| Data Management | Delta Lake | Yes | Yes |
| | DBFS (Databricks File System) | Yes | Unconfirmed |
| | Volumes (for non-tabular data) | Yes | Unconfirmed |
| Governance | Unity Catalog | Yes | Yes |
| | Data Lineage (via UC) | Yes | Yes |
| Data Sharing | Delta Sharing | Yes | Yes |
| Data Engineering | Notebooks | Yes | Yes |
| | DLT (Delta Live Tables) | Yes | Yes |
| | Auto Loader | Yes | Yes (blog) |
| | Jobs / Workflows | Yes | Yes (likely serverless execution) |
| ML/AI (Mosaic AI) | MLflow (Managed) | Yes | Yes |
| | Model Registry (via MLflow) | Yes | Yes |
| | Model Serving (Mosaic AI) | Yes | Yes |
| | Feature Store | Yes | Yes |
| | Vector Search (Mosaic AI) | Yes | Yes |
| | AutoML | Yes | Yes (including forecasting for SAP) |
| | Agent Framework / Evaluation | Yes | Yes |
| | AI Functions | Yes | Yes |
| | AI Playground | Yes | Yes |
| BI & Visualization | SQL Editor | Yes | Yes |
| | Databricks SQL Dashboards | Yes | Yes |
| | Redash Integration | Yes | Unconfirmed (likely excluded) |
| Ecosystem & Extensibility | Marketplace | Yes | Unconfirmed (likely excluded) |
| | Partner Connect | Yes | Likely excluded |
| | Databricks Apps | Yes | Likely excluded |
| Security | IP Access Lists | Yes | Yes (with caveats regarding BDC connection) |
| | PrivateLink / Private Endpoints | Yes | Yes (AWS PrivateLink for us-west-1) |
| | Serverless Egress Control | Yes | Yes (with caveats regarding BDC connection) |

 

Generative AI with Mosaic AI

To me, the most innovative piece of what Databricks has to offer is Mosaic AI. I'll tell you why. Mosaic AI allows us to:

Build Modern Generative AI Solutions: Create cutting-edge GenAI applications directly on the Databricks platform, integrated with our business data.

Leverage the Full ML Lifecycle: Build an end-to-end GenAI solution using Databricks' infrastructure, data, and tools, covering everything from data preparation and management (like building knowledge bases for RAG) to model training, deployment, securing, and monitoring. This allows for greater control and potentially lower costs compared to relying solely on external, black-box GenAI solutions. Databricks' AI Agent Framework, powered by Mosaic AI, even allows building domain-specific AI agents that can call external APIs.

Ground LLMs with our Business Data: Combine the power of LLMs with any organization's specific, governed business data to create accurate and relevant AI applications. This is a key pattern, often using Retrieval-Augmented Generation (RAG) techniques, which Mosaic AI supports.
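The RAG pattern mentioned above boils down to: retrieve the most relevant chunks of governed business data, then prepend them to the LLM prompt. A toy sketch of the retrieval step follows; naive word overlap stands in for Mosaic AI Vector Search over embeddings, and the documents and question are made up.

```python
# Toy RAG retrieval step: rank knowledge-base chunks against a question.
# In Mosaic AI this ranking would be done by Vector Search over embeddings;
# plain word overlap stands in for it here.
KNOWLEDGE_BASE = [
    "MRP runs nightly and proposes purchase requisitions.",
    "Sales orders from S/4HANA land in the orders data product.",
    "Forecast accuracy is reviewed monthly by the planning team.",
]

def score(question: str, chunk: str) -> int:
    """Count shared lowercase words between question and chunk."""
    return len(set(question.lower().split()) & set(chunk.lower().split()))

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k highest-scoring chunks for the question."""
    ranked = sorted(KNOWLEDGE_BASE, key=lambda ch: score(question, ch), reverse=True)
    return ranked[:k]

question = "When does MRP propose purchase requisitions?"
context = retrieve(question)[0]
# The retrieved chunk grounds the model's answer in governed data:
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The grounding effect comes entirely from the retrieved context in the prompt: the model is asked to answer from governed data rather than from its parametric memory.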

I know this is going to bring questions I am not yet ready to answer.

How to get official knowledge on Databricks and Mosaic AI

SAP will, as usual, document in detail how the compute, the data catalog and Delta Sharing work, because SAP Databricks brings a different UI compared to Databricks, limits data integration to external sources or destinations, and does not allow deploying the infrastructure on-premise or at the hyperscaler of choice. But the rest is the same, and getting the best out of Mosaic AI is on us.

If you have not used it, once you're comfortable with the basics, it's key to understand how Databricks approaches the machine learning lifecycle and how this transfers to generative AI.

Remember, MosaicML was acquired in 2023 and might look different since it's specific to generative AI.

Key Concepts:

MLflow: Understand how Databricks integrates this open-source platform for managing the ML lifecycle (tracking experiments, packaging code, registering and deploying models).

Feature Store: How to create, manage, and serve features for model training.

Model Training: Using libraries like scikit-learn, TensorFlow, and PyTorch within Databricks notebooks.

Model Registry: Storing and versioning trained models (often within Unity Catalog).

Basic Model Serving: Understanding the concept of deploying models as APIs (aka Mosaic AI Model Serving).
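The tracking idea at the heart of MLflow can be sketched without the library at all: each run records its parameters and metrics, and the best run feeds the registry. Plain dictionaries stand in below for the real API (the equivalent `mlflow` calls are shown in comments); the metric is a toy formula, not a real model.

```python
# Stand-in for MLflow experiment tracking. With the real library:
#   import mlflow
#   with mlflow.start_run():
#       mlflow.log_param("alpha", alpha)
#       mlflow.log_metric("rmse", rmse)
runs = []
for alpha in (0.01, 0.1, 1.0):
    # Toy metric: pretend alpha = 0.1 is the optimum.
    rmse = abs(alpha - 0.1) + 0.3
    runs.append({"params": {"alpha": alpha}, "metrics": {"rmse": rmse}})

# The "registry" step: pick the best run's model for promotion.
best = min(runs, key=lambda r: r["metrics"]["rmse"])
```

Once this mental model clicks, the Databricks-specific parts (registering the best model into Unity Catalog, serving it as an API) are additions on top, not a different paradigm.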

Databricks AI/ML Documentation: Tutorials: Get started with AI and machine learning (AWS Example)

 

 


#SAP

#SAPTechnologyblog
