Words are cheap. I will be crisp.
SAP is a goldmine of data. The classic, old data engineering challenge has been unifying that complex, scattered business data – especially from SAP applications like ECC and S/4HANA – with other enterprise data, and then truly leveraging it for advanced analytics and, for some use cases (like MRP or forecasting), machine learning.
The new challenge is to do the same, but for cutting-edge generative AI. To tackle this, SAP introduced the SAP Business Data Cloud (BDC) early this year.
BDC has three main concepts and three main SAP products. The three main concepts are Data Products, Insight Apps, and the BDC itself. The products that make it possible are the classics SAP Datasphere and SAP Analytics Cloud (which have been around for a while), and SAP Databricks, which is an optional component of BDC and was in preview… until today.
Now that it's GA, we'd better understand what it is, and what it's not.
What SAP Databricks IS: The Union of SAP Business Data and Generative AI
Do not confuse Databricks with SAP Databricks. Very similar, but different. What SAP Databricks is:
A Native, Fully Managed Service within SAP Business Data Cloud: This isn't an external solution bolted on. SAP Databricks is a version of the Databricks platform included natively as a service within SAP BDC. It is a fully managed service, meaning SAP handles the underlying infrastructure, allowing us to focus on our data and AI initiatives.

Unifying Semantically Rich Business Data with Leading AI/ML: The core power lies in unifying our valuable, semantically rich data from SAP applications with the current industry-leading AI, machine learning, data science, and data engineering platform. SAP achieves the goal of integrating SAP data with the rest of the business data for the very reasons Databricks is booming: its advanced analytics and AI use cases.

Zero-Copy Data Sharing Powered by Delta Sharing: SAP moves away from data replication. BDC is an end destination, and SAP Databricks leverages zero-copy Delta Sharing. This enables seamless, bidirectional data sharing between SAP BDC and Databricks, so data does not need to leave the SAP platform to be used for generative AI use cases. This is crucial for integrating data products from SAP applications with semi-structured and unstructured data from any source, without physically moving the data.

Custom Data Products: Within SAP Databricks, users can easily access SAP data products and create custom data products out of their AI/ML and analytics workloads. Data products, which I explained in this blog post, are fundamental building blocks within BDC, serving as packaged and governed data assets.
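The zero-copy pattern above can be sketched with the open-source Delta Sharing client. A minimal sketch, assuming a Delta Sharing profile file issued by SAP BDC; the share, schema, and table names below are hypothetical:

```python
# Sketch: reading a shared SAP data product without copying it.
# Assumptions: "config.share" is a Delta Sharing profile file provided by
# SAP BDC; "sap_bdc_share", "erp", and "sales_orders" are hypothetical names.

def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' URL the client expects."""
    return f"{profile_path}#{share}.{schema}.{table}"

url = table_url("config.share", "sap_bdc_share", "erp", "sales_orders")

# With credentials in place, the open-source client reads the table directly:
# import delta_sharing                      # pip install delta-sharing
# df = delta_sharing.load_as_pandas(url)    # zero-copy read into pandas
print(url)
```

Because the read goes through Delta Sharing, the data stays on the SAP side; only the query results flow to the consumer.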
What Databricks tools are available in SAP Databricks?
Databricks Notebooks: Ideal for pro-code data engineering tasks and building custom AI/ML models.

Databricks SQL: Allows us to analyze datasets at scale using standard SQL queries.

Unity Catalog: Provides centralized governance for all our data assets, including structured and unstructured data, ML models, notebooks, and files. It enables governance for data products exposed from SAP Databricks via the SAP BDC catalog.

Mosaic AI: This is our gateway to building secure, governed, and custom AI/ML solutions, including advanced generative AI applications and Large Language Models. I'll discuss this below.

Serverless Spark: Provides scalable compute resources for our data processing tasks.
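To make the Databricks SQL piece concrete, here is a hedged sketch of querying a table on a serverless SQL warehouse from outside a notebook. The catalog, schema, and table names are hypothetical, and the connection details are placeholders you would take from your own workspace:

```python
# Sketch: running standard SQL against a serverless SQL warehouse.
# Assumptions: "sap_bdc.erp.sales_orders" is a hypothetical Unity Catalog
# table; real use requires `pip install databricks-sql-connector` and
# workspace credentials.

def count_query(catalog: str, schema: str, table: str) -> str:
    """Standard SQL we could submit to Databricks SQL."""
    return f"SELECT COUNT(*) FROM {catalog}.{schema}.{table}"

q = count_query("sap_bdc", "erp", "sales_orders")

# With real credentials, the official connector runs it like this:
# from databricks import sql
# with sql.connect(server_hostname="...", http_path="...",
#                  access_token="...") as conn:
#     with conn.cursor() as cur:
#         cur.execute(q)
#         print(cur.fetchone())
print(q)
```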
SAP Databricks and Databricks Feature Parity
The following table summarizes the availability status of key Databricks components in standard Databricks versus SAP Databricks, based on the information I could gather. Subject to correction.
| Feature/Component Category | Feature/Component Name | Standard Databricks | SAP Databricks |
|---|---|---|---|
| Core Platform | Workspace UI | Yes | Yes (through BDC) |
| Core Platform | REST APIs | Yes | Likely yes (potentially a subset or different endpoints/authentication) |
| Core Platform | CLI | Yes | Unconfirmed |
| Compute | All-Purpose Clusters (Classic) | Yes | Likely excluded (focus on serverless) |
| Compute | Jobs Compute (Classic) | Yes | Likely excluded (focus on serverless) |
| Compute | Instance Pools | Yes | Likely excluded |
| Compute | Serverless SQL Warehouses | Yes | Yes |
| Compute | Serverless Notebooks/Jobs Compute | Yes | Yes |
| Data Management | Delta Lake | Yes | Yes |
| Data Management | DBFS (Databricks File System) | Yes | Unconfirmed |
| Data Management | Volumes (for non-tabular data) | Yes | Unconfirmed |
| Governance | Unity Catalog | Yes | Yes |
| Governance | Data Lineage (via UC) | Yes | Yes |
| Data Sharing | Delta Sharing | Yes | Yes |
| Data Engineering | Notebooks | Yes | Yes |
| Data Engineering | DLT (Delta Live Tables) | Yes | Yes |
| Data Engineering | Auto Loader | Yes | Yes |
| Data Engineering | Jobs / Workflows | Yes | Yes (likely serverless execution) |
| ML/AI (Mosaic AI) | MLflow (Managed) | Yes | Yes |
| ML/AI (Mosaic AI) | Model Registry (via MLflow) | Yes | Yes |
| ML/AI (Mosaic AI) | Model Serving (Mosaic AI) | Yes | Yes |
| ML/AI (Mosaic AI) | Feature Store | Yes | Yes |
| ML/AI (Mosaic AI) | Vector Search (Mosaic AI) | Yes | Yes |
| ML/AI (Mosaic AI) | AutoML | Yes | Yes (including forecasting for SAP) |
| ML/AI (Mosaic AI) | Agent Framework / Evaluation | Yes | Yes |
| ML/AI (Mosaic AI) | AI Functions | Yes | Yes |
| ML/AI (Mosaic AI) | AI Playground | Yes | Yes |
| BI & Visualization | SQL Editor | Yes | Yes |
| BI & Visualization | Databricks SQL Dashboards | Yes | Yes |
| BI & Visualization | Redash Integration | Yes | Unconfirmed (likely excluded) |
| Ecosystem & Extensibility | Marketplace | Yes | Unconfirmed (likely excluded) |
| Ecosystem & Extensibility | Partner Connect | Yes | Likely excluded |
| Ecosystem & Extensibility | Databricks Apps | Yes | Likely excluded |
| Security | IP Access Lists | Yes | Yes (with caveats regarding BDC connection) |
| Security | PrivateLink / Private Endpoints | Yes | Yes (AWS PrivateLink for us-west-1) |
| Security | Serverless Egress Control | Yes | Yes (with caveats regarding BDC connection) |
Generative AI with Mosaic AI
To me, the most innovative piece of what Databricks has to offer is Mosaic AI. I'll tell you why. Mosaic AI allows us to:
Build Modern Generative AI Solutions: Create cutting-edge GenAI applications directly on the Databricks platform, integrated with our business data.

Leverage the Full ML Lifecycle: Build an end-to-end GenAI solution using Databricks' infrastructure, data, and tools, covering everything from data preparation and management (like building knowledge bases for RAG) to model training, deployment, securing, and monitoring. This allows for greater control and potentially lower costs compared to relying solely on external, black-box GenAI solutions. Databricks' AI Agent Framework, powered by Mosaic AI, even allows building domain-specific AI agents that can call external APIs.

Ground LLMs with our Business Data: Combine the power of LLMs with any organization's specific, governed business data to create accurate and relevant AI applications. This is a key pattern, often using Retrieval Augmented Generation (RAG) techniques, which Mosaic AI supports.
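The grounding pattern above boils down to a retrieval step that pulls relevant business context into the prompt. Here is a deliberately tiny, pure-Python illustration of that retrieval step (in SAP Databricks the embedding and lookup would be backed by Mosaic AI Vector Search, not word counts; the documents below are made up):

```python
# Toy illustration of the RAG retrieval step: rank documents by similarity
# to the question, then ground the LLM prompt with the best match.
# Assumption: the docs and question are hypothetical SAP-flavored examples.
from collections import Counter
import math

docs = {
    "doc1": "open sales orders awaiting delivery",
    "doc2": "material requirements planning run results",
    "doc3": "quarterly revenue by profit center",
}

def vec(text: str) -> Counter:
    """Bag-of-words vector; a real system would use learned embeddings."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, k: int = 1) -> list:
    """Return the ids of the k most similar documents."""
    q = vec(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vec(docs[d])), reverse=True)
    return ranked[:k]

top = retrieve("which sales orders are still open?")
# Ground the LLM prompt with the retrieved business context.
prompt = f"Context: {docs[top[0]]}\nQuestion: which sales orders are still open?"
print(top[0])
```

The design point is that the LLM never needs to have been trained on the business data; the governed data is injected at query time.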
I know this is going to bring questions I am not yet ready to answer.
How to get official knowledge on Databricks and Mosaic AI
SAP will, as usual, share detailed guidance on how the compute, the data catalog, and Delta Sharing work, because SAP Databricks brings a different UI compared to standard Databricks, limits data integration with external sources or destinations, and does not allow deploying the infrastructure on-premises or at the hyperscaler of choice. But the rest is the same, and getting the best out of Mosaic AI is on us.
If you have not used it, once you're comfortable with the basics, it's key to understand how Databricks approaches the machine learning lifecycle and how this transfers to generative AI.
Remember, MosaicML was acquired in 2023, and Mosaic AI might look different since it is specific to generative AI.
Key Concepts:
MLflow: Understand how Databricks integrates this open-source platform for managing the ML lifecycle (tracking experiments, packaging code, registering and deploying models).

Feature Store: How to create, manage, and serve features for model training.

Model Training: Using libraries like scikit-learn, TensorFlow, and PyTorch within Databricks notebooks.

Model Registry: Storing and versioning trained models (often within Unity Catalog).

Basic Model Serving: Understanding the concept of deploying models as APIs (aka Mosaic AI Model Serving).
Databricks AI/ML Documentation: Tutorials: Get started with AI and machine learning (AWS Example)
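The lifecycle concepts above fit together as a simple loop: train, evaluate, log params and metrics, register. A minimal sketch of that tracking pattern, with the MLflow calls shown as comments (the run name, model name, and hyperparameters are illustrative, and real use requires `pip install mlflow` on a workspace):

```python
# Sketch of the MLflow tracking pattern on Databricks.
# Assumptions: "sap-demand-forecast" and "catalog.schema.demand_model" are
# hypothetical names; the labels below stand in for real evaluation data.

params = {"n_estimators": 50, "max_depth": 5}  # hyperparameters to track

def accuracy(y_true, y_pred) -> float:
    """The metric we would log with mlflow.log_metric."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# On a real workspace, the tracking calls look like this:
# import mlflow
# with mlflow.start_run(run_name="sap-demand-forecast"):
#     mlflow.log_params(params)            # record hyperparameters
#     mlflow.log_metric("accuracy", acc)   # record evaluation results
#     # Registering to Unity Catalog versions the model for serving:
#     # mlflow.sklearn.log_model(model, "model",
#     #     registered_model_name="catalog.schema.demand_model")
print(acc)
```

Once a model version lives in the Unity Catalog registry, Mosaic AI Model Serving can expose it as an API, which closes the loop from notebook to production.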
#SAP
#SAPTechnologyblog