How to set up the SAP CDC solution in Azure Data Factory


Introduction

Azure Data Factory (ADF) is a data integration (ETL/ELT) Platform as a Service (PaaS). For SAP data integration, ADF currently offers six connectors: SAP Table, SAP ECC, SAP HANA, SAP BW via Open Hub, SAP BW via MDX, and SAP Cloud for Customer (C4C).

These connectors can only extract data in batches, where each batch treats old and new data equally without identifying data changes (“batch mode”). This extraction mode isn’t optimal when dealing with large data sets, such as tables with millions or billions of records, that change often.

Consequently, customers have been asking for a new connector that can extract only data changes (inserts, updates, and deletes, or “deltas”), leveraging the Change Data Capture (CDC) feature that exists in most SAP systems (“CDC mode”). After gathering their requirements, a new SAP CDC connector was introduced that leverages the SAP Operational Data Provisioning (ODP) framework. This new connector can connect to all SAP systems that support ODP, such as R/3, ECC, S/4HANA, BW, and BW/4HANA, either directly at the application layer or indirectly via SAP Landscape Transformation (SLT) replication server as a proxy. It can fully or incrementally extract SAP data, which includes not only physical tables but also logical objects created on top of those tables, such as ABAP Core Data Services (CDS) views.

Architecture

The high-level architecture of our SAP CDC solution in ADF is divided into two sides, the left-hand side (LHS) and the right-hand side (RHS). The LHS includes the SAP CDC connector, which invokes the ODP API over standard Remote Function Call (RFC) modules to extract raw SAP data (full loads plus deltas). The RHS includes the ADF copy activity, which loads the raw SAP data into any destination, such as Azure Blob Storage or Azure Data Lake Storage (ADLS) Gen2, in CSV or Parquet format, essentially archiving and preserving all historical changes. The RHS can also include the ADF data flow activity, which transforms the raw SAP data, merges all changes, and loads the result into any destination, such as Azure SQL Database or Azure Synapse Analytics, essentially replicating SAP data. The ADF data flow activity can also load the result into ADLS Gen2 in Delta format, enabling time travel to produce snapshots of SAP data at any given point in the past. The LHS and RHS can be combined in an SAP data replication template to auto-generate an ADF pipeline that can be run frequently using an ADF tumbling window trigger, replicating SAP data into Azure with low latency and without watermarking.
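The pipeline generated from the template can be scheduled from ADF Studio, but the scheduling can also be scripted. The following is a minimal, hypothetical sketch using the Az.DataFactory PowerShell module; the resource group, factory, pipeline, and trigger names, the 15-minute window, and the exact JSON shape of the trigger definition are assumptions for illustration, so verify them against the current ADF documentation.

# Hypothetical sketch: attach a tumbling window trigger to an SAP CDC pipeline.
# Resource group, factory, pipeline, and trigger names and the 15-minute window are placeholders.
$triggerDefinition = @'
{
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Minute",
      "interval": 15,
      "startTime": "2024-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": { "referenceName": "SapCdcReplicationPipeline", "type": "PipelineReference" }
    }
  }
}
'@
Set-Content -Path .\SapCdcTrigger.json -Value $triggerDefinition

# Create (or update) the trigger, then start it so the pipeline runs every window.
Set-AzDataFactoryV2Trigger -ResourceGroupName "my-rg" -DataFactoryName "my-adf" -Name "SapCdcTumblingWindowTrigger" -DefinitionFile ".\SapCdcTrigger.json" -Force
Start-AzDataFactoryV2Trigger -ResourceGroupName "my-rg" -DataFactoryName "my-adf" -Name "SapCdcTumblingWindowTrigger" -Force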

 

The ADF copy activity with the SAP CDC connector runs on a self-hosted integration runtime (SHIR) that you install on your on-premises or virtual machine, so it has a line of sight to your SAP source systems or SLT replication server, while the ADF data flow activity runs on a serverless Databricks/Spark cluster (Azure IR). Via ODP, the SAP CDC connector can extract various data source (“provider”) types, such as:

 

- SAP extractors, originally built to extract data from SAP ECC and load it into SAP BW
- ABAP CDS views, the new data extraction standard for SAP S/4HANA
- InfoProviders and InfoObjects in SAP BW and BW/4HANA
- SAP application tables, when using SLT replication server as a proxy

 

These providers run on SAP systems and produce full and incremental data in the Operational Delta Queue (ODQ), which is consumed by the ADF copy activity with the SAP CDC connector in an ADF pipeline (the “subscriber”).

Initial SLT configuration

SLT is a database trigger-based CDC solution that can replicate SAP application tables and simple views in near real time. SLT replicates from SAP source systems to various targets, including the operational delta queue (ODQ). In S/4HANA 2020 and higher, the add-on is already built into the S4CORE component.

During the replication process, database triggers track all changes to the data stored in the source tables. Every operation is registered in the logging table, and the function module transfers data to a specified target. SLT automatically creates all required objects when you initiate the data extraction. Keeping changes in the logging tables provides a level of fault tolerance that prevents data loss when the system that manages the replication is temporarily unavailable. In such a case, once the replication process is re-established, SLT can easily identify all changes that have not yet been replicated and continue operation.

To start the replication process, you need to set up SAP SLT. You do this by creating configurations that control the replication process. You manage configurations in the SLT Cockpit (transaction code LTRC). In a standalone deployment, start the cockpit on the replication server, not on the source system.

The initial screen of the SLT Cockpit lists all previously created configurations. Each configuration has a unique identifier called the Mass Transfer ID (MTID). Click the Create Configuration button to define a new one; a wizard guides you through the basic settings.

The SAP SLT framework is part of the DMIS (Data Migration Server) component. In a standalone deployment, you must install it on both systems: the source (transactional) system as well as the SLT replication server.

The number of data transfer, initial load, and calculation jobs should come from the sizing you perform. For a small configuration with a limited number of tables in scope, SAP recommends using at least two data transfer jobs. There is a 1-to-1 correlation between the specified data transfer jobs and the dialog work processes in the source system.

The number of initial load jobs should be lower than the total number of data transfer jobs; otherwise, the system won’t replicate changes after the initial load. Calculation jobs run the initial assessment of the data stored in the source tables and chunk the data into smaller pieces.

Validation

To validate your SAP system configuration for ODP, you can run the RODPS_REPL_TEST program to test the extraction of your SAPI extractors, CDS views, BW objects, and so on.

Prepare SHIR (self-hosted integration runtime) with the SAP CDC connector

To prepare a SHIR with the SAP CDC connector, complete the following steps:

On ADF Studio, create and configure a SHIR. You can download the latest private SHIR version with improved performance and detailed error messages from https://adfsapfileshare.blob.core.windows.net/shir/IntegrationRuntime_5.18.8172.1.msi and install it on your on-premises or virtual machine. The more CPU cores you have on your SHIR machine, the higher your data extraction throughput. To avoid this version being replaced by later versions, go to the Integration runtimes section of the Manage hub on ADF Studio, select your SHIR to edit, and disable auto update on the Auto update tab.
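If you prefer an unattended setup of the SHIR machine, the installation and node registration can also be scripted. The sketch below is a hypothetical example: the MSI path and the authentication key are placeholders, and the dmgcmd.exe location depends on the SHIR version actually installed.

# Hypothetical sketch: unattended SHIR installation and registration on the SHIR machine.
# The MSI path and the authentication key (copied from ADF Studio) are placeholders.
$msi = "C:\Temp\IntegrationRuntime_5.18.8172.1.msi"
$authKey = "<SHIR authentication key from ADF Studio>"

# Silent install of the self-hosted integration runtime.
Start-Process msiexec.exe -ArgumentList "/i `"$msi`" /qn" -Wait

# Register this machine as a node of your SHIR; adjust the path to the installed version.
$dmgcmd = "C:\Program Files\Microsoft Integration Runtime\5.0\Shared\dmgcmd.exe"
& $dmgcmd -RegisterNewNode $authKey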

Download the latest 64-bit SAP .NET Connector (SAP NCo 3.0) from https://support.sap.com/en/product/connectors/msnet.html and install it on your SHIR machine.  During installation, select the Install Assemblies to GAC option in the Optional setup steps window. 

 

Add a network security rule on your SAP systems so that the SHIR machine can connect to them. If your SAP system runs on an Azure virtual machine (VM), add the rule by setting the Source IP addresses/CIDR ranges property to the SHIR machine's IP address and the Destination port ranges property to 3200,3300, for example using the Azure portal or the scripted sketch below.
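As a hedged illustration, the same rule could be created with Azure PowerShell roughly as follows; the resource group, NSG name, rule priority, and SHIR IP address are placeholders.

# Hypothetical sketch: allow the SHIR machine to reach an SAP VM on ports 3200 and 3300.
# Resource group, NSG name, priority, and the SHIR IP address are placeholders.
$nsg = Get-AzNetworkSecurityGroup -ResourceGroupName "sap-rg" -Name "sap-vm-nsg"

$nsg | Add-AzNetworkSecurityRuleConfig `
    -Name "Allow-SHIR-to-SAP" `
    -Description "Allow the SHIR machine to reach the SAP dispatcher and gateway ports" `
    -Access Allow -Protocol Tcp -Direction Inbound -Priority 310 `
    -SourceAddressPrefix "10.0.0.4" -SourcePortRange "*" `
    -DestinationAddressPrefix "*" -DestinationPortRange 3200, 3300 |
  Set-AzNetworkSecurityGroup

Ports 3200 and 3300 assume SAP instance number 00; for other instance numbers, use the corresponding 32NN (dispatcher) and 33NN (gateway) ports.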

 

On the SHIR machine, run the following PowerShell cmdlet to ensure that it can connect to your SAP systems: Test-NetConnection <SAP system IP address> -Port 3300
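To check both the dispatcher and gateway ports in one pass, a small loop such as the following can be used; the host IP address is a placeholder.

# Hypothetical sketch: test both SAP ports (instance number 00 assumed) from the SHIR machine.
$sapHost = "10.1.0.10"
foreach ($port in 3200, 3300) {
    Test-NetConnection -ComputerName $sapHost -Port $port |
        Select-Object ComputerName, RemotePort, TcpTestSucceeded
}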

 

Prepare SAP CDC linked service

To prepare SAP CDC linked service, complete the following steps:

On ADF Studio, go to the Linked services section of Manage hub and select the New button to create a new linked service.

 

Search for SAP and select SAP CDC (Preview).

 

 

Set the SAP CDC linked service properties; many of them are similar to the SAP Table linked service properties. For the Connect via integration runtime property, select your SHIR. For the Server name property, enter the mapped server name for your SAP system. For the Subscriber name property, enter a unique name to register and identify this ADF connection as a subscriber that consumes data packages produced in ODQ by your SAP system.

 

Test the connection and create your new SAP CDC linked service.
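For teams that deploy ADF artifacts from scripts, the linked service can also be created with the Az.DataFactory module. The sketch below is a hypothetical example: all names, connection values, and credentials are placeholders, and the JSON shape is based on the publicly documented SAP CDC (type SapOdp) linked service, so verify the property names against the current documentation before use.

# Hypothetical sketch: deploy the SAP CDC linked service from script instead of ADF Studio.
# All names, hosts, and credentials are placeholders.
$linkedServiceDefinition = @'
{
  "name": "SapCdcLinkedService",
  "properties": {
    "type": "SapOdp",
    "typeProperties": {
      "server": "<mapped SAP server name>",
      "systemNumber": "00",
      "clientId": "100",
      "language": "EN",
      "userName": "<SAP user>",
      "password": { "type": "SecureString", "value": "<SAP password>" },
      "subscriberName": "ADF_CDC_SUBSCRIBER_01"
    },
    "connectVia": {
      "referenceName": "MySelfHostedIR",
      "type": "IntegrationRuntimeReference"
    }
  }
}
'@
Set-Content -Path .\SapCdcLinkedService.json -Value $linkedServiceDefinition

Set-AzDataFactoryV2LinkedService -ResourceGroupName "my-rg" -DataFactoryName "my-adf" -Name "SapCdcLinkedService" -DefinitionFile ".\SapCdcLinkedService.json" -Force

In practice, store the SAP password in Azure Key Vault and reference it from the linked service rather than embedding it in the definition file.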

Monitor data extractions on SAP systems

To monitor data extractions on SAP systems, complete the following steps:

 

Using the SAP Logon tool on your SAP source system, run the ODQMON transaction code.

 

 

 

Enter the value of the Subscriber name property of your SAP CDC linked service in the Subscriber input field and select All in the Request Selection dropdown menu to show all data extractions using that linked service.

 

You can now see all registered subscriber processes in ODQ, representing data extractions from ADF copy activities that use your SAP CDC linked service. On each ODQ subscription, you can drill down to see individual full/delta extractions. On each extraction, you can drill down to see the individual data packages that were consumed.

When ADF copy activities that extract SAP data are no longer needed, their ODQ subscriptions should be deleted, so SAP systems can stop tracking their subscription states and remove the unconsumed data packages from ODQ. To do so, select the unneeded ODQ subscriptions and delete them.

 

Conclusion

The SAP CDC connector in Data Factory reads delta changes from the SAP ODP framework. The deltas are recorded in ODQ tables.

Reference

Overview and architecture of the SAP CDC capabilities – Azure Data Factory | Microsoft Learn

