Exporting Tables from DSP to HDLFS using Replication Flow
Introduction
This article describes the process of exporting tables from SAP Datasphere (DSP) to the SAP HANA Data Lake Filesystem (HDLFS) in PARQUET file format using SAP Datasphere’s Replication Flow (RF) feature. SAP DSP is a powerful platform designed to manage, model, and share data across enterprise systems. By integrating with SAP HDLFS, it enables organizations to store large volumes of structured and unstructured data efficiently for advanced analytics, archiving, or cost-effective data storage. The RF feature within SAP DSP simplifies data movement between systems and environments by enabling seamless data replication.
Connection Creation
Access SAP Datasphere
Log in to SAP Datasphere: Open your browser and navigate to the SAP Datasphere home page.
Navigate to Connections
Go to the Connections tab: From the main dashboard, select the ‘Connections’ tab located on the left-hand side.
Create a New Connection
Click on ‘Create’: In the Connections tab, click on the Add Connection button.
Choose Connection Type
Choose SAP HANA Datalake: From the list of available connection types, select ‘SAP HANA Cloud, Data Lake Files’.
Configure Connection Properties
Enter Connection Details: Fill in the required fields: under Connection Details, the Host (hostname of the SAP HDL instance) and the Root Path (the HDLFS path where the table data is to be stored); under Credentials, the Keystore File (.p12 format) and the Keystore Password. A sketch for preparing the keystore is shown below.
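If you only have a client certificate and private key in PEM format, you can bundle them into the required .p12 keystore yourself. The following is a minimal sketch using the Python cryptography package; the file names and the keystore password are placeholders, and it assumes the client certificate is already trusted by the HDLFS instance.

```python
# Sketch: bundle a PEM client certificate and key into a .p12 keystore
# for the HDLFS connection. File names and password are placeholders.
from cryptography import x509
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.serialization import pkcs12

# Load the existing PEM-encoded client key and certificate
with open("client.key", "rb") as f:
    key = serialization.load_pem_private_key(f.read(), password=None)
with open("client.crt", "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

# Serialize both into a password-protected PKCS#12 keystore
p12_bytes = pkcs12.serialize_key_and_certificates(
    name=b"dsp-to-hdlfs",
    key=key,
    cert=cert,
    cas=None,
    encryption_algorithm=serialization.BestAvailableEncryption(b"MyKeystorePassword"),
)
with open("keystore.p12", "wb") as f:
    f.write(p12_bytes)
```

Upload the resulting keystore.p12 as the Keystore File and use the same password as the Keystore Password in the connection dialog.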
Add Connection Information and Save
Type in Connection Details: Provide an appropriate Business Name for the connection. The Technical Name is filled automatically from the Business Name and can be changed if required. Once the details are filled in, click on ‘Create Connections’ to finalize the connection setup. Note that the Business Name and Technical Name DSP_TO_HDL used here are examples; replace them with the required names.
Test the Connection
Click on ‘Validate Connection’: Upon creation, the connection will be visible in the ‘Connections’ tab. Ensure that the connection details are correct and click on the ‘Validate Connection’ button to verify the connection. A successful connection test displays the message ‘Connection “DSP_TO_HDL” is valid’.
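Independently of Datasphere, you can also sanity-check access to the HDLFS instance from a script. The sketch below assumes the WebHDFS-compatible REST API exposed by SAP HANA Cloud, Data Lake Files; the endpoint, container ID, and certificate paths are placeholders that you must replace with the values of your instance.

```python
# Sketch: list the root path of an HDLFS instance via its
# WebHDFS-compatible REST API (endpoint and container are placeholders).
import requests

HDLFS_ENDPOINT = "https://<instance>.files.hdl.<region>.hanacloud.ondemand.com"
FILE_CONTAINER = "<instance-id>"  # placeholder file container ID

resp = requests.get(
    f"{HDLFS_ENDPOINT}/webhdfs/v1/?op=LISTSTATUS",
    headers={"x-sap-filecontainer": FILE_CONTAINER},
    cert=("client.crt", "client.key"),  # same client certificate as the .p12
)
resp.raise_for_status()
for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["type"], entry["pathSuffix"])
```

If this call succeeds with the same certificate used in the keystore, the Datasphere connection should validate as well.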
Replication Flow Creation
Access SAP Datasphere
Log in to SAP Datasphere: Open your browser and navigate to the SAP Datasphere home page.
Navigate to Data Builder
Go to the Data Builder tab: From the main dashboard, select the ‘Data Builder’ tab located on the left-hand side.
Create a New Replication Flow
Click on ‘New Replication Flow’: In the Data Builder tab, click on the ‘New Replication Flow’ button.
Select Source Systems and Table
Choose Source System: Click on ‘Select Source Connection’ and select ‘SAP Datasphere’ as the source system. Then click on ‘Add Source Objects’ to add the Repository Objects present in SAP DSP. Select the required table and click on ‘OK’.
Note that the Repository Object LARGE_DATA used here is an example; select the required table.
Select Target Systems
Choose Target System: Click on ‘Select Target Connection’ and select the newly created ‘DSP_TO_HDL’ as the target system. Then click on ‘Select Container’ to select the path in the SAP HDLFS. Select ‘Root Path’ (Home button) or the required directory inside the Root Path and click on ‘Select’.
Configure Settings
Choose required settings: Click on ‘Settings’. Here, the Load Type and Truncate options can be set. Load Type can be either Initial Only or Initial and Delta.
Initial Only: Loads all selected data once.
Initial and Delta: After the initial load, the system checks for source data changes (delta) at regular intervals and copies the changes to the target. The default delta load interval is 60 minutes. You can change it in the side panel by entering an integer between 0 and 24 for hours and between 0 and 59 for minutes; the maximum allowed value is 24 hours 0 minutes. If you enter 0 hours and 0 minutes, the system replicates any source changes immediately. This load type can be selected only if Delta Capture is turned on for the table.
Truncate: If Truncate is marked for a database table, the system deletes the table content when you start the replication run, leaves the table structure intact, and fills it with the relevant data from the source. If Truncate is not marked, the system inserts new data records after the existing data in the target; for records that already exist in the target and have been changed in the source, the system updates the target records with the changed data using UPSERT mode. The two behaviors are illustrated in the sketch below.
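To make the Truncate setting concrete, here is a small illustrative Python sketch of the two target-side behaviors described above (truncate-and-reload versus upsert by key). It models tables as dictionaries keyed by primary key and is purely conceptual; it is not how Datasphere implements replication internally.

```python
# Conceptual sketch only: tables are modeled as {primary_key: row} dicts.
def replicate(source: dict, target: dict, truncate: bool) -> dict:
    if truncate:
        # Truncate: drop all existing target content (structure stays),
        # then reload everything from the source.
        return dict(source)
    # No truncate: UPSERT semantics. New keys are inserted; keys that
    # already exist in the target are updated with the source values.
    merged = dict(target)
    merged.update(source)
    return merged

target = {1: "old-A", 2: "old-B"}
source = {2: "new-B", 3: "new-C"}
print(replicate(source, target, truncate=True))   # {2: 'new-B', 3: 'new-C'}
print(replicate(source, target, truncate=False))  # {1: 'old-A', 2: 'new-B', 3: 'new-C'}
```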
Deploy Replication Flow
Click on ‘Deploy’: Select the ‘Deploy’ button to save and deploy the Replication Flow. Provide an appropriate Business Name for the flow. The Technical Name is filled automatically from the Business Name and can be changed if required. Once the details are filled in, click on ‘Save’ to finalize the deployment. Upon successful deployment, a confirmation message appears and the Deployed status can be seen in the General section of the Replication Flow properties.
Note that the Business Name and Technical Name LARGE_REP_4 used here are examples; replace them with the required names.
Executing Replication Flow
Access SAP Datasphere
Log in to SAP Datasphere: Open your browser and navigate to the SAP Datasphere home page.
Navigate to Data Builder and access Replication Flow
Go to the Data Builder tab and select Replication Flow: From the main dashboard, select the ‘Data Builder’ tab located on the left-hand side and click the newly created Replication Flow.
Run Replication Flow
Click on ‘Run’: On the Replication Flow page, click the ‘Run’ button to start the execution of the replication flow. The status is displayed in the Run Status section. To monitor the flow, click on ‘Open in Flows Monitor’, which navigates to the Data Integration Monitor.
Replication Flow Completion
Completion of Flow: Upon successful completion of the Replication Flow, the message ‘DIRECT run of REPLICATION_FLOWS/RUN for LARGE_REP_4 is completed’ is displayed. The status can be seen both in the Run Status section of the Replication Flow page and in the Data Integration Monitor.
Scheduling the Replication Flows
Create Schedule: To execute the Replication Flow on a recurring basis, select the ‘Schedule’ button and click on ‘Create Schedule’ in the Data Integration Monitor. A pop-up window appears with Frequency and Time Range sections. Fill in the required details and click on ‘Create’. The flow will then run according to the specified schedule.
PARQUET Files in HDL
Upon successful completion of the Replication Flow, the data from the table LARGE_DATA in SAP DSP will be transferred to SAP HDLFS in the specified path as PARQUET files by default. This can be verified by navigating to HDLFS using the Database Explorer.
The DSP replication flow creates a new folder within the Root Path of the HDLFS, named after the table. The replication flow writes multiple files (part-*.<extension>) during initial and delta loading. The number and size of these files depend on the source table size and structure as well as the change frequency (during delta loading). A sketch for inspecting these files is shown below.
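To inspect the exported data, you can read the folder of part files with any Parquet-capable tool. Here is a minimal sketch using PyArrow, assuming the table folder (here LARGE_DATA, matching the example above) has been downloaded to the local machine.

```python
# Sketch: read the part-*.parquet files written by the replication flow.
# Assumes the table folder was downloaded locally, e.g. ./LARGE_DATA/
import pyarrow.parquet as pq

# read_table accepts a directory and treats all part files as one dataset
table = pq.read_table("LARGE_DATA/")
print(table.schema)              # column names and types of the exported table
print(table.num_rows)            # total rows across all part files
print(table.to_pandas().head())  # preview the data as a pandas DataFrame
```

Because the part files form one logical dataset, row counts and schema read this way should match the source table in SAP DSP.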