Introduction:
Data flow is a key component of SAP Datasphere as it allows users to perform complex data transformations, enrich data, and structure it for reporting and analytics.
In this blog, we will explore the basic features and steps to create a Data Flow in SAP Datasphere.
A Data Flow is a graphical ETL (Extract, Transform, Load) tool that allows users to:
Extract data from various sources (SAP and non-SAP).
Apply transformations such as filtering, aggregations, joins, and calculated columns.
Load transformed data into target tables for reporting and analytics.
There are three main components of a data flow: one or more sources, a set of transformation operators, and a target table.
Steps Involved:
Below are the steps involved in creating a data flow in SAP Datasphere:
Selecting the data flow option
Choosing one or more source tables
Creating the flow for data transformation
Choosing a target table to load the result data
Save and deploy
Step 1 : Selecting the data flow option
Log in to SAP Datasphere, then navigate to Data Builder → Data Flow.
Step 2 : Choosing one or more source tables
The source tables are available in the left-most pane. Drag and drop these objects onto the canvas.
Use the Repository tab to choose local tables in Datasphere, and the Sources tab to choose objects directly from a connection already established in Datasphere.
Step 3 : Creating the flow for data transformation
When you click on an object, a set of options appears. These options can be used to connect one object to another, preview data, analyze lineage, and delete the object, respectively.
Given below are a few basic operators which can be used in a data flow:
Projection
Join
Union
Aggregation
1. Projection
This operator creates a projection of the source table it’s connected to, where columns/attributes can be renamed or new calculated columns can be created without disturbing the source.
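Conceptually, a projection behaves like renaming and deriving columns without modifying the source. A rough pandas sketch of that behavior (the table and column names here are invented for illustration, not taken from Datasphere):

```python
import pandas as pd

# Hypothetical source table; the projection must leave it untouched.
source = pd.DataFrame({
    "ID": [1, 2],
    "NET_AMOUNT": [100.0, 250.0],
    "TAX_RATE": [0.5, 0.25],
})

# Projection: rename an attribute and add a calculated column.
# The original `source` DataFrame is not modified.
projection = source.rename(columns={"NET_AMOUNT": "NetAmount"}).assign(
    GrossAmount=lambda df: df["NetAmount"] * (1 + df["TAX_RATE"])
)
```

The key property mirrored here is that the source keeps its original columns while the projection exposes the renamed and calculated ones.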
2. Join
Joins combine two tables horizontally using one or more common fields between them. Drag and drop the Join operator from the available operators and connect two datasets to it.
Three join types are available here, one of which has to be selected:
Inner
Left Outer
Right Outer
To configure the join operator:
Go to the Join Definition section and connect the common fields by drag and drop.
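The three join types correspond to the usual relational semantics. A small pandas sketch of the difference (the example tables are invented for illustration):

```python
import pandas as pd

orders = pd.DataFrame({"CustomerID": [1, 2, 3], "Amount": [10, 20, 30]})
customers = pd.DataFrame({"CustomerID": [1, 2, 4], "Name": ["A", "B", "C"]})

# Inner: only rows whose CustomerID exists in both tables.
inner = orders.merge(customers, on="CustomerID", how="inner")

# Left outer: all orders kept; unmatched customers yield nulls.
left = orders.merge(customers, on="CustomerID", how="left")

# Right outer: all customers kept; unmatched orders yield nulls.
right = orders.merge(customers, on="CustomerID", how="right")
```

Choosing Left or Right Outer thus decides which side's unmatched rows survive with null values in the result.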
3. Union
A union combines two tables vertically. Hence the structures of the two tables should be identical in terms of the number of fields and their datatypes. Drag and drop the Union operator from the available operators and connect two datasets to it.
To configure the union operator:
Click on the union operator and go to the details page.
Using the settings icon on the right of the Mappings section, select Delete All Output Columns.
Select a projection/dataset on the left side.
From the settings icon on the right side, click Add All Source Columns as Output Columns.
Select the second projection/dataset on the left side.
Map the fields by drag and drop, from the second projection to the first, left to right.
Check the mapped and unmapped fields separately.
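The end result of that mapping is a plain row-wise stack of the two inputs over a shared column set. A minimal pandas sketch of the semantics (tables invented for illustration):

```python
import pandas as pd

# Two inputs with identical structure (same fields, same datatypes),
# as the union operator requires.
sales_2023 = pd.DataFrame({"Product": ["A"], "Revenue": [100]})
sales_2024 = pd.DataFrame({"Product": ["B"], "Revenue": [200]})

# Union: stack the rows of both inputs into one output table.
union = pd.concat([sales_2023, sales_2024], ignore_index=True)
```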
4. Aggregation
The Aggregation operator aggregates a specific measure according to the requirements. Drag and drop Aggregation from the available operators and connect a single dataset node to it.
Choose the field in the aggregation operator and select the type of aggregation i.e., SUM, AVERAGE, MIN, MAX or COUNT to be reflected in the final table.
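Conceptually this is a group-by over the non-aggregated fields with the chosen function applied to the measure. A pandas sketch (table and column names invented for illustration):

```python
import pandas as pd

sales = pd.DataFrame({
    "Region": ["EU", "EU", "US"],
    "Revenue": [100, 50, 200],
})

# Aggregate the Revenue measure per Region using SUM;
# AVERAGE, MIN, MAX or COUNT would map to "mean", "min", "max", "count".
aggregated = sales.groupby("Region", as_index=False).agg(
    TotalRevenue=("Revenue", "sum")
)
```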
Step 4 : Choosing a target table to load the result data
Drag and drop a table from the repository to use an existing table with the same structure, or create a new target table.
Provide technical/business names, then create and deploy the table.
Select the mode of upload to the target table.
Tick the Update Records By Primary Key option if you've selected Append.
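With Update Records By Primary Key ticked, Append behaves like an upsert: incoming rows replace existing target rows that share the same key, while new keys are appended. A rough pandas sketch of that semantics (tables and key names invented for illustration; this is not how Datasphere implements it internally):

```python
import pandas as pd

target = pd.DataFrame({"ID": [1, 2], "Value": [10, 20]})
incoming = pd.DataFrame({"ID": [2, 3], "Value": [99, 30]})

# Append with "update records by primary key": drop target rows whose ID
# also appears in the incoming batch, then append the incoming rows.
upserted = (
    pd.concat([target[~target["ID"].isin(incoming["ID"])], incoming])
    .sort_values("ID")
    .reset_index(drop=True)
)
```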
Step 5 : Save, Deploy, Run
Save, deploy, and then run the data flow.
Once the data flow is run, it can be scheduled or monitored by selecting the option as given below.