SAP Datasphere Data Validations

Estimated read time 9 min read

Introduction

Data must be complete and consistent if it is to provide accurate analytics. This new validation framework provides initial validation rules to check the consistency of your view key columns and hierarchy nodes and provide detailed output about the erroneous records.

Two Types of Validation Rules have been released.

Key Validation RulesHierarchy Validation Rules

Key Validation Rules

The following rules are available in the Keys category:

Rule

Description

Unique Key Values

Each record must have unique primary key values.

For example, if an object has primary key columns ID and Language and a record has primary key values of 1 and EN, then no other record may have this combination of values.

No Null Key Values

No record may have null primary key values.

For example, if an object has primary key columns ID and Language no record may have null values in these columns.

 

Hierarchy Validation Rules

The following rules are available in the Hierarchies category:

Rule

Description

No Multiple Parents

No hierarchy node may have multiple parents.

For example, if a hierarchy node A has a parent node B, then it may not have any other parent.

No Circular Hierarchies

No hierarchy node may have circular parent-child relationships.

For example, if a hierarchy node A has a parent node B, then it may not also have node B as a direct child or other descendant.

 

Running a Data Validation

The Data Validation Status button in the view properties panel shows the status of the last data validation, and the Validated On field shows the date and time of the last run.

 

To validate data (or review the results of a previous validation):

Click the Data Validation Status button as indicated above to open the Data Validation panel. Alternatively, if the Data Preview panel is already open, click the Data Validation tab.

2. If the panel is empty (or if you want to re-run the validation), click Validate as indicated above.

The time required for data validation depends on the number of records and the complexity of calculations required. You will receive a notification when validation is complete.

3. Review the validation results. There is one result for each rule as shown below,

 

Reviewing Rules with Errors:

For rules with errors, click the Details link to get further information. The Details dialog explains the rule and lets you copy SQL (or SQLScript) code to generate a list of invalid records.

2. In the Details dialog, click Copy SQL (or Copy SQLScript) to get the code necessary to generate the full list of records that are not consistent with the rule

Viewing Erroneous Records – ‘Copy SQL’

Click on ‘Copy SQL’ and then Create a new SQL view, paste the code into it, and preview the data to see these failing records. Examples below,

a. Unique Key Values – Doing a data preview on the Copied SQL in the New SQL view, shows the records that violate the Unique Key Constraints along with the number of their occurrences.

b. No Null Values – Doing a data preview on the Copied SQL in the SQL view, shows all the 9 records where NULL values are present in the primary key column ‘LeafNode’

Viewing Erroneous Records – ‘Copy SQLScript ‘

Click on ‘Copy SQLScript’ and then Create a new SQL view, paste the code into it. Choose the SQLScript (TableFunction) in the view properties and add the columns as well as shown below. Then save and deploy the view and then click on Data Preview.

 

a. No Multiple Parents – Doing a data preview on the Copied and deployed SQLScript(Table Function) type SQL View, shows the records that have multiple parents.

You can see that the Tops Child has two Parents Men’s Wear and Women’s wear which is indicated in the above output.

b. No Circular Hierarchies – Doing a data preview on the Copied and deployed SQLScript(Table Function) type SQL View, shows the records that have a circular hierarchy relationship

You can see below the Circular Hierarchy relationship in the dataset as indicated in the above output

Limitations

Some validation rules may be unavailable if:

Your view or any of its sources is protected by a data access control.Your view or any of its sources contains input parameters.Any of your view’s sources is not replicated to SAP Datasphere.

 

Roadmap Items

The goal is subsequently enrich the data validation framework with more validation rules. Please add comments on this blog post about other validations that might be beneficial for you.

For more details about additional validations coming in Q4 2024, refer to the roadmap item – Additional model validations

Referential integrity – for associated entities, system checks on whether for every member in the association source’s foreign key there is a respective member in the association targetAssociation consistency – checks to help ensure that associations that use non-key fields still have key-like properties and therefore avoid unintended data duplication

 

Summary

With the data validation framework, you should now be able to validate your metadata & data and correct errors easily based on the information available on the erroneous records.

I would like to thank @mona_durai  for her collaboration in this blog.

 

 

 

 

 

 

 

 

​ IntroductionData must be complete and consistent if it is to provide accurate analytics. This new validation framework provides initial validation rules to check the consistency of your view key columns and hierarchy nodes and provide detailed output about the erroneous records.Two Types of Validation Rules have been released.Key Validation RulesHierarchy Validation RulesKey Validation RulesThe following rules are available in the Keys category:RuleDescriptionUnique Key ValuesEach record must have unique primary key values.For example, if an object has primary key columns ID and Language and a record has primary key values of 1 and EN, then no other record may have this combination of values.No Null Key ValuesNo record may have null primary key values.For example, if an object has primary key columns ID and Language no record may have null values in these columns. Hierarchy Validation RulesThe following rules are available in the Hierarchies category:RuleDescriptionNo Multiple ParentsNo hierarchy node may have multiple parents.For example, if a hierarchy node A has a parent node B, then it may not have any other parent.No Circular HierarchiesNo hierarchy node may have circular parent-child relationships.For example, if a hierarchy node A has a parent node B, then it may not also have node B as a direct child or other descendant. Running a Data ValidationThe Data Validation Status button in the view properties panel shows the status of the last data validation, and the Validated On field shows the date and time of the last run. To validate data (or review the results of a previous validation):Click the Data Validation Status button as indicated above to open the Data Validation panel. Alternatively, if the Data Preview panel is already open, click the Data Validation tab.2. If the panel is empty (or if you want to re-run the validation), click Validate as indicated above.The time required for data validation depends on the number of records and the complexity of calculations required. You will receive a notification when validation is complete.3. Review the validation results. There is one result for each rule as shown below, Reviewing Rules with Errors:For rules with errors, click the Details link to get further information. The Details dialog explains the rule and lets you copy SQL (or SQLScript) code to generate a list of invalid records.2. In the Details dialog, click Copy SQL (or Copy SQLScript) to get the code necessary to generate the full list of records that are not consistent with the ruleViewing Erroneous Records – ‘Copy SQL’Click on ‘Copy SQL’ and then Create a new SQL view, paste the code into it, and preview the data to see these failing records. Examples below,a. Unique Key Values – Doing a data preview on the Copied SQL in the New SQL view, shows the records that violate the Unique Key Constraints along with the number of their occurrences.b. No Null Values – Doing a data preview on the Copied SQL in the SQL view, shows all the 9 records where NULL values are present in the primary key column ‘LeafNode’Viewing Erroneous Records – ‘Copy SQLScript ‘Click on ‘Copy SQLScript’ and then Create a new SQL view, paste the code into it. Choose the SQLScript (TableFunction) in the view properties and add the columns as well as shown below. Then save and deploy the view and then click on Data Preview. a. No Multiple Parents – Doing a data preview on the Copied and deployed SQLScript(Table Function) type SQL View, shows the records that have multiple parents.You can see that the Tops Child has two Parents Men’s Wear and Women’s wear which is indicated in the above output.b. No Circular Hierarchies – Doing a data preview on the Copied and deployed SQLScript(Table Function) type SQL View, shows the records that have a circular hierarchy relationshipYou can see below the Circular Hierarchy relationship in the dataset as indicated in the above outputLimitationsSome validation rules may be unavailable if:Your view or any of its sources is protected by a data access control.Your view or any of its sources contains input parameters.Any of your view’s sources is not replicated to SAP Datasphere. Roadmap ItemsThe goal is subsequently enrich the data validation framework with more validation rules. Please add comments on this blog post about other validations that might be beneficial for you.For more details about additional validations coming in Q4 2024, refer to the roadmap item – Additional model validationsReferential integrity – for associated entities, system checks on whether for every member in the association source’s foreign key there is a respective member in the association targetAssociation consistency – checks to help ensure that associations that use non-key fields still have key-like properties and therefore avoid unintended data duplication SummaryWith the data validation framework, you should now be able to validate your metadata & data and correct errors easily based on the information available on the erroneous records.I would like to thank @mona_durai  for her collaboration in this blog.          Read More Technology Blogs by SAP articles 

#SAP

#SAPTechnologyblog

You May Also Like

More From Author

+ There are no comments

Add yours