Analyzing views, persisting and partitioning data in SAP Datasphere


As the complexity and volume of data in your data warehouse increase, persisting and partitioning data in specific parts of your data model can prove beneficial.

This blog provides a step-by-step guide to analyzing views, partitioning and persisting data in Datasphere.  

If you’re experiencing declining performance but don’t know where to start, we have another blog for you, “Performance Monitoring in SAP Datasphere,” which can be found here. 

 

Analyzing views with the View Analyzer

If you already know which model is consuming high memory or compute resources, a good starting point is the View Analyzer. This tool provides an easily generated overview of your data model and suggests which tables and views could be persisted to improve overall performance.

You can find the View Analyzer in the Data Integration Monitor. Once you select the specific view you’re interested in, locate the View Analyzer under the “Data Persistence” tab. 

 

The View Analyzer offers four options:

Without memory consumption 

This option generates an overview of entities and flags so-called ‘candidates’ for persistence. It does not simulate persisting the data. If the view has already been persisted, it shows the memory consumption of the data persistence task instead.

Generate explain plan 

This generates an explain plan, an overview of the operators, execution engines, views, and tables involved, which you can then analyze.

With memory consumption 

In addition to the default analysis, this option simulates persistency for all non-persisted views. If views have already been persisted, it provides the actual memory consumption. 

Generate SQL Analyzer Plan file 

This generates a persistency simulation for the selected view. When the simulation is complete, an SQL Analyzer Plan file is created, which can be analyzed further in tools like HANA Studio or Eclipse. More information on such an analysis can be found here. 

Analyzing results

After running the analyzer, you can find the results under the “Entities” tab. This overview provides recommendations on which views can be persisted to improve performance. The general rule of thumb is to balance persisting views (for improved performance) with virtual modeling (for lower storage costs, reduced maintenance, and real-time data availability). 

The same results can also be visualized in the “Data Lineage.” Each view in the lineage includes icons in the bottom-right corner, indicating whether a view is persisted and showing its relative persistency score. This offers a clear visual representation of where data persistence candidates reside in your data model. 

View persistency 

A great way to improve the performance of your model is view persistency. Why? Because it is simple to set up and schedule. This is a significant improvement over native SAP HANA development, where creating table structures and scheduling jobs was less straightforward.

If you know which view to persist, locate it in the Data Integration Monitor or navigate to the last node of your Graphical View or SQL View. From there, access the “Data Persistence” tab within the model properties.

 

 

Once you enable persistency, all views built on top of the persisted table will immediately start using it, eliminating the need to replace data sources manually. Additionally, if you remove the persistence, all subsequent views will revert to using the virtual model immediately. 
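Conceptually, this transparent switch behaves like a lookup that prefers a materialized copy when one exists and falls back to virtual execution otherwise. The following is a minimal Python sketch of that idea only; all names are hypothetical and this is not the Datasphere API:

```python
# Hypothetical sketch of transparent view persistence: consumers always call
# read_view(); whether they receive persisted or virtual data is decided here.
persisted_snapshots = {}  # view name -> materialized rows


def compute_virtual(name):
    """Stand-in for executing the view's virtual (federated) query."""
    return [f"{name}-row-{i}" for i in range(3)]


def read_view(name):
    """Consumers are unaware of whether the view is persisted."""
    if name in persisted_snapshots:
        return persisted_snapshots[name]  # served from the persisted table
    return compute_virtual(name)          # falls back to virtual execution


def enable_persistence(name):
    """Materialize the view; all consumers pick this up immediately."""
    persisted_snapshots[name] = compute_virtual(name)


def remove_persistence(name):
    """Drop the snapshot; consumers revert to the virtual model immediately."""
    persisted_snapshots.pop(name, None)
```

Nothing built on top of `read_view` has to change when persistence is toggled, which mirrors why no manual source replacement is needed.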

Partitioning

An additional feature is the ability to create partitions, which has two key benefits:

Performance Improvements 

This depends on how the dataset is queried: performance improves when a query touches only a subset of the partitions. If every partition is queried, there is no performance gain.

Locking Partitions 

After partitions are generated and populated, you can lock them. This is useful if you are certain that specific data will not change. In such cases, refreshing the full dataset is unnecessary, as locked partitions prevent these entries from being deleted during subsequent runs. 
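The performance benefit described above (partition pruning) can be sketched in a few lines of Python. This is a conceptual illustration with made-up data, not how Datasphere executes queries internally: rows are grouped by the year of a posting date, and a query filtering on that column only scans the matching partition.

```python
from collections import defaultdict
from datetime import date

# Illustrative rows: (posting date, amount).
rows = [
    (date(2023, 3, 1), 100),
    (date(2023, 9, 15), 250),
    (date(2024, 2, 10), 75),
    (date(2025, 1, 5), 300),
]

# Partition the rows by the year of the posting date.
partitions = defaultdict(list)
for posting_date, amount in rows:
    partitions[posting_date.year].append((posting_date, amount))


def query_total(year_filter=None):
    """Sum amounts, counting rows scanned; a filter prunes partitions."""
    scanned = 0
    total = 0
    for year, part in partitions.items():
        if year_filter is not None and year != year_filter:
            continue  # pruned: this partition is never read
        for posting_date, amount in part:
            scanned += 1
            total += amount
    return total, scanned


# Filtering on the partition column scans only one partition's rows...
print(query_total(2023))  # (350, 2)
# ...while an unfiltered query must still touch every partition.
print(query_total())      # (725, 4)
```

The second call returns the grand total but scans all four rows, which is why partitioning yields no gain when every partition is queried.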

 

To create partitions: 

1. Navigate to the “Partitions” tab and select “Define Partitions.”
2. Choose the column to partition on, either manually or via the “Rate Column Suitability” option, which recommends a suitable column.
3. Define the partitions for the selected column, for example ranges based on a “Posting Date.”
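To make the range definitions concrete, here is a small sketch of what yearly ranges on a “Posting Date” amount to. The boundary values are invented for illustration; each range is inclusive at its lower bound and exclusive at its upper bound, and anything outside the defined ranges falls into an “OTHERS” bucket:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical yearly range boundaries on "Posting Date":
# defines [2022, 2023), [2023, 2024), [2024, 2025).
boundaries = [date(y, 1, 1) for y in range(2022, 2026)]


def partition_for(posting_date: date) -> str:
    """Map a posting date to its range partition, or 'OTHERS'."""
    i = bisect_right(boundaries, posting_date)
    if i == 0 or i == len(boundaries):
        return "OTHERS"  # before the first or past the last boundary
    low, high = boundaries[i - 1], boundaries[i]
    return f"[{low}, {high})"


print(partition_for(date(2023, 6, 1)))    # [2023-01-01, 2024-01-01)
print(partition_for(date(2021, 12, 31)))  # OTHERS
```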

 

 

 

Once partitions are defined, you can choose to lock them or leave them unlocked. Note that partitioning currently lacks features to dynamically create input for these partition ranges. If you’re only interested in the last 12-24 months, you’ll need to maintain such partitions manually. Alternatively, consider persisting data using Data Flows and local tables, which provide more capabilities to dynamically delete and append data based on date functions. 
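To illustrate what such manual maintenance involves, the sketch below computes the month boundaries for a rolling window, the kind of date-driven logic a Data Flow with local tables can apply but static partition definitions cannot. It is a standalone Python illustration, not a Datasphere feature:

```python
from datetime import date


def shift_months(d: date, n: int) -> date:
    """Shift a month-start date by n months."""
    total = d.year * 12 + (d.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)


def rolling_partition_bounds(today: date, months: int = 24):
    """Return [low, high) month boundaries covering the last `months` months."""
    end = shift_months(today.replace(day=1), 1)   # exclusive upper bound
    start = shift_months(end, -months)
    bounds = [shift_months(start, i) for i in range(months + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


ranges = rolling_partition_bounds(date(2025, 6, 15), months=12)
print(ranges[0])   # (datetime.date(2024, 7, 1), datetime.date(2024, 8, 1))
print(ranges[-1])  # (datetime.date(2025, 6, 1), datetime.date(2025, 7, 1))
```

With static partitioning, these boundaries would have to be re-entered by hand each month as the window moves.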

Conclusion 

The View Analyzer, View Persistency and Partitioning are powerful tools for optimizing the performance of your data models. They are easy to set up and provide valuable insights. For more complex views, the SQL Analyzer Plan file can be used for in-depth analysis. By leveraging these tools, you can strike an effective balance between performance improvements and resource optimization. 

 

 

References 

 https://help.sap.com/docs/SAP_DATASPHERE/be5967d099974c69b77f4549425ca4c0/8921e5acf2ad4c8a98073edae4c214c7.html?locale=en-US  

Data Persistence Guidance | SAP Help Portal 

 

 

