Simplifying Time Series Analytics with the Unified Time Series Interface


Time series analysis is fundamental in industries ranging from retail to finance, helping businesses forecast trends, predict anomalies, and optimize operations. Traditional approaches, however, often require complex preprocessing, data conversion, and algorithm selection, posing challenges for less technical users.

To address these issues, SAP HANA Predictive Analysis Library (PAL) has introduced a unified interface for time series algorithms. Following the successful implementation of its unified classification and regression interfaces, this update aims to make time series analysis more efficient and user-friendly.

In this blog post, we explore the latest features of this unified interface and showcase an example to illustrate its usage.

Key Highlights

Let’s dive into the new interface’s key features:

Unified Workflow

The unified interface streamlines the management of PAL algorithms by providing a standardized structure for invoking them. This simplifies parameter handling and data preparation for individual algorithms, enhancing efficiency and ease of use. Supported algorithms include Additive Model Time Series Analysis (AMTSA), Auto Regressive Integrated Moving Average (ARIMA), Bayesian Structural Time Series (BSTS), and Exponential Smoothing (SMOOTH).

Automatic Timestamp Conversion

Datasets for different time series tasks can come with diverse time formats, so the new unified interface introduces automatic timestamp conversion. This feature automatically detects and converts between integer timepoints and timestamp types. To convert timepoints to timestamps, users define START_POINT and INTERVAL. INTERVAL is the spacing between timestamps, measured in the smallest unit of the target type (TARGET_TYPE). For instance, if the target type is DAYDATE and a weekly interval is desired, INTERVAL would be set to 7. Conversely, converting timestamps to timepoints is fully automatic: the system generates consecutive integers from the input timestamps. Note that the input timestamps must be evenly spaced for this conversion to work.
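To make the two conversion directions concrete, here is a small Python sketch (not PAL code) of the mapping that START_POINT and INTERVAL define, assuming a DAYDATE-like target type measured in days:

```python
from datetime import date, timedelta

def timepoints_to_timestamps(timepoints, start_point, interval):
    """Map integer timepoints to dates.

    start_point is the date of timepoint 0; interval is the spacing in
    days (the smallest unit of a DAYDATE-like target type), mirroring
    the START_POINT and INTERVAL parameters described above.
    """
    return [start_point + timedelta(days=tp * interval) for tp in timepoints]

def timestamps_to_timepoints(timestamps):
    """Map evenly spaced dates back to consecutive integers from 0."""
    deltas = [(t - timestamps[0]).days for t in timestamps]
    step = deltas[1] if len(deltas) > 1 else 1
    if any(d % step for d in deltas):
        raise ValueError("timestamps must be evenly spaced")
    return [d // step for d in deltas]

# Weekly series: INTERVAL = 7, as in the DAYDATE example above.
dates = timepoints_to_timestamps([0, 1, 2], date(2010, 1, 1), 7)
# dates -> 2010-01-01, 2010-01-08, 2010-01-15
assert timestamps_to_timepoints(dates) == [0, 1, 2]
```

The round trip only succeeds because the dates are evenly spaced; an irregular series would raise, which is exactly the restriction noted above.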

Pivoted Input Data Format Support

Traditionally, additional steps are required to transform pivoted data into a usable format. To simplify data preparation, the new unified interface supports pivoted input data directly. This is particularly useful for complex, multidimensional time series data. The structure of the input data is defined in a metadata table, as illustrated below.

CREATE COLUMN TABLE PAL_META_DATA_TBL (
    "VARIABLE_NAME" NVARCHAR (50),
    "VARIABLE_TYPE" NVARCHAR (50)
);
INSERT INTO PAL_META_DATA_TBL VALUES ('TIMESTAMP', 'CONTINUOUS');
INSERT INTO PAL_META_DATA_TBL VALUES ('Y', 'TARGET');
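Conceptually, a pivoted (long-format) table stores one value per (row id, variable) pair, and the metadata table tells the algorithm which variable plays which role. The Python sketch below (with made-up values; this is not PAL's internal layout) shows how such rows reassemble into the conventional wide format that users would otherwise have to build by hand:

```python
from collections import defaultdict

# Variable roles, as declared in the metadata table above.
meta = {"TIMESTAMP": "CONTINUOUS", "Y": "TARGET"}

# Pivoted (long) rows: one value per (row id, variable) pair.
long_rows = [
    (0, "TIMESTAMP", "0"), (0, "Y", "129.0"),
    (1, "TIMESTAMP", "1"), (1, "Y", "148.0"),
]

# Reassemble one wide record per row id.
wide = defaultdict(dict)
for rid, name, value in long_rows:
    assert name in meta            # every variable must be declared
    wide[rid][name] = value

# wide[0] == {"TIMESTAMP": "0", "Y": "129.0"}
```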

Massive Mode Capability

When dealing with vast datasets, users can leverage “massive mode” in the unified interface. This mode lets an algorithm process multiple datasets simultaneously, with each dataset executed independently and in parallel. To learn more, see the page on Massive Execution of PAL Functions.
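The idea can be sketched outside HANA as well: fit each group’s series independently, in parallel. The sketch below uses a naive mean forecast as a stand-in for a real PAL call; the group keys and `fit_one` function are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def fit_one(series):
    # Stand-in for one independent time series fit;
    # here just a naive mean forecast.
    return sum(series) / len(series)

# One series per group key, as in massive mode.
groups = {
    "beijing": [120.0, 130.0, 125.0],
    "shanghai": [60.0, 70.0, 65.0],
}

# Each group is fitted independently and in parallel.
with ThreadPoolExecutor() as pool:
    forecasts = dict(zip(groups, pool.map(fit_one, groups.values())))
# forecasts["beijing"] == 125.0
```

Because the groups share no state, they can be distributed freely, which is what makes the parallel execution safe.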

Example

Let’s demonstrate the new interface with an example. Note that the code provided is purely for illustrative purposes and is not intended for production use.

The dataset is the Beijing PM2.5 data from the UCI Machine Learning Repository. It comprises hourly recordings of PM2.5 levels (airborne particles with aerodynamic diameters less than 2.5 μm) collected by the US Embassy in Beijing between January 1, 2010, and December 31, 2014. Additionally, meteorological data from Beijing Capital International Airport is included. The objective is to predict PM2.5 concentrations using various input features.

This dataset contains 43,824 rows and 11 columns. During preprocessing, the year, month, day, and hour columns were merged into a single ‘date’ column, and rows with missing values were addressed. The restructured dataset includes the following nine columns:

date: Timestamp of the record

pollution: PM2.5 concentration (µg/m³)

dew: Dew Point

temp: Temperature

press: Pressure (hPa)

wnd_dir: Combined wind direction

wnd_spd: Cumulated wind speed (m/s)

snow: Cumulated hours of snow

rain: Cumulated hours of rain

To make it more manageable for demonstration purposes, we selected the first 1,000 instances. From this selection, we allocated 990 instances to the training set and reserved the final 10 for the testing set. Here’s a glimpse at the first five rows of the training set.
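As a rough Python sketch of this preprocessing (with a handful of made-up readings, not the actual UCI rows):

```python
from datetime import datetime

# Hypothetical raw rows mirroring the UCI schema: four time columns
# plus a possibly missing pollution reading.
raw = [
    {"year": 2010, "month": 1, "day": 1, "hour": h, "pollution": p}
    for h, p in [(0, 129.0), (1, None), (2, 159.0), (3, 181.0)]
]

# Merge year/month/day/hour into one 'date' value and drop rows
# with a missing reading.
clean = [
    {"date": datetime(r["year"], r["month"], r["day"], r["hour"]),
     "pollution": r["pollution"]}
    for r in raw
    if r["pollution"] is not None
]

# Hold out the tail for testing (the post uses a 990/10 split).
train, test = clean[:-1], clean[-1:]
```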

Once the data is loaded, the model can be trained, and results can be obtained using the following annotated SQL script.

--########## COLUMN TABLE CREATION ##########
CREATE COLUMN TABLE PAL_PARAMETER_TBL__0 ("PARAM_NAME" NVARCHAR(256), "INT_VALUE" INTEGER, "DOUBLE_VALUE" DOUBLE, "STRING_VALUE" NVARCHAR(1000));
CREATE COLUMN TABLE PAL_MODEL_TBL__0 ("INDEX" NVARCHAR (50), "CONTENT" NCLOB);
CREATE COLUMN TABLE PAL_STATISTICS_TBL__0 ("NAME" NVARCHAR (50), "VALUE_1" DOUBLE, "VALUE_2" DOUBLE, "VALUE_3" DOUBLE, "VALUE_4" DOUBLE, "VALUE_5" DOUBLE, "REASON" NVARCHAR (50));
CREATE COLUMN TABLE PAL_DECOMPOSE_TBL__0 ("TIME_STAMP" NVARCHAR (50), "TREND" DOUBLE, "SEASONAL" DOUBLE, "REGRESSION" DOUBLE, "RANDOM" DOUBLE);
CREATE COLUMN TABLE PAL_PLACE_HOLDER_TBL__0 ("OBJECT" NVARCHAR (10), "KEY" NVARCHAR (10), "VALUE" NVARCHAR (10));
CREATE COLUMN TABLE PAL_PREDICT_PARAMETER_TBL__0 ("PARAM_NAME" NVARCHAR(256), "INT_VALUE" INTEGER, "DOUBLE_VALUE" DOUBLE, "STRING_VALUE" NVARCHAR(1000));
CREATE COLUMN TABLE PAL_PREDICT_RESULT_TBL__0 ("TIME_STAMP" NVARCHAR (50), "FORECAST" DOUBLE, "VALUE_1" DOUBLE, "VALUE_2" DOUBLE, "VALUE_3" DOUBLE, "VALUE_4" DOUBLE, "VALUE_5" DOUBLE);
CREATE COLUMN TABLE PAL_PREDICT_DECOMPOSITION_TBL__0 ("TIME_STAMP" NVARCHAR (50), "VALUE_1" DOUBLE, "VALUE_2" NCLOB, "VALUE_3" NCLOB, "VALUE_4" NCLOB, "VALUE_5" NCLOB);
CREATE COLUMN TABLE PAL_PREDICT_PLACE_HOLDER_TBL__0 ("OBJECT" NVARCHAR (50), "KEY" NVARCHAR (50), "VALUE" NVARCHAR (50));

--########## TABLE INSERTS ##########
-- The training data is stored in PAL_DATA_TBL__0, and the prediction data in PAL_PREDICT_DATA_TBL__0.
--########## PAL_PARAMETER_TBL__0 DATA INSERTION ##########
-- Specify the algorithm type. 0: AMTSA, 1: ARIMA, 2: BSTS, 3: SMOOTH
INSERT INTO PAL_PARAMETER_TBL__0 VALUES ('FUNCTION', 0, NULL, NULL);

--########## UNIFIED INTERFACE FOR TIME SERIES CALL ##########
DO BEGIN
  lt_data = SELECT * FROM PAL_DATA_TBL__0;
  lt_param = SELECT * FROM PAL_PARAMETER_TBL__0;
  CALL _SYS_AFL.PAL_UNIFIED_TIMESERIES (:lt_data, :lt_param, lt_model, lt_stat, lt_decom, lt_ph);
  lt_pdata = SELECT * FROM PAL_PREDICT_DATA_TBL__0;
  lt_pparam = SELECT * FROM PAL_PREDICT_PARAMETER_TBL__0;
  CALL _SYS_AFL.PAL_UNIFIED_TIMESERIES_PREDICT (:lt_pdata, :lt_model, :lt_pparam, lt_result, lt_decomp, lt_pph);
  INSERT INTO PAL_PREDICT_RESULT_TBL__0 SELECT * FROM :lt_result;
  INSERT INTO PAL_PREDICT_DECOMPOSITION_TBL__0 SELECT * FROM :lt_decomp;
END;

--########## SELECT RESULT TABLES ##########
SELECT * FROM PAL_PREDICT_RESULT_TBL__0;
SELECT * FROM PAL_PREDICT_DECOMPOSITION_TBL__0;

--########## TABLE CLEANUP ##########
DROP TABLE PAL_PARAMETER_TBL__0;
DROP TABLE PAL_MODEL_TBL__0;
DROP TABLE PAL_STATISTICS_TBL__0;
DROP TABLE PAL_DECOMPOSE_TBL__0;
DROP TABLE PAL_PLACE_HOLDER_TBL__0;
DROP TABLE PAL_PREDICT_PARAMETER_TBL__0;
DROP TABLE PAL_PREDICT_RESULT_TBL__0;
DROP TABLE PAL_PREDICT_DECOMPOSITION_TBL__0;
DROP TABLE PAL_PREDICT_PLACE_HOLDER_TBL__0;

You can view the model, prediction results, and decomposition in the output tables. Below are illustrative snapshots of the output tables.

The schema of the output tables depends on the selected algorithm. For AMTSA, the result table includes the predicted values along with the lower and upper bounds of the uncertainty intervals, and the decomposition table provides components such as trend and seasonality.

Summary

The unified time series interface simplifies the use of PAL algorithms. This blog post highlighted its key features for common challenges in time series analysis: varied time formats, pivoted data structures, and large data volumes. The new interface makes it easier for users to unlock the potential of their temporal data.

 

Recent topics on HANA machine learning:

Comprehensive Guide to MLTrack in SAP HANA Cloud: End-to-End Machine Learning Experiment Tracking

New Machine Learning and AI features in SAP HANA Cloud 2025 Q2

New Machine Learning and AI features in SAP HANA Cloud 2025 Q1