Time series analysis is fundamental in industries ranging from retail to finance, helping businesses forecast trends, predict anomalies, and optimize operations. Traditional approaches, however, often require complex preprocessing, data conversion, and algorithm selection, posing challenges for less technical users.
To address these issues, SAP HANA Predictive Analysis Library (PAL) has introduced a unified interface for time series algorithms. Following the successful implementation of its unified classification and regression interfaces, this update aims to make time series analysis more efficient and user-friendly.
In this blog post, we explore the latest features of this unified interface and showcase an example to illustrate its usage.
Key Highlights
Let’s dive into the new interface’s key features in detail:
Unified Workflow
The unified interface streamlines the management of PAL algorithms by providing a standardized structure for invoking them. This simplifies parameter handling and data preparation for individual algorithms, enhancing efficiency and ease of use. Supported algorithms include Additive Model Time Series Analysis (AMTSA), Auto Regressive Integrated Moving Average (ARIMA), Bayesian Structural Time Series (BSTS), and Exponential Smoothing (SMOOTH).
Automatic Timestamp Conversion
Datasets for different time series analysis tasks can carry diverse time formats, so automatic timestamp conversion is introduced in the new unified interface. This feature automatically detects and converts between integer timepoints and timestamp types. To convert timepoints to timestamps, users must define START_POINT and INTERVAL. INTERVAL represents the spacing between timestamps, measured in the smallest unit of the target type (TARGET_TYPE). For instance, if the target type is DAYDATE and a weekly interval is desired, INTERVAL would be set to 7. Conversely, converting timestamps to timepoints is automated: the system generates consecutive integers based on the input timestamps. However, the input timestamps must be evenly spaced for this conversion to work.
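As a conceptual sketch (not PAL’s internal code), the timepoint-to-timestamp mapping described above can be expressed in a few lines of Python. The function name and the daily unit are illustrative assumptions, mirroring the DAYDATE example with a weekly INTERVAL of 7:

```python
from datetime import date, timedelta

def timepoints_to_timestamps(timepoints, start_point, interval, unit=timedelta(days=1)):
    """Map consecutive integer timepoints to timestamps.

    interval is measured in the smallest unit of the target type
    (here days, matching the DAYDATE example in the text)."""
    return [start_point + unit * interval * tp for tp in timepoints]

# Weekly spacing on a daily-granularity target type: INTERVAL = 7
stamps = timepoints_to_timestamps([0, 1, 2], date(2010, 1, 1), 7)
# stamps == [date(2010, 1, 1), date(2010, 1, 8), date(2010, 1, 15)]
```

The reverse direction simply enumerates the timestamps as 0, 1, 2, …, which is why evenly spaced input is required.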
Pivoted Input Data Format Support
Traditionally, additional steps are required to transform pivoted data into a usable format. To simplify data preparation, the new unified interface directly supports pivoted input data. This is particularly beneficial for complex, multidimensional time series data. The structure of the input data is defined in a metadata table, as illustrated below.
CREATE COLUMN TABLE PAL_META_DATA_TBL (
    "VARIABLE_NAME" NVARCHAR(50),
    "VARIABLE_TYPE" NVARCHAR(50)
);
INSERT INTO PAL_META_DATA_TBL VALUES ('TIMESTAMP', 'CONTINUOUS');
INSERT INTO PAL_META_DATA_TBL VALUES ('Y', 'TARGET');
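To see what the interface now handles internally, here is a conceptual Python sketch (not PAL code) of the un-pivoting step that was previously needed: rebuilding one row per timestamp with one column per variable from (variable, timestamp, value) triples. The sample values are hypothetical:

```python
# Hypothetical pivoted input: (variable_name, time_stamp, value) rows.
pivoted = [
    ("Y", 0, 129.0), ("Y", 1, 148.0),
    ("temp", 0, -4.0), ("temp", 1, -4.0),
]

# Rebuild the standard wide format: one row per timestamp,
# one column per variable.
wide = {}
for name, ts, val in pivoted:
    wide.setdefault(ts, {})[name] = val

rows = [(ts, cols["Y"], cols["temp"]) for ts, cols in sorted(wide.items())]
# rows == [(0, 129.0, -4.0), (1, 148.0, -4.0)]
```

With pivoted input support, this reshaping step is no longer the user’s responsibility.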
Massive Mode Capability
When dealing with vast datasets, users can leverage “massive mode” in the unified interface. This mode lets an algorithm process multiple datasets simultaneously, with each dataset executed independently and in parallel. To learn more about massive mode, see the documentation page on Massive Execution of PAL Functions.
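The idea behind massive mode can be sketched with a small Python analog (this is not PAL’s implementation): because each group’s series is an independent fitting task, the tasks can be dispatched in parallel. The group keys and the naive “model” below are purely illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def fit_naive(series):
    # Stand-in "model": forecast the historical mean of the series.
    return mean(series)

# Hypothetical per-group series, keyed by a group-ID-like column.
groups = {"G1": [1.0, 2.0, 3.0], "G2": [10.0, 20.0]}

# Fit every group independently and in parallel.
with ThreadPoolExecutor() as pool:
    models = dict(zip(groups, pool.map(fit_naive, groups.values())))
# models == {"G1": 2.0, "G2": 15.0}
```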
Example
Let’s demonstrate the new interface with an example. Note that the code provided is purely for illustrative purposes and is not intended for production use.
The dataset is the Beijing PM2.5 data from the UCI Machine Learning Repository. It comprises hourly recordings of PM2.5 levels (airborne particles with aerodynamic diameters less than 2.5 μm) collected by the US Embassy in Beijing between January 1, 2010, and December 31, 2014. Additionally, meteorological data from Beijing Capital International Airport is included. The objective is to predict PM2.5 concentrations using various input features.
This dataset contains 43,824 rows and 11 columns. During preprocessing, the year, month, day, and hour columns were merged into a single ‘date’ column, and rows with missing values were handled. The restructured dataset has the following nine columns:
date: Timestamp of the record
pollution: PM2.5 concentration (µg/m³)
dew: Dew Point
temp: Temperature
press: Pressure (hPa)
wnd_dir: Combined wind direction
wnd_spd: Cumulated wind speed (m/s)
snow: Cumulated hours of snow
rain: Cumulated hours of rain
To make it more manageable for demonstration purposes, we selected the first 1,000 instances. From this selection, we allocated 990 instances to the training set and reserved the final 10 for the testing set. Here’s a glimpse at the first five rows of the training set.
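The subsetting and split described above can be sketched in a few lines of Python (a hypothetical helper, not part of the original preprocessing code):

```python
def split_series(rows, total=1000, test_size=10):
    """Take the first `total` rows, then hold out the last
    `test_size` rows as the test set (here 990 train / 10 test)."""
    subset = rows[:total]
    return subset[:-test_size], subset[-test_size:]

data = list(range(43824))  # stand-in for the 43,824 dataset rows
train, test = split_series(data)
# len(train) == 990, len(test) == 10
```

Holding out the final rows (rather than a random sample) preserves the temporal order, which is essential for evaluating forecasts.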
Once the data is loaded, the model can be trained, and results can be obtained using the following annotated SQL script.
--########## COLUMN TABLE CREATION ##########
CREATE COLUMN TABLE PAL_PARAMETER_TBL__0 ("PARAM_NAME" NVARCHAR(256), "INT_VALUE" INTEGER, "DOUBLE_VALUE" DOUBLE, "STRING_VALUE" NVARCHAR(1000));
CREATE COLUMN TABLE PAL_MODEL_TBL__0 ("INDEX" NVARCHAR(50), "CONTENT" NCLOB);
CREATE COLUMN TABLE PAL_STATISTICS_TBL__0 ("NAME" NVARCHAR(50), "VALUE_1" DOUBLE, "VALUE_2" DOUBLE, "VALUE_3" DOUBLE, "VALUE_4" DOUBLE, "VALUE_5" DOUBLE, "REASON" NVARCHAR(50));
CREATE COLUMN TABLE PAL_DECOMPOSE_TBL__0 ("TIME_STAMP" NVARCHAR(50), "TREND" DOUBLE, "SEASONAL" DOUBLE, "REGRESSION" DOUBLE, "RANDOM" DOUBLE);
CREATE COLUMN TABLE PAL_PLACE_HOLDER_TBL__0 ("OBJECT" NVARCHAR(10), "KEY" NVARCHAR(10), "VALUE" NVARCHAR(10));
CREATE COLUMN TABLE PAL_PREDICT_PARAMETER_TBL__0 ("PARAM_NAME" NVARCHAR(256), "INT_VALUE" INTEGER, "DOUBLE_VALUE" DOUBLE, "STRING_VALUE" NVARCHAR(1000));
CREATE COLUMN TABLE PAL_PREDICT_RESULT_TBL__0 ("TIME_STAMP" NVARCHAR(50), "FORECAST" DOUBLE, "VALUE_1" DOUBLE, "VALUE_2" DOUBLE, "VALUE_3" DOUBLE, "VALUE_4" DOUBLE, "VALUE_5" DOUBLE);
CREATE COLUMN TABLE PAL_PREDICT_DECOMPOSITION_TBL__0 ("TIME_STAMP" NVARCHAR(50), "VALUE_1" DOUBLE, "VALUE_2" NCLOB, "VALUE_3" NCLOB, "VALUE_4" NCLOB, "VALUE_5" NCLOB);
CREATE COLUMN TABLE PAL_PREDICT_PLACE_HOLDER_TBL__0 ("OBJECT" NVARCHAR(50), "KEY" NVARCHAR(50), "VALUE" NVARCHAR(50));
--########## TABLE INSERTS ##########
-- The training data is stored in PAL_DATA_TBL__0, and the prediction data in PAL_PREDICT_DATA_TBL__0.
--########## PAL_PARAMETER_TBL__0 DATA INSERTION ##########
-- Specify the algorithm type, 0: AMTSA, 1: ARIMA, 2: BSTS, 3: SMOOTH
INSERT INTO PAL_PARAMETER_TBL__0 VALUES ('FUNCTION', 0, NULL, NULL);
--########## UNIFIED INTERFACE FOR TIME SERIES CALL ##########
DO BEGIN
lt_data = SELECT * FROM PAL_DATA_TBL__0;
lt_param = SELECT * FROM PAL_PARAMETER_TBL__0;
CALL _SYS_AFL.PAL_UNIFIED_TIMESERIES (:lt_data, :lt_param, lt_model, lt_stat, lt_decom, lt_ph);
lt_pdata = SELECT * FROM PAL_PREDICT_DATA_TBL__0;
lt_pparam = SELECT * FROM PAL_PREDICT_PARAMETER_TBL__0;
CALL _SYS_AFL.PAL_UNIFIED_TIMESERIES_PREDICT (:lt_pdata, :lt_model, :lt_pparam, lt_result, lt_decomp, lt_pph);
INSERT INTO PAL_PREDICT_RESULT_TBL__0 SELECT * FROM :lt_result;
INSERT INTO PAL_PREDICT_DECOMPOSITION_TBL__0 SELECT * FROM :lt_decomp;
END;
--########## QUERY RESULT TABLES ##########
SELECT * FROM PAL_PREDICT_RESULT_TBL__0;
SELECT * FROM PAL_PREDICT_DECOMPOSITION_TBL__0;
--########## TABLES CLEANUP ##########
DROP TABLE PAL_PARAMETER_TBL__0;
DROP TABLE PAL_MODEL_TBL__0;
DROP TABLE PAL_STATISTICS_TBL__0;
DROP TABLE PAL_DECOMPOSE_TBL__0;
DROP TABLE PAL_PLACE_HOLDER_TBL__0;
DROP TABLE PAL_PREDICT_PARAMETER_TBL__0;
DROP TABLE PAL_PREDICT_RESULT_TBL__0;
DROP TABLE PAL_PREDICT_DECOMPOSITION_TBL__0;
DROP TABLE PAL_PREDICT_PLACE_HOLDER_TBL__0;
You can view the model, prediction results, and decomposition in the output tables. Below are illustrative snapshots of the output tables.
The structure of the output tables depends on the selected algorithm. For AMTSA, the result table includes the predicted values along with the lower and upper bounds of the uncertainty intervals. Additionally, the decomposition table provides components such as trend and seasonality.
Summary
The unified interface simplifies the use of PAL time series algorithms. This blog post highlighted its key features for addressing common challenges in time series analysis, such as varied time formats, pivoted data structures, and large data volumes. The new interface makes it easier for users to unlock the potential of their temporal data.
Recent topics on HANA machine learning:
Comprehensive Guide to MLTrack in SAP HANA Cloud: End-to-End Machine Learning Experiment Tracking
New Machine Learning and AI features in SAP HANA Cloud 2025 Q2
New Machine Learning and AI features in SAP HANA Cloud 2025 Q1
#SAP
#SAPTechnologyblog