Post Content
For More Information: https://sap.to/6059fI0YH
Description
Regression is one of the most widely used predictive modelling techniques, applied to predict values of continuous numerical attributes or figures, like a price, a cost value, etc. Various regression models for different use cases and different strengths and popularity have evolved. Linear regression, for example, helps to uncover linear relationships, i.e., straight-line relationships, between input (independent) and output (dependent) numerical variables, making it a popular technique for predictive and statistical modeling. Tree-based algorithm implementations like Random Forest or Gradient Boosting Regression have become trending implementations, and proved to create robust models.
Expected Outcome
The expected outcome is a model that predicts the value of a dependent variable based on one or more independent variables. In the case of linear regression, this model is represented by a linear equation that minimizes the difference between predicted and actual values (ground truth).
Scenarios
Regression can be utilized in various scenarios, such as predicting car prices based on model characteristics and market trends or predicting house prices using a house’s living area (size). For instance, in the figure shown below, variable ‘x’ represents the set of values of houses’ living areas, whereas variable ‘y’ represents the prices of the houses.
High-level Reference Architecture
The continuous values of variable ‘y’ are predicted by ‘h’: the function that maps the values of ‘x’ to ‘y’. For simplicity, the figure only illustrates one data attribute of the houses, which is the living area. In this case, the predictor variable ‘x’ is continuous.
Similar to the previous figure, the following plot shows the input variable (houses’ living areas) on the x-axis and the output variable (houses’ prices) on the y-axis.
High-level Reference Architecture
The goal is to build a model that takes the living area of a house as input and predicts the house price. The data points in the plot represent observations from the dataset. By fitting a line to these data points, the model is created. The equation of this line is Y = m x + c, where ‘m’ is the slope of the line and ‘c’ is the y-intercept, i.e., the point where the line crosses the y-axis.
Benefits
Simplicity and Interpretability: it is straightforward to understand and explain, making it accessible for both technical and non-technical audiences.
Computational Efficiency: it is computationally inexpensive, making it suitable for large datasets and quick analyses.
Prediction and Trend Identification: it can be used to predict future outcomes or identify trends and patterns in data.
Building Block for Complex Models: it can serve as a foundation for more sophisticated models. It can also serve as a baseline to compare against more complex models / solutions.
Data Preparation: It requires minimal data preparation and can handle missing data, simplifying the analysis process.
Quantifying Relationships: it allows for the quantification of the strength, and direction of relationships between variables.
Provides Prediction Intervals: it can generate prediction intervals, offering a range within which future values are likely to fall. Read More SAP Developers
#SAP