With the advancement of technology and the increase in data-processing capacity, it has become possible to apply advanced statistical models to sales forecasting quickly and practically. Sophisticated models that require analyzing many variables at the same time, optimizing parameters, and fitting multivariate regressions are used by data scientists around the world seeking to improve forecast accuracy. There is a huge variety of libraries in open-source programming languages aimed at applying these models and generating predictions.
A simple search on these models returns thousands of references and materials on the subject. In this scenario, it seems complex to decide which models should be used and which parameters should be set. When should damping be applied to Winters' method? Does an autoregressive model make sense for the horizon and granularity of the series at hand?
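To make the damping question concrete, below is a minimal sketch of Holt's linear trend method with a damping parameter (the trend part of the damped Winters approach, without the seasonal component). The series and the parameter values (alpha, beta, phi) are illustrative assumptions, not tuned recommendations:

```python
# Minimal sketch of Holt's linear method with a damping parameter phi,
# illustrating the idea behind the "damped" variant of Winters' method
# (trend component only, no seasonality). Parameters are illustrative.

def damped_holt_forecast(series, alpha=0.5, beta=0.3, phi=0.9, horizon=3):
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    # With phi < 1 the trend contribution shrinks geometrically,
    # so long-horizon forecasts flatten instead of growing forever.
    return [level + sum(phi ** i for i in range(1, h + 1)) * trend
            for h in range(1, horizon + 1)]

sales = [100, 104, 109, 115, 118, 124, 127, 133]  # hypothetical monthly units
print(damped_holt_forecast(sales))
```

With phi = 1 this reduces to the classic (undamped) Holt method; values below 1 make the projected trend taper off, which is often safer for long horizons.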
However, before reaching that stage, there is something as important as model selection: the preparation of the database.
Ever heard the phrase “you are what you eat”? It is generally related to the idea that we should have a balanced, healthy diet: if we do not give our organism the nutrients it needs to function properly, it starts to present problems and performs below the desired level. The same concept applies to the demand forecasting process.
If your database does not have the most up-to-date information, has missing data, or contains discrepant values, you can have the most accurate forecasting model in the world running on the highest processing power available, but it will probably produce a larger forecasting error than you would like (after all, it was “fed” a problematic base).
The data preparation and adjustment step is critical to a successful sales forecasting process. In this stage, the information is organized (level of granularity), the baseline is adjusted, and the planning horizon is defined.
Information Organization
In the information organization stage, the level of granularity of the series to be analyzed must be defined. The ideal level of granularity should be based on balancing the tripod of accuracy, usefulness, and effort. This is also the point to revisit the discussion of top-down versus bottom-up approaches (discussed in previous posts) and decide which one will bring the best results to your process.
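As a concrete illustration of what choosing a granularity level means in practice, the sketch below aggregates the same hypothetical transactional data at two levels: per SKU (bottom-up) and per category (top-down). All column names and values are illustrative assumptions:

```python
import pandas as pd

# Hypothetical transactional data; the columns (date, sku, category, units)
# are illustrative, not taken from any specific system.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03",
                            "2024-02-15", "2024-01-10", "2024-02-25"]),
    "sku": ["A1", "A1", "A1", "A2", "B1", "B1"],
    "category": ["A", "A", "A", "A", "B", "B"],
    "units": [10, 12, 9, 7, 20, 22],
})

# Bottom-up granularity: one monthly series per SKU.
by_sku = (sales.set_index("date")
               .groupby("sku")["units"]
               .resample("MS").sum())

# Top-down granularity: one monthly series per category.
by_category = (sales.set_index("date")
                    .groupby("category")["units"]
                    .resample("MS").sum())

print(by_sku)
print(by_category)
```

The finer the granularity, the noisier each individual series tends to be, which is exactly the accuracy-versus-usefulness trade-off mentioned above.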
Baseline adjustment
In the baseline adjustment step, one should evaluate the existence of outliers, identify unique events that occurred in the series, and map the business actions that impacted sales. There are several statistical techniques for identifying and treating outliers (points far removed from the normal behavior of the series), such as boxplot and graphical analysis.
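The boxplot rule mentioned above can be sketched in a few lines: points beyond 1.5 times the interquartile range from the quartiles are flagged as outlier candidates. The series and the 1.5 multiplier follow the usual boxplot convention and are illustrative:

```python
import statistics

# Minimal sketch of the boxplot (IQR) rule for flagging outlier candidates
# in a sales series; k=1.5 is the conventional boxplot multiplier.
def iqr_outliers(series, k=1.5):
    q1, _, q3 = statistics.quantiles(series, n=4)  # quartiles
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [(i, y) for i, y in enumerate(series) if y < lower or y > upper]

sales = [102, 98, 105, 99, 310, 101, 97, 103]  # 310 is a suspicious spike
print(iqr_outliers(sales))
```

Flagged points should still be reviewed by someone who knows the business before being treated; a statistical outlier may be a perfectly real sale.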
Unique events such as a natural disaster or a strike must be identified and treated as exceptions, that is, points that occurred once or a few times and will not happen again. This prevents the model from “learning” a non-regular pattern in the series and trying to reproduce it in the future.
Business actions that can impact the series include major promotions, launches, and marketing campaigns, which can produce temporary, artificial deviations from the series' standard behavior. It is imperative that these actions are documented so that the forecasting team can make the necessary adjustments.
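One simple way to make those adjustments, sketched below under illustrative assumptions, is to keep a registry of documented promotion periods and replace the flagged points with a typical (median) value, so the model sees the series' regular behavior. The data, the registry, and the median replacement are all hypothetical choices, not a prescribed method:

```python
import statistics

# Hypothetical sketch: documented business actions (promotion weeks) live in
# a simple registry; flagged points are replaced by the median of the
# unflagged observations so the baseline reflects regular behavior.
sales = [100, 103, 98, 250, 101, 240, 99, 102]   # illustrative weekly units
promo_weeks = {3, 5}                              # documented promotion weeks

baseline = statistics.median(
    y for i, y in enumerate(sales) if i not in promo_weeks)
adjusted = [baseline if i in promo_weeks else y for i, y in enumerate(sales)]
print(adjusted)
```

Keeping the promotion registry separate from the series also lets the team add the expected uplift back on top of the baseline forecast when a similar action is planned again.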
Definition of the Planning Horizon
At this stage, the data itself is not processed; instead, the forecast horizon is defined, along with the data history that must be available for it. The planning horizon depends on the company's decision cycle: if you need to make a purchase decision for raw material that takes 6 months to arrive, a forecast covering only the next 3 months has little value. The horizon must be defined a priori, as it can impact model selection and is directly related to the usefulness of the forecasting process.
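The consistency checks implied by this stage can be captured in a short sketch. The specific thresholds (a 12-month season, at least two full cycles of history) are common rules of thumb used here as assumptions, not fixed requirements:

```python
# Illustrative sketch: the forecast horizon must cover the decision lead
# time, and a common rule of thumb is to have at least two full seasonal
# cycles of history. Thresholds are assumptions, not fixed rules.
def check_horizon(lead_time_months, horizon_months, history_months,
                  season_length=12, min_cycles=2):
    issues = []
    if horizon_months < lead_time_months:
        issues.append("horizon shorter than the decision lead time")
    if history_months < min_cycles * season_length:
        issues.append("history too short to learn the seasonal pattern")
    return issues

# A 6-month raw-material lead time with only a 3-month forecast horizon:
print(check_horizon(lead_time_months=6, horizon_months=3, history_months=30))
```

Running this kind of check before modeling makes the mismatch in the raw-material example explicit instead of discovering it after the forecast is built.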
The stage of adjusting the base and preparing the data is essential so that the adopted model receives the “nutrients” it needs to perform at its best. A history that is faithful to reality will allow your model to identify the correct patterns and make the most accurate projections possible. In today's scenario of wide availability of information, ensuring the quality of the data that will be used becomes critical.