When you build a forecast model, you want to test its accuracy before using it and putting it into production. When you get a baseline forecast, you want to make sure that the calculated curve equation fairly reflects the reality of your business. To do so, you need to run multiple back-tests, over multiple time horizons and at multiple cut-off points. If this sounds like Chinese or French to your ears (pretty, but hard to understand), here is a little bit more theory.
Before jumping in, you should maybe start by reading our post on how to make a sales forecast.
How do you check that your forecast is correct? Use a back-test methodology!
When testing the accuracy of a forecast, you need to compare the result produced with the true actuals. Intuitive, right? Reality, however, is more complex than it sounds at first.
What is a back-test?
The concept behind testing the accuracy of a forecast is pretty simple. You artificially stop the actuals at a chosen date in the past, "saving" the rest of the available actuals to compare against the forecast model's output, so you can tell whether the calculated forecast curve is in line with reality. Everything happens in the past. This is a common practice in machine learning work: hold out a piece of data, then compare the model's predictions with the true outcomes. You could write a whole thesis on back-testing techniques alone, as engineers never run out of imagination when it comes to measuring the performance of a model.
How to make a back-test?
Here are the steps to run a forecast accuracy back-test:
- You have all the actuals, let's say until 30th June. These are the data you will use to build your forecast curve.
- You build a forecast curve using the available actuals, but artificially stop them on 31st March, not using the whole dataset of actuals.
- You generate the forecast curve with this limited dataset and predict the numbers.
- You then compare the forecasted data points with the real actuals you saved for the period from 1st April to 30th June.
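The steps above can be sketched in a few lines of Python. This is a minimal illustration with made-up daily actuals and a naive linear trend standing in for the forecast model; a real model (such as Prophet) would replace the "fit" step.

```python
from datetime import date, timedelta

# Hypothetical daily actuals from 1 January to 30 June (values are made up).
start = date(2023, 1, 1)
actuals = {start + timedelta(days=i): 100 + i * 0.5 for i in range(181)}

# Step 2: artificially stop the actuals at a cut-off date.
cutoff = date(2023, 3, 31)
train = {d: v for d, v in actuals.items() if d <= cutoff}      # used to fit
holdout = {d: v for d, v in actuals.items() if d > cutoff}     # "saved" actuals

# Step 3: a stand-in "model" -- a naive linear trend fitted on the training window.
days = sorted(train)
slope = (train[days[-1]] - train[days[0]]) / (len(days) - 1)
forecast = {d: train[days[-1]] + slope * (d - cutoff).days for d in holdout}

# Step 4: compare the forecasted points with the saved actuals.
errors = {d: forecast[d] - holdout[d] for d in holdout}
```

Because the toy actuals are perfectly linear, the errors here are zero; with real data, this error dictionary is what you would inspect.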
By doing so, you get a fair idea of how relevant and accurate the forecast model you just built is, and whether it is indeed capturing the reality of the business. The concept of back-testing a forecast curve is, in fact, pretty easy to understand.
Upload a CSV and start generating sales forecasts at scale with time series analysis.
Do multiple back-tests
However, doing this once is not enough: you should repeat it over and over with multiple "cut" points. The process described above is a good quick check, but if you need extra precision and a thorough forecast model back-test, you have to repeat it at different points in time and over multiple lengths (forecast horizons).
You could have a cut in December, another in October, and so on. As you can imagine, this can be time-consuming and challenging.
Back-testing over multiple horizons and multiple cut-offs helps check forecast accuracy
What are horizons and cut-offs, and why are these concepts important when checking the accuracy of a forecast?
Ideally, you want to test the accuracy of the forecast at multiple moments in time. The example described above can easily be performed manually as a one-off. But you want to test the robustness of the forecast curve equation at multiple moments of the year and over different time frames. Imagine now that you have 6 years of historical data (actuals) to feed the forecasting model.
You would want to perform a forecast model back-testing...
- ... by using only the first 3 years of data to build the model (1095 days = this is called the initial window)...
- ... then, after these 3 initial years, stopping the actuals every 6 months for the remaining 3 years (180 days = this is called the period between cut-off dates)...
- ... and running the forecast over 365 days each time (365 days = the horizon).
In total, with this set-up, you would have 5 cut-off dates (at 3 years, 3.5, 4, 4.5 and 5), each of them tested over a horizon of 365 days.
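The cut-off arithmetic can be made concrete with a small helper. This is a sketch, not any library's API: given an initial window, a period between cut-offs, and a horizon, it lists every cut-off whose full horizon still fits inside the available history.

```python
from datetime import date, timedelta

def cutoff_dates(start, end, initial_days, period_days, horizon_days):
    """List back-test cut-off dates: begin after the initial window, step by
    the period, and stop once a full horizon no longer fits before `end`."""
    cutoffs = []
    cutoff = start + timedelta(days=initial_days)
    while cutoff + timedelta(days=horizon_days) <= end:
        cutoffs.append(cutoff)
        cutoff += timedelta(days=period_days)
    return cutoffs

# Six years of actuals (hypothetical dates), initial = 1095 days,
# period = 180 days, horizon = 365 days.
start = date(2018, 1, 1)
end = start + timedelta(days=6 * 365)
cuts = cutoff_dates(start, end, 1095, 180, 365)
```

With these numbers the helper yields exactly 5 cut-offs, matching the 3, 3.5, 4, 4.5 and 5 year marks described above.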
Doing this back-testing manually becomes a tedious task. You should spare that energy and brain power of yours and find a way to let your computer do the work. You might also lose focus and make calculation errors, especially if you work on your forecast at the end of the day, when the office gets quieter and meetings less frequent.
Introducing the MAPE score for forecast accuracy check
The Mean Absolute Percentage Error (MAPE) measures the delta, in percentage terms, between actuals and forecast. The MAPE score is the average of the absolute relative differences observed between actual and forecast points. At least, that is the theory. At the end of the back-test process, you would have a MAPE score for each of the 5 back-tests, over windows of D+1, D+2, D+3... D+365. This tells you how accurate the forecast is over different time windows. Usually, a forecast loses accuracy towards the end of the horizon.
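The definition above translates directly into code. Here is a minimal MAPE implementation with made-up numbers; note that the metric is undefined whenever an actual value is zero.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error: the average of |actual - forecast| / |actual|,
    expressed as a percentage. Requires non-zero actuals."""
    assert len(actuals) == len(forecasts) and all(a != 0 for a in actuals)
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)) / len(actuals)

# Example: four days of actuals vs forecast (hypothetical values).
actuals = [100, 120, 110, 130]
forecast = [98, 125, 105, 140]
score = mape(actuals, forecast)  # average of 2%, 4.17%, 4.55%, 7.69% = ~4.6%
```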
If you have a daily forecast, you don't want to check only the monthly variation. Summing the actuals by month and comparing that with the monthly sum of predictions is fine, but it is not really what you want. You want to check the precision of your forecast model at a daily level and capture, for each day, the model's performance versus reality. At a monthly aggregated level, you might end up with +1% or -1%, which looks good! However, that +/- 1% variation, once broken down day by day, can hide strong daily swings, which you want to avoid as much as possible!
Prophet comes with a back-testing module at its heart. And it's great.
Prophet, Facebook's time series forecasting algorithm, is definitely great, as it comes by default with a back-testing functionality. It can be run separately to validate the hyperparameter choices that have been made.
We use Prophet under the hood to perform the forecasts within our forecasting tool. However, we decided not to implement the diagnostics feature for the time being; maybe in a later release of DataInsightOut, if our customers ask for it.