Moving Average in Forecasting
Description	The Moving Average, also known as the moving mean or rolling mean, is a simple forecasting technique that is sometimes grouped with naïve forecasting methods. It creates a series of averages over successive subsets of a complete dataset.
Why to use	The Moving Average is used to forecast time-series data.
When to use	To analyze trends in linear or non-linear time-series data	When not to use	When the data is not time-series based; on textual and categorical data
Prerequisites	The time-series data must not contain null or missing values.
Input	Any dataset that contains time-series data	Output	Root Mean Square Error (RMSE); Baseline and Prediction Plot; Predicted Values of the selected Variable
Statistical Methods Used	Average; Root Mean Square Error	Limitations	Cannot identify the individual components of a time series

Moving Average is located under Forecasting in Modeling, on the left task pane. Drag and drop the algorithm onto the canvas. Click the algorithm to view and select different properties for analysis.

Refer to the Properties of Moving Average.

Consider time-series data containing the following annual sales figures. We calculate the three-year Moving Average for the windows 2015-2017, 2016-2018, and 2017-2019. The values are given in the table below.

Year	Sales (In Millions)	Moving Average (Three Year Average)
2015	5.0	NA
2016	5.4	NA
2017	5.7	(5.0 + 5.4 + 5.7) / 3 ≈ 5.367
2018	6.1	(5.4 + 5.7 + 6.1) / 3 ≈ 5.733
2019	6.4	(5.7 + 6.1 + 6.4) / 3 ≈ 6.067
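
The calculation in the table can be reproduced with a few lines of Python. This is only a minimal sketch of a trailing moving average, not the product's implementation; the function name `moving_average` is hypothetical, and the sales figures are the ones from the table.

```python
# Annual sales figures (in millions), taken from the table above.
sales = {2015: 5.0, 2016: 5.4, 2017: 5.7, 2018: 6.1, 2019: 6.4}


def moving_average(values, window):
    """Return trailing moving averages (hypothetical helper).

    Positions that do not yet have a full window are returned as None,
    which corresponds to the NA entries in the table.
    """
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


years = list(sales)
averages = moving_average(list(sales.values()), window=3)

# Print one row per year, rounded to three decimals.
for year, avg in zip(years, averages):
    print(year, "NA" if avg is None else round(avg, 3))
```

The first two years have no value because a three-year window is not yet complete at those points.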

Properties of Moving Average

The available properties of Moving Average are as shown in the figure below.

The table below describes the different fields present on the properties of Moving Average.

Field		Description	Remark
Task Name		It is the name of the task selected on the workbook canvas.	You can click the text field to edit or modify the name of the task, as required. Spaces between words are not allowed in the Task Name.
Time ID Variable		It allows you to select the time variable.	The dataset should contain at least one time variable.
Target Variable		It allows you to select the variable on which the Moving Average is calculated.	The variable selected can be discrete or continuous.
Group By		It allows you to select the variable used for grouping identical data.	Identical values of a column variable in different rows are grouped. Usually, the variable selected is categorical. Selecting Group By is optional.
Advanced	Re-train	It allows you to select whether you want to re-train the moving average model.	It has two options, Yes and No. By default, the re-train option is Yes.
	Interval	It allows you to select the interval on which you want to calculate the Moving Average.	The available options are: Day, Week, Month, Quarter, and Year. By default, the interval is set to Month.
	Number of Periods for Forecasting	It allows you to select the number of periods you want to forecast based on the moving average results.	By default, the number of periods selected is one (1). You can select any whole number of periods as required.
	Confidence Level (%)	It allows you to select the confidence level with which the results are predicted.	By default, the confidence level is set at 95%. Broadly, this means that intervals constructed at this level are expected to contain the actual value about 95 percent of the time under repeated sampling. The difference between 100% and the confidence level is alpha (α). Thus, a confidence level of 95% means an alpha (α) of 0.05.
	Window Size	It allows you to select the number of data points used for calculating the Average.	By default, the window size is two (2), which is also the minimum that can be selected. The window size must be an integer (odd or even). For example, if the window size is three (3), the Moving Average is calculated for the 1st, 2nd, and 3rd data points, then the 2nd, 3rd, and 4th, then the 3rd, 4th, and 5th, and so on. In this case, the Moving Average of the first three data points is the predicted value for the 4th data point, the Moving Average of the 2nd, 3rd, and 4th data points is the predicted value for the 5th data point, and so on. A larger window smooths out more short-term fluctuation but responds more slowly to recent changes, so a larger window does not by itself guarantee more accurate predictions. For a dataset with a large number of data points, it is recommended to split the data into train and test sets and to compare the RMSE obtained for different window sizes.
	Node Configuration	It allows you to select the instance of the Amazon Web Services (AWS) server to provide control on the execution of a task in a workbook or workflow.	For more details, refer to the Worker Node Configuration.
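
The Window Size remark in the table above describes using the average of one window as the predicted value for the next data point. The sketch below illustrates that alignment under stated assumptions: the helper names `one_step_forecasts` and `forecast_next` are hypothetical, the sample series is made up for illustration, and the product's internal behavior may differ.

```python
def one_step_forecasts(values, window):
    """Pair each data point with the mean of the `window` points before it.

    Points that do not have `window` earlier observations get None.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    forecasts = [None] * len(values)
    for i in range(window, len(values)):
        forecasts[i] = sum(values[i - window : i]) / window
    return forecasts


def forecast_next(values, window):
    """Forecast the first unseen point as the mean of the last `window` values."""
    if len(values) < window:
        raise ValueError("not enough data points for this window")
    return sum(values[-window:]) / window


# Made-up illustrative values, not taken from any dataset in this document.
series = [10, 12, 14, 16, 18]

print(one_step_forecasts(series, window=3))  # [None, None, None, 12.0, 14.0]
print(forecast_next(series, window=3))       # 16.0
```

Here the average of the 1st to 3rd points (12.0) is the forecast for the 4th point, and the average of the last three points (16.0) is the forecast for the first period beyond the data.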

Example of Moving Average

Consider a FemaleBirthData dataset with 365 records. It contains columns for Date, Number of Daily Births, and the corresponding Quarters. A snippet of the input data is shown in the figure given below.

We apply Moving Average on the input data. The selected values for Moving Average are given below.

Property	Value
Time ID Variable	Date
Target Variable	Births
Group By	Quarter
Re-train	Yes
Interval	Day
Number of Periods for Forecasting	5
Confidence Level (%)	95
Window Size	4

The Data pane shows the predicted values for the corresponding data points in the output. Note the following:

Predicted values for the first three data points are 'NaN' since the selected window size is four (4).
The Moving Average is calculated for successive subsets of four data points each. The resulting average is the predicted value for the fourth data point of each subset.
The average of 35, 32, 30, and 31 is 32, the average of 32, 30, 31, and 44 is 34.25 (approximately 34), and so on.
Hence, the predicted number of births on 1959-01-04 is 32, against 31 actual births. Similarly, the predicted number of births on 1959-01-05 is approximately 34, against 44 actual births.
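
The values quoted above can be checked with a short script. This is a sketch only: it uses the five birth counts mentioned in the example, assumes the series starts on 1959-01-01 (inferred from the dates quoted above), and keeps a sliding window with `collections.deque` rather than reproducing the product's implementation.

```python
import math
from collections import deque
from datetime import date, timedelta

births = [35, 32, 30, 31, 44]   # the five daily values quoted in the example
start = date(1959, 1, 1)        # assumed start date, inferred from the text

window = deque(maxlen=4)        # window size 4; old values drop out automatically
rows = []
for offset, actual in enumerate(births):
    window.append(actual)
    if len(window) == window.maxlen:
        predicted = sum(window) / len(window)
    else:
        predicted = math.nan    # window not yet full
    rows.append((start + timedelta(days=offset), actual, predicted))

for day, actual, predicted in rows:
    print(day.isoformat(), actual, predicted)
```

The rows for 1959-01-04 and 1959-01-05 show predicted values of 32.0 and 34.25, and the first three rows show nan.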

Further, the Result page displays the following:

RMSE for actual and predicted values based on the calculated moving average
The baseline and prediction plot for births, where the red curve indicates the variation in predicted births and the blue curve indicates the variation in actual births for the Quarter Q1.

Similarly, you can change the Quarter from the Select Group field and obtain the corresponding plots.
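
The RMSE reported on the Result page compares actual and predicted values. A minimal sketch of that calculation is given below, using the five data points from the example. The helper name `rmse` is hypothetical, and skipping the rows without a prediction is an assumption; the product may handle them differently.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error over the rows that have a prediction.

    Rows whose prediction is None or NaN are skipped (an assumption).
    """
    pairs = [
        (a, p)
        for a, p in zip(actual, predicted)
        if p is not None and not math.isnan(p)
    ]
    if not pairs:
        raise ValueError("no rows with a prediction")
    squared_errors = [(a - p) ** 2 for a, p in pairs]
    return math.sqrt(sum(squared_errors) / len(pairs))


actual = [35, 32, 30, 31, 44]
predicted = [math.nan, math.nan, math.nan, 32.0, 34.25]

print(round(rmse(actual, predicted), 2))  # 6.93
```

Only two rows have a prediction here, so the result reflects the errors of 1 and 9.75 births on 1959-01-04 and 1959-01-05.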
