Moving Average in Forecasting

Description

  • The Moving Average is also known as the moving mean or rolling mean, and is often used as a simple (naïve) baseline forecast.
  • It creates a series of averages over successive subsets (windows) of a complete dataset.

Why to use

The Moving Average is used with time-series for forecasting.

When to use

To analyze trends in linear or non-linear time-series data

When not to use

  • When the data is not time-series based
  • On textual and categorical data

Prerequisites

The time-series data should not contain null or missing values.

Input

Any dataset that contains time-series data

Output

  • Root Mean Square Error (RMSE)
  • Baseline and Prediction Plot
  • Predicted Values of the selected Variable

Statistical Methods Used

  • Average
  • Root Mean Square Error
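
As a minimal sketch of the two statistics listed above (not the product's internal implementation), the average and the RMSE can be written in plain Python. The function names `mean` and `rmse` are illustrative and do not come from the product.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(mean(squared_errors))
```

A lower RMSE indicates that the predicted values lie closer to the actual values.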

Limitations

It cannot identify the components of a time series, such as trend and seasonality.

Moving Average is located under Forecasting in Modeling, on the left task pane. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select different properties for analysis.

Refer to the Properties of Moving Average.

Consider time-series data containing the following annual sales figures. We calculate the Moving Average over three years, for the years 2015-2016-2017, 2016-2017-2018, and 2017-2018-2019. These values are given in the table below.

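| Year | Sales (in millions) | Moving Average (three-year average) |
|------|---------------------|-------------------------------------|
| 2015 | 5.0 | NA |
| 2016 | 5.4 | NA |
| 2017 | 5.7 | (5.0 + 5.4 + 5.7) / 3 = 5.367 |
| 2018 | 6.1 | (5.4 + 5.7 + 6.1) / 3 = 5.733 |
| 2019 | 6.4 | (5.7 + 6.1 + 6.4) / 3 = 6.067 |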
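
The three-year averages in the sales table above can be reproduced with a short sketch in plain Python. The helper name `moving_average` is hypothetical, and the placement of each average against the last year of its window follows the table, not a documented product rule.

```python
def moving_average(values, window_size):
    """Trailing moving average; positions without a full window are None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window_size:
            # Not enough earlier data points to fill the window yet.
            result.append(None)
        else:
            window = values[i + 1 - window_size : i + 1]
            result.append(sum(window) / window_size)
    return result

sales = [5.0, 5.4, 5.7, 6.1, 6.4]  # 2015 to 2019, in millions
three_year = moving_average(sales, 3)
print([None if v is None else round(v, 3) for v in three_year])
# [None, None, 5.367, 5.733, 6.067]
```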

Properties of Moving Average

The available properties of Moving Average are as shown in the figure below.

The table below describes the different fields present on the properties of Moving Average.

Field

Description

Remark

Task Name

It is the name of the task selected on the workbook canvas.

  • You can click the text field to edit or modify the name of the task, as required.
  • Spaces are not allowed in the Task Name.

Time ID Variable

It allows you to select the time variable.

The dataset should contain at least one time variable.

Target Variable

It allows you to select the variable on which the Moving Average is calculated.

The variable selected can be discrete or continuous.

Group By

It allows you to select the variable by which the data is grouped.

  • Rows that have identical values of the selected variable are grouped together.
  • Usually, the variable selected is categorical.
  • Selecting Group By is optional.

Advanced

Re-train

It allows you to select whether you want to re-train the moving average model.

  • It has two options, Yes and No.
  • By default, the re-train option is Yes.

Interval

It allows you to select the interval on which you want to calculate the Moving Average.

  • The available options are:
    • Day
    • Week
    • Month
    • Quarter
    • Year
  • By default, the interval is set to Month.

Number of Periods for Forecasting

It allows you to select a specific number of periods you want to forecast based on the moving average results.

  • By default, the number of periods selected is one (1).
  • You can select any integral number of periods as required.

Confidence Level (%)

It allows you to select the confidence level for the predicted results.

  • By default, the confidence level is set at 95%.
  • A 95% confidence level means that intervals constructed in this way are expected to contain the actual value about 95 percent of the time over repeated samples.
  • Alpha (α) is the complement of the confidence level, that is, α = 1 - (confidence level / 100).
  • Thus, a confidence level of 95% means an alpha (α) of 0.05.

Window Size

It allows you to select the number of data points you want to select for calculating the Average.

  • By default, the window size is two (2). It is the minimum number to be selected for calculating the Average.
  • The window size should be an integer (odd or even).
  • For example, if the window size selected is three (3), the Moving Average is calculated for the following data points: 1st, 2nd, and 3rd, then 2nd, 3rd, and 4th, followed by 3rd, 4th, and 5th, and so on.
  • In this case, the Moving Average of the 1st, 2nd, and 3rd data points is reported as the predicted value against the 3rd data point, the Moving Average of the 2nd, 3rd, and 4th data points against the 4th data point, and so on. The first two data points have no predicted value because a full window is not yet available.
  • A larger window size averages over more data points, so it smooths out short-term fluctuations but reacts more slowly to recent changes. A smaller window size follows recent changes more closely but is more sensitive to noise. A larger window size therefore does not by itself guarantee more accurate predictions.
  • To choose a window size, it is recommended to split the dataset into train and test and compare the RMSE obtained with different window sizes.

Node Configuration


It allows you to select the Amazon Web Services (AWS) server instance on which the task runs, to control the execution of a task in a workbook or workflow.


For more details, refer to the Worker Node Configuration.
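
The Window Size, Number of Periods for Forecasting, and Confidence Level properties described above can be illustrated with a rough sketch in plain Python. It rests on several assumptions that the product documentation does not confirm: that forecasts beyond the last data point are produced by feeding each forecast back into the window, and that a normal-distribution critical value is relevant to the confidence level. The names `forecast_moving_average` and `alpha_from_confidence` are hypothetical.

```python
from statistics import NormalDist

def forecast_moving_average(values, window_size, periods):
    """Forecast `periods` steps ahead. Each step is the mean of the latest
    `window_size` values, with earlier forecasts fed back in (an assumption)."""
    if window_size < 2 or window_size > len(values):
        raise ValueError("window_size must be between 2 and len(values)")
    history = list(values)
    forecasts = []
    for _ in range(periods):
        next_value = sum(history[-window_size:]) / window_size
        forecasts.append(next_value)
        history.append(next_value)
    return forecasts

def alpha_from_confidence(confidence_percent):
    """Convert a confidence level in percent to alpha, e.g. 95 gives about 0.05."""
    return 1 - confidence_percent / 100

alpha = alpha_from_confidence(95)
# Two-sided standard-normal critical value; using a normal interval is an assumption.
z = NormalDist().inv_cdf(1 - alpha / 2)
forecasts = forecast_moving_average([5.0, 5.4, 5.7, 6.1, 6.4], window_size=3, periods=2)
```

With the sales figures used earlier, the first forecast is the average of the last three observed values, and the second forecast replaces the oldest of those values with the first forecast.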

Example of Moving Average

Consider a FemaleBirthData dataset with 365 records. It contains columns for Date, Number of Daily Births, and the corresponding Quarters. A snippet of the input data is shown in the figure given below.

We apply Moving Average on the input data. The selected values for Moving Average are given below.

| Property | Value |
|----------|-------|
| Time ID Variable | Date |
| Target Variable | Births |
| Group By | Quarter |
| Re-train | Yes |
| Interval | Day |
| Number of Periods for Forecasting | 5 |
| Confidence Level (%) | 95 |
| Window Size | 4 |

On the Data pane, you see the predicted values for the corresponding data points in the output. As you can see,

  • Predicted values for the first three data points are 'NaN' since the selected window size is four (4).
  • The Moving Average is calculated for successive subsets of four data points each. The resulting average is the predicted value for the last data point in that subset.
  • The average of 35, 32, 30, and 31 is 32, the average of 32, 30, 31, and 44 is 34.25 (shown as 34), and so on.
  • Hence, the predicted number of births on 1959-01-04 is 32, against 31 actual births. Similarly, the predicted number of births on 1959-01-05 is 34, against 44 actual births.
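
The first few predicted values listed above can be checked with a small sketch in plain Python. Only the five birth counts quoted in this example are used; the variable names are illustrative.

```python
import math

births = [35, 32, 30, 31, 44]  # first five daily values quoted in this example
window_size = 4

predicted = []
for i in range(len(births)):
    if i + 1 < window_size:
        # Fewer than four data points are available, so no average is produced.
        predicted.append(math.nan)
    else:
        window = births[i + 1 - window_size : i + 1]
        predicted.append(sum(window) / window_size)

print(predicted)  # [nan, nan, nan, 32.0, 34.25]
```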

Further, the Result page displays

  • RMSE for actual and predicted values based on the calculated moving average
  • The baseline and prediction plot for births, where the red curve indicates the variation in predicted births and the blue curve indicates the variation in actual births for the Quarter Q1.

Similarly, you can change the Quarter from the Select Group field and obtain the corresponding plots.
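
The effect of Group By can be sketched as computing a separate moving average for each group. This is an assumption about how grouping interacts with the window, not a description of the product's implementation. The function name `grouped_moving_average` and the sample rows are hypothetical.

```python
from collections import defaultdict

def grouped_moving_average(rows, window_size):
    """rows: iterable of (group, value) pairs, already sorted by time.
    Returns {group: [moving averages]}, with None until a window is full."""
    by_group = defaultdict(list)
    for group, value in rows:
        by_group[group].append(value)
    result = {}
    for group, values in by_group.items():
        averages = []
        for i in range(len(values)):
            if i + 1 < window_size:
                averages.append(None)
            else:
                window = values[i + 1 - window_size : i + 1]
                averages.append(sum(window) / window_size)
        result[group] = averages
    return result

# Hypothetical rows for illustration only; not taken from FemaleBirthData.
rows = [("Q1", 10), ("Q1", 20), ("Q1", 30), ("Q2", 40), ("Q2", 60)]
result = grouped_moving_average(rows, window_size=2)
print(result)  # {'Q1': [None, 15.0, 25.0], 'Q2': [None, 50.0]}
```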
