The following tests are available in Time-series Data Preparation under Forecasting.

  • Accumulation
  • Missing Value
  • Transformation
  • Differencing
Data Preparation

Description

  • Time-series data may contain missing values. Missing value imputation fills in these missing data points.
  • It is performed only on missing values. The other values present in the dataset remain unchanged.

Why to use

To impute missing values in the time-series data.

When to use

For analysis of time-series data without losing its variation.

When not to use

When data do not contain any missing values.

Prerequisites

The time interval for the data to be analyzed should be specified.

Input

Time-series data, with or without fixed time intervals.

Output

A complete time series for the specified time interval, with no missing values.
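The relationship between the specified time interval, the input, and the output can be illustrated with a minimal sketch. It is not the product's implementation. The data, the function names complete_series and forward_fill, and the choice of a daily interval are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Hypothetical daily observations; 3 and 4 January are absent from the input.
observed = {
    date(2024, 1, 1): 35.0,
    date(2024, 1, 2): 32.0,
    date(2024, 1, 5): 29.0,
}


def complete_series(observed, start, end, step):
    """Return (timestamp, value) pairs for every step from start to end.

    Timestamps with no observation get None, which marks a missing value.
    """
    series = []
    current = start
    while current <= end:
        series.append((current, observed.get(current)))
        current += step
    return series


def forward_fill(series):
    """Replace each None with the most recent non-missing value."""
    filled, last = [], None
    for stamp, value in series:
        if value is None:
            value = last  # stays None if no earlier observation exists
        else:
            last = value
        filled.append((stamp, value))
    return filled


# Laying the data on the specified interval exposes the gaps ...
grid = complete_series(observed, date(2024, 1, 1), date(2024, 1, 5), timedelta(days=1))
# ... and imputation then fills them, leaving observed values unchanged.
result = forward_fill(grid)
```

In this sketch, grid contains two None entries for the absent dates, and result contains one value for every day in the interval.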

Statistical Methods used

  • Mean
  • Median
  • Min
  • Max
  • Remove
  • Constant
  • Random
  • Forward Fill
  • Backward Fill
  • Interpolate
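As a rough illustration of the column-level methods in this list (Mean, Median, Min, Max, Constant, Random, and Remove), the sketch below applies each one to a single column. The function impute_column, its parameters, and the sample values are hypothetical and are not part of the product. None stands for a missing value, and the column is assumed to contain at least one non-missing value.

```python
import random
import statistics


def impute_column(values, method, constant=None, seed=None):
    """Fill the None entries of one numeric column using the named method."""
    observed = [v for v in values if v is not None]

    if method == "remove":
        # Discard the entries (rows) that are missing.
        return observed

    if method == "random":
        # Draw each replacement from the non-missing values of the column.
        rng = random.Random(seed)
        return [rng.choice(observed) if v is None else v for v in values]

    if method == "constant":
        fill = constant
    else:
        statistic = {
            "mean": statistics.mean,
            "median": statistics.median,
            "min": min,
            "max": max,
        }[method]
        # The statistic is computed from the non-missing values only.
        fill = statistic(observed)

    return [fill if v is None else v for v in values]


births = [35, 32, None, 31, None, 44]  # hypothetical column with two gaps

mean_filled = impute_column(births, "mean")
median_filled = impute_column(births, "median")
constant_filled = impute_column(births, "constant", constant=0)
removed = impute_column(births, "remove")
```

Every method except Remove keeps the length of the column; Remove shortens it by the number of missing entries.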

Limitations

  • It does not account for the uncertainty of the imputations.
  • It can introduce bias in the data.
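One way to see how imputation can introduce bias is to compare the spread of a column before and after filling it with its mean. The sketch below uses hypothetical values and is only an illustration of the effect, not a measurement from the product.

```python
import statistics

births = [35, 32, None, 31, None, 44, 29, None]  # hypothetical column

observed = [v for v in births if v is not None]
fill = statistics.mean(observed)
imputed = [fill if v is None else v for v in births]

# The mean is preserved by construction ...
mean_before = statistics.mean(observed)
mean_after = statistics.mean(imputed)

# ... but every filled point sits exactly on the mean, so the spread shrinks.
spread_before = statistics.pstdev(observed)
spread_after = statistics.pstdev(imputed)

print(f"mean:   {mean_before:.2f} -> {mean_after:.2f}")
print(f"spread: {spread_before:.2f} -> {spread_after:.2f}")
```

The filled values are also stored as ordinary numbers, so nothing in the result records that they are estimates. This is one sense in which the uncertainty of the imputations is not accounted for.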

Functions of the Missing Value Test

The table given below describes the functions of the Missing Value test.
 

Function

Description

Remark

Mean

It replaces the missing values with the mean of the non-missing values within each column separately and independently from the others.

  • It only works on the column level.
  • It can only be used with numerical data.

Median

It replaces the missing values with the median of the non-missing values within each column separately and independently from the others.

  • It only works on the column level.
  • It can only be used with numerical data.

Min

It replaces the missing values with the minimum value present in that column.

Max

It replaces the missing values with the maximum value present in that column.

Remove

It discards the rows that contain missing values.

  • It can be used for a small amount of missing data (20-30%).
  • Removing a large amount of data may cause considerable variations in the results.
  • If there is a large amount of missing data, it is recommended to remove the complete column. To remove a column, do not select that column while analyzing.

Constant

It replaces the missing values with the constant value that you have entered.

  • It can only be used with numerical data.
  • It can introduce bias in the data.

Random

It replaces the missing values with random values from that column.

  • It can only be used with numerical data.

Forward Fill

It fills the missing value with the preceding value from the dataset.

For example, the number of people on Tuesday is missing in the time-series data. In this case, the Monday count becomes the Tuesday data.

Backward Fill

It fills the missing value with the succeeding value from the dataset.

For example, the number of people on Tuesday is missing in the time-series data. In this case, the Wednesday count becomes the Tuesday data.

Interpolate

It replaces each missing value with a value obtained by linearly interpolating the existing values in the dataset.

For example, the data points for twelve months are present in the time-series data, and the next value for the thirteenth month is missing. In this case, the twelve values are interpolated, and the value for the thirteenth month is calculated.
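The order-dependent functions in the table (Forward Fill, Backward Fill, and Interpolate) can be sketched as follows. The function names and sample counts are hypothetical, the data points are assumed to be equally spaced in time, and None stands for a missing value. How the product treats a gap at the very end of the series, as in the thirteenth-month example, is not specified here; this sketch leaves such gaps unfilled.

```python
def forward_fill(values):
    """Replace each None with the preceding non-missing value."""
    filled, last = [], None
    for v in values:
        if v is None:
            v = last  # stays None when nothing precedes the gap
        else:
            last = v
        filled.append(v)
    return filled


def backward_fill(values):
    """Replace each None with the succeeding non-missing value."""
    return forward_fill(values[::-1])[::-1]


def interpolate_linear(values):
    """Fill interior gaps on a straight line between the nearest known points.

    Leading and trailing gaps have only one known neighbor and are left as None.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            weight = (i - left) / span
            result[i] = values[left] + weight * (values[right] - values[left])
    return result


# Hypothetical counts for Monday, Tuesday, Wednesday; Tuesday is missing.
counts = [120, None, 150]

print(forward_fill(counts))        # Tuesday takes the Monday count
print(backward_fill(counts))       # Tuesday takes the Wednesday count
print(interpolate_linear(counts))  # Tuesday lies midway between the two
```

With these counts, Forward Fill gives 120 for Tuesday, Backward Fill gives 150, and linear interpolation gives 135.0.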

Example of Time-series Data Preparation

Consider an example of Female Birth Rate time-series data. Here, the variable Date is of interval type. The dataset containing some missing values (na) is shown in the figure below.




In the Properties pane, the values are selected as below.

  • Time ID Variable: Date
  • Target Variable: Births
  • Group By: Quarters
  • Interval: Week
  • Time Format: 12/13/1947
  • Start Time: None
  • End Time: None

We apply Data Preparation on the above data. On the Result page, you can see Group as Q1.

Notes:

  • You can also execute each test independently by selecting the corresponding check box, selecting the required values in the fields corresponding to the selected test, and then clicking Run Test.
  • When you select the check box for Select All Tests, all four tests (Accumulation, Missing Value, Transformation, and Differencing) are selected.
  • Trace displays the log of the selected tests when they are executed. It logs the number of times each test starts and ends, based on the option(s) selected in Group By.


When you click Run Selected Tests, the tests are performed based on the default function and values in the fields corresponding to each test. The result for each test is displayed in the figures given below.

Interpretation of Result of Missing Value Imputation

Forward Fill is selected as the Function for plotting the Missing Value Imputation Plot. Thus, the missing data points in the selected Group (Q1) are filled from the previous values, and the corresponding graph is created.


In the above figure, the Missing Value Imputation plot displays the monthly number of Births in the time-series data. The Interpolated values of Births are the imputed values that replace the missing data points in quarter Q1.
In the Data tab, the table shows the forward-filled values. The values 32 and 29 from the previous data points are forward filled in place of the na values.
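The forward fill within a selected group can be sketched as below. This is an illustration only: the dates and most values are hypothetical (only the carried values 32 and 29 come from the description above), the helper names are invented, and the assumption that a fill never crosses from one quarter into the next is not confirmed by the product documentation.

```python
from datetime import date

# Hypothetical rows of (Date, Births); None stands for an na value.
rows = [
    (date(2024, 1, 1), 35),
    (date(2024, 1, 2), 32),
    (date(2024, 1, 3), None),  # expected to receive 32
    (date(2024, 1, 4), 31),
    (date(2024, 1, 5), 29),
    (date(2024, 1, 6), None),  # expected to receive 29
    (date(2024, 4, 1), None),  # first row of Q2: nothing to carry forward
    (date(2024, 4, 2), 40),
]


def quarter(d):
    """Map a date to its quarter label, Q1 to Q4."""
    return f"Q{(d.month - 1) // 3 + 1}"


def forward_fill_by_group(rows):
    """Forward fill missing values separately within each quarter (assumed)."""
    last_seen = {}
    filled = []
    for stamp, value in rows:
        group = quarter(stamp)
        if value is None:
            value = last_seen.get(group)  # None if the group has no earlier value
        else:
            last_seen[group] = value
        filled.append((stamp, group, value))
    return filled


filled = forward_fill_by_group(rows)
q1_values = [value for _, group, value in filled if group == "Q1"]
```

Here q1_values holds the Q1 series with 32 and 29 carried forward into the two gaps, while the leading gap in Q2 stays unfilled because no earlier Q2 value exists.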
