Missing Value Imputation
Description	Missing value imputation is the substitution of estimated values for the missing values in a real-world dataset.
Why to use	Numerical Analysis – Data Preparation
When to use	When there are missing values in the data.	When not to use	On textual data. When there are no missing values.
Prerequisites	It should be used on numerical data.
Input		Output	In this example, the missing data is imputed by mean of the respective column values.
Statistical Methods used	Mean, Median, Min, Max, Remove, Constant	Limitations	It is not very accurate. It does not account for the uncertainty in the imputations. It can introduce bias in the data.

Missing Value Imputation is located under Model Studio in Data Preparation, in the task pane on the left. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select different properties for analysis.

Refer to Properties of Missing Value Imputation.

There are many ways data can end up with missing values. For example:

A 2-bedroom house would not include an answer for "How large is the third bedroom?"
Someone being surveyed may choose not to share their income.

Python libraries represent missing numbers as NaN, which is short for "not a number".
Most libraries (including scikit-learn) raise an error if you try to build a model using data with missing values, so you need to choose a strategy for handling the missing values first.
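The NaN convention described above can be illustrated with a minimal sketch that uses only the Python standard library. The `incomes` column and its values are hypothetical, chosen to mirror the survey example; they do not come from the product.

```python
import math

# Hypothetical survey column: None marks an answer that was never given.
incomes = [52000.0, None, 61000.0, None, 48000.0]

# Convert to the float NaN convention used by numerical libraries.
as_floats = [math.nan if v is None else v for v in incomes]

# NaN is never equal to itself, so test for it with math.isnan, not ==.
missing_count = sum(1 for v in as_floats if math.isnan(v))
missing_share = missing_count / len(as_floats)

# Ordinary arithmetic silently propagates NaN, which is why the
# missing values must be handled before any analysis.
naive_total = sum(as_floats)

print(missing_count, missing_share, naive_total)
```

Counting the missing entries first is a reasonable way to decide between removing rows and imputing values.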
Datasets often contain missing values. Such datasets are incompatible with scikit-learn estimators because these estimators assume that all values are meaningful numerical values. If we eliminate the rows that contain missing values, we may lose important and relevant data. Hence, missing value imputation fills the gaps by inferring the missing values from the known part of the data.
Missing value imputation can be univariate or multivariate. In univariate imputation, the missing value is replaced by a constant value or a statistical value like the mean or the median of the corresponding column. In multivariate imputation, each feature with missing value is modeled as a function of other features, and then this estimate is used for imputation.
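The difference between the two approaches can be sketched as follows. This is a simplified illustration, not the product's implementation: the `area` and `price` columns are hypothetical, and the multivariate case is reduced to a single least-squares line fitted on the complete rows.

```python
import math
from statistics import mean

# Hypothetical data: house area (always known) and price (one value missing).
area = [50.0, 60.0, 70.0, 80.0, 90.0]
price = [100.0, 120.0, 140.0, 160.0, math.nan]

# Rows where the price is known.
known = [(a, p) for a, p in zip(area, price) if not math.isnan(p)]

# Univariate: use only the price column itself (here, its mean).
price_mean = mean(p for _, p in known)
univariate = [price_mean if math.isnan(p) else p for p in price]

# Multivariate (simplified): model price as a function of area.
# Fit price = slope * area + intercept by least squares on the known rows,
# then predict the missing price from its area.
a_bar = mean(a for a, _ in known)
p_bar = mean(p for _, p in known)
slope = (sum((a - a_bar) * (p - p_bar) for a, p in known)
         / sum((a - a_bar) ** 2 for a, _ in known))
intercept = p_bar - slope * a_bar
multivariate = [slope * a + intercept if math.isnan(p) else p
                for a, p in zip(area, price)]

print(univariate[-1], multivariate[-1])
```

In this sketch the univariate estimate ignores the fact that the largest house is missing its price, while the multivariate estimate uses the area to place the value on the fitted line.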

Properties of Missing Value Imputation

The available properties of Missing Value Imputation are as shown in the figure given below.

The table given below describes different fields present on the properties of missing value imputation.

Field	Description	Remark
Task Name	It displays the name of the selected task.	You can click the text field to edit or modify the name of the task as required.
Continuous Variables	It allows you to select continuous variables to perform missing value imputation.	Multiple data fields can be selected. Only the numerical data fields selected for the reader are visible.
Allow Single Select (For Continuous Variables)	It allows you to impute missing values separately for each selected data field.	Point to the data field and click the gear icon. The available imputation methods are: Mean, Median, Min, Max, Remove, and Constant (if selected, enter the constant value).
Select Imputation Method (For Continuous Variables)	It allows you to select the imputation method from the drop-down list to apply to the selected data fields.	The available imputation methods are: Mean, Median, Min, Max, Remove, and Constant (if selected, enter the constant value).
Categorical Variables	It allows you to select categorical variables to perform missing value imputation.	Multiple data fields can be selected. Only the categorical data fields selected for the reader are visible.
Allow Single Select (For Categorical Variables)	It allows you to impute missing values separately for each selected data field. Select the check box to enable it.	Point to the data field and click the gear icon. The available imputation methods are: Mode, Remove, and Constant (if selected, enter the constant value).
Select Imputation Method (For Categorical Variables)	It allows you to select the imputation method from the drop-down list to apply to the selected data fields.	The available imputation methods are: Mode, Remove, and Constant (if selected, enter the constant value).
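Choosing a separate method for each data field, as the Allow Single Select option permits, can be pictured with the following minimal sketch. The column names, values, and the `methods` mapping are hypothetical and only illustrate the idea; they are not the product's internal identifiers.

```python
from statistics import mean, median, mode

# Hypothetical dataset: column name -> values, with None marking a missing entry.
data = {
    "age": [25, None, 35, 40],
    "salary": [3000, 4000, None, 20000],
    "city": ["Pune", None, "Pune", "Delhi"],
}

# One method per field: two continuous fields and one categorical field.
methods = {"age": mean, "salary": median, "city": mode}

imputed = {}
for column, values in data.items():
    present = [v for v in values if v is not None]
    fill = methods[column](present)  # statistic of the non-missing values
    imputed[column] = [fill if v is None else v for v in values]

print(imputed)
```

Here the median is used for the skewed `salary` field so that the single large value does not pull the imputed value upward, which is one reason to pick methods per field rather than one method for all fields.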

Interpretation from Missing Value Imputation

The table given below describes the result for different imputation methods selected.

Imputation Method	Result	Remark
Mean	It replaces the missing values with the mean of the non-missing values within each column separately and independently from the others.	It only works on the column level. It can only be used with numerical data.
Median	It replaces the missing values with the median of the non-missing values within each column separately and independently from the others.	It only works on the column level. It can only be used with numerical data.
Min	It replaces the missing values with the minimum value present in that column.	—
Max	It replaces the missing values with the maximum value present in that column.	—
Remove	It discards the rows that contain missing values.	It can be used when only a small amount of data is missing (20-30%). Removing a large amount of data may cause huge variations in the results. If a column has a large amount of missing data, it is recommended to remove the complete column. (To remove a column, do not select that column while analyzing.)
Constant	It replaces the missing values with the constant value that you have entered.	It works well with categorical features. It can introduce bias in the data.
Mode (only for categorical variables)	It replaces the missing values with the mode of the values present in that column.	The distribution of data can become highly biased by mode imputation.
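The methods in the table above can be compared on a single column with the following minimal sketch. The `impute` helper and the `scores` values are hypothetical, written for illustration with the Python standard library; they do not represent the product's implementation.

```python
from statistics import mean, median, mode

def impute(values, method, constant=None):
    """Return a new list in which None entries are handled by the named method."""
    present = [v for v in values if v is not None]
    if method == "remove":
        # For a single column, removing rows means dropping the missing entries.
        return present
    if method == "constant":
        fill = constant
    else:
        statistic = {"mean": mean, "median": median,
                     "min": min, "max": max, "mode": mode}[method]
        fill = statistic(present)  # computed from the non-missing values only
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing entries.
scores = [10, None, 20, 20, None, 50]

results = {m: impute(scores, m)
           for m in ["mean", "median", "min", "max", "mode", "remove"]}
results["constant"] = impute(scores, "constant", constant=0)

for name, column in results.items():
    print(name, column)
```

Every method except Remove keeps the column length at six, while Remove shortens it to four. In a multi-column dataset, Remove would discard the entire row in which the missing value occurs.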
