Covariance

Description

  • Covariance is a statistical measure of how two random variables vary together.
  • The sign of the covariance between two variables indicates the direction of a linear relationship between them.

Why to use

To determine the relationship between two variables.

When to use

For numerical variables

When not to use

For textual data

Prerequisites

The data should not contain any missing values.
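The short Python sketch below shows why this prerequisite matters. It uses NumPy and made-up values. A single missing value (`np.nan`) makes the covariance undefined. The sketch then fills the gap with mean imputation. Mean imputation is only one option chosen for this illustration and is not necessarily what the product does.

```python
import numpy as np

# Two illustrative numerical columns; np.nan marks a missing value.
x = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
y = np.array([2.0, 4.1, 6.2, 7.9, 10.1])

# With a missing value present, the covariance cannot be computed.
print(np.cov(x, y)[0, 1])  # nan

# Assumed remedy for the example: replace the missing value with the column mean.
x_imputed = np.where(np.isnan(x), np.nanmean(x), x)
cov_xy = np.cov(x_imputed, y)[0, 1]
print(cov_xy)
```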

Input

Numerical variables with any positive or negative values.

Output

  • Scatter Plot
  • Heat Map

Statistical Methods used

Covariance Score
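For reference, the standard sample covariance of two variables X and Y with n paired observations is shown below. Here, x̄ and ȳ are the means of the two variables. This page does not state whether the Covariance Score uses the sample denominator (n − 1) or the population denominator (n), so treat the normalization as an assumption. The covariance of a variable with itself equals its variance.

```latex
\operatorname{cov}(X, Y) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}),
\qquad
\operatorname{cov}(X, X) = \operatorname{var}(X)
```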

Limitations

  • Covariance indicates the direction of the relationship between the variables. However, its magnitude (the Covariance Score) depends on the units and scale of the variables, so it does not measure the strength of the relationship on a standard scale.
  • Covariance is strongly affected by even a small number of outliers in the data. This may lead to misleading statistics and interpretations.
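The Python sketch below illustrates both limitations. It uses NumPy and synthetic data in which width loosely tracks length. First, it rescales one variable from centimetres to millimetres: the covariance grows tenfold, while the correlation coefficient is unchanged. Second, it appends a single extreme point. The data, units, and outlier value are all illustrative assumptions.

```python
import numpy as np

# Illustrative synthetic data: width loosely tracks length (both in cm).
rng = np.random.default_rng(0)
length_cm = rng.normal(5.0, 1.0, size=100)
width_cm = 0.5 * length_cm + rng.normal(0.0, 0.3, size=100)

# Limitation 1: the magnitude depends on units.
# Converting length to millimetres multiplies the covariance by 10,
# while the correlation coefficient stays the same.
cov_cm = np.cov(length_cm, width_cm)[0, 1]
cov_mm = np.cov(length_cm * 10, width_cm)[0, 1]
corr_cm = np.corrcoef(length_cm, width_cm)[0, 1]
corr_mm = np.corrcoef(length_cm * 10, width_cm)[0, 1]
print(f"cov (cm): {cov_cm:.3f}  cov (mm): {cov_mm:.3f}")
print(f"corr (cm): {corr_cm:.3f}  corr (mm): {corr_mm:.3f}")

# Limitation 2: a single extreme point can dominate the result.
length_out = np.append(length_cm, 50.0)
width_out = np.append(width_cm, -20.0)
cov_out = np.cov(length_out, width_out)[0, 1]
print(f"cov without outlier: {cov_cm:.3f}  with one outlier: {cov_out:.3f}")
```

In this synthetic run, the single outlier is enough to flip the sign of the covariance from positive to negative.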

Covariance is located under Model Studio, in the Correlation and Covariance dropdown under Statistical Analysis in the left task pane. Drag and drop the algorithm onto the canvas, then click it to view and select its properties for analysis.

Refer to Properties of Covariance.

Covariance indicates how two variables change together. With a positive covariance, an increase in one variable tends to be accompanied by an increase in the other, so both variables move in the same direction. A positive covariance is denoted by a positive number.
With a negative covariance, an increase in one variable tends to be accompanied by a decrease in the other, so the variables move in opposite directions. A negative covariance is denoted by a negative number.
Covariance is unbounded: its value can range from negative infinity to positive infinity (-∞ to +∞).
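To make the sign behavior concrete, the plain-Python sketch below computes the sample covariance (n − 1 denominator) by hand. It uses two small, made-up series: one rises with `hours` and the other falls. The function name `covariance` and the data are hypothetical and are used only for this illustration.

```python
def covariance(xs, ys):
    """Sample covariance (n - 1 denominator) of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means, averaged over n - 1.
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


hours = [1, 2, 3, 4, 5]
score_up = [50, 55, 65, 70, 80]    # rises with hours -> positive covariance
score_down = [80, 70, 65, 55, 50]  # falls with hours -> negative covariance

print(covariance(hours, score_up))    # 18.75
print(covariance(hours, score_down))  # -18.75
```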

Properties of Covariance

The available properties of Covariance are shown in the figure below.

The table below describes the fields on the Properties pane of Covariance.

Field

Description

Remark

Task Name

It is the name of the task selected on the workbook canvas.

You can click the text field to edit the name of the task as required.

Input Column

It allows you to select the variables to use as input attributes.

  • You can select any numerical variable.
  • If the data contains missing values, impute them before using Covariance.

Advanced

Node Configuration

It allows you to select the instance of the AWS server, which gives you control over the execution of a task in a workbook or workflow.

For more details, refer to Worker Node Configuration.

Example of Covariance

Consider a dataset containing the petal and sepal lengths and widths of three different plant species. A snippet of the input data is shown below.


We select the Petal Length, Petal Width, Sepal Length, and Sepal Width as Input Columns.
The Result page of Covariance is displayed below.


There are two plots on the Result page.

Scatter Plot:

  • It shows the variation of one variable with respect to another.
  • Select any two different variables for X-axis and Y-axis and click Show Chart to obtain the scatter plot for the selected pair of variables.
  • For example, in the figure above, you see the variation of sepal width with sepal length.
  • Each dot on the plot is a data point in the dataset.

Heat Map:

  • It shows the covariance of each of the four variables (Petal Length, Petal Width, Sepal Length, and Sepal Width) with itself and with each of the others. The covariance of a variable with itself is its variance.
  • Each cell lies at the intersection of two of the above variables.
  • The number in each cell is the Covariance Score corresponding to the two variables.
  • The darker the color of the cell, the higher the Covariance Score between the two variables.
  • For example, the highest Covariance Score, 2.1, is for Petal Length with itself (that is, the variance of Petal Length).
  • Similarly, the Covariance Score for the pair Petal Length and Petal Width is 0.8.

The Data page of Covariance is displayed below.

  • It shows the Covariance value for each pair of variables given in Column 2 and Column 3.
  • Columns 2 and 3 identify the two data columns (Petal Length, Petal Width, Sepal Length, or Sepal Width) that form each pair.
  • The same Covariance Scores appear on the Heat Map on the Result page.
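The NumPy sketch below shows how a covariance matrix like the one behind the Heat Map can be computed and flattened into one row per pair of variables, similar to the Data page. The six rows of measurements are illustrative values only, not the dataset shown in the figures. `np.cov` uses the sample (n − 1) denominator by default, which may differ from the product's normalization. The long-format layout is an assumption about how such a table could be arranged.

```python
import itertools

import numpy as np

# Illustrative measurements (cm); columns follow the order in `names`.
names = ["Petal Length", "Petal Width", "Sepal Length", "Sepal Width"]
data = np.array([
    [1.4, 0.2, 5.1, 3.5],
    [4.7, 1.4, 7.0, 3.2],
    [6.0, 2.5, 6.3, 3.3],
    [1.3, 0.2, 4.7, 3.2],
    [4.5, 1.5, 6.4, 3.2],
    [5.1, 1.9, 5.8, 2.7],
])

# rowvar=False: each column is a variable, each row an observation.
cov_matrix = np.cov(data, rowvar=False)

# Long-format rows (variable 1, variable 2, covariance), one per heat-map cell.
pairs = [
    (names[i], names[j], round(float(cov_matrix[i, j]), 3))
    for i, j in itertools.product(range(len(names)), repeat=2)
]
for row in pairs:
    print(row)
```

The diagonal of `cov_matrix` holds each variable's variance, and the matrix is symmetric, so each off-diagonal score appears twice.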
