Chi Square Test for Independence

Description

Chi Square Test for Independence determines whether two categorical variables are related or independent.

Why to use

To test the independence or association between categorical variables.

When to use

When the dataset contains at least two categorical variables.

When not to use

On continuous data.

Prerequisites

The data should be categorical.

Input

Two categorical variables

Output

  • Chi Square Statistic
  • p Value
  • Observed Frequency Table and Expected Frequency Table for the selected categorical variables

Statistical Methods used

Limitations

It can be used only on categorical data.

Chi Square Test for Independence is located under Model Studio, in Hypothesis Test, in Statistical Analysis, in the left task pane. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select different properties for analysis. Refer to Properties of Chi Square Test for Independence.

The Chi Square Test for Independence is a hypothesis test. It compares two categorical variables to check whether they are related to each other. The null hypothesis is that the two variables are independent. The test uses a contingency table (cross table) for the analysis of data. In a cross table, the data is classified according to the two categorical variables. The categories of one variable appear in the rows, while the categories of the other variable appear in the columns. Each cell contains the total count of cases for a specific pair of categories.
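As an illustration of how a cross table is assembled, the following minimal sketch counts cases for each pair of categories. The records and the `cross_table` helper are hypothetical and are not part of the product.

```python
from collections import Counter

# Hypothetical paired observations: one (row category, column category) per case.
records = [
    ("Asian", "Female"), ("Asian", "Male"), ("Caucasian", "Female"),
    ("Caucasian", "Male"), ("Caucasian", "Male"), ("African American", "Female"),
]

def cross_table(pairs):
    """Return row labels, column labels, and a table of counts per category pair."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})   # categories of the first variable
    cols = sorted({c for _, c in pairs})   # categories of the second variable
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

rows, cols, table = cross_table(records)
for label, row in zip(rows, table):
    print(label, dict(zip(cols, row)))
```

Each printed row corresponds to one category of the first variable, and each count is one cell of the cross table.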

Properties of Chi Square Test for Independence

The available properties of the Chi Square Test for Independence are as shown in the figure given below.

The table below describes the different fields present on the Properties pane of the Chi Square Test for Independence.

| Field | Description | Remark |
|---|---|---|
| Task Name | It is the name of the task selected on the workbook canvas. | You can click the text field to edit or modify the name of the task as required. |
| Independent Variable 1 | It allows you to select the first categorical variable for the independence test. | Only a categorical variable can be selected. |
| Independent Variable 2 | It allows you to select the second categorical variable for the independence test. | Only a categorical variable can be selected. |
| Advanced: Alpha | It allows you to set the level of significance. | The default value is 0.05. |
| Advanced: Node Configuration | It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow. | For more details, refer to Worker Node Configuration. |

Example of Chi Square Test for Independence

Consider a dataset of Credit Card balances of people of different gender, age, education, and so on. A snippet of input data is shown in the figure given below.

The selected values for properties of the Chi Square Test for Independence are given in the table below.

| Property | Value |
|---|---|
| Independent Variable 1 | Ethnicity |
| Independent Variable 2 | Gender |
| Alpha | 0.05 |

The Result page of the Chi Square Test for Independence is displayed in the figure below.

The Chi Square Statistic, the p Value, and the Observed and Expected Frequency Tables are displayed on the Result page as seen in the above figure.

The Observed Frequency and Expected Frequency Tables are cross tables. They display the counts for each combination of categories of the selected variables, Ethnicity and Gender.
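The relationship between the two tables can be sketched with the standard Pearson formula: each expected count is (row total × column total) / grand total, and the statistic sums (observed − expected)² / expected over all cells. The counts below are hypothetical, and the sketch assumes no continuity correction; the product's exact computation is not documented here.

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(observed):
    """Pearson chi-square statistic, without continuity correction (an assumption)."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Hypothetical observed cross table (2 rows x 2 columns).
observed = [[10, 20], [30, 40]]
expected = expected_frequencies(observed)
statistic = chi_square_statistic(observed)
# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(statistic, 4), dof)
```

The closer the observed counts are to the expected counts, the smaller the statistic.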

In this case, the Chi Square Statistic is 0.2735 and the p Value is 0.8722. The p Value is greater than the Alpha value (0.05), so the null hypothesis of independence cannot be rejected. This indicates that there is no evidence of an association between the selected categorical variables Ethnicity and Gender; they can be treated as independent.
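The decision follows from comparing the p Value with Alpha, not with the Chi Square Statistic. The sketch below assumes 2 degrees of freedom (for example, three Ethnicity categories and two Gender categories, which is an assumption about the dataset). For exactly 2 degrees of freedom, the upper-tail chi-square probability is exp(−x/2). The function names are hypothetical.

```python
import math

def p_value_df2(statistic):
    """Upper-tail chi-square probability, valid only for 2 degrees of freedom."""
    return math.exp(-statistic / 2)

def decide(p_value, alpha=0.05):
    """Reject independence only when the p value is below the significance level."""
    if p_value < alpha:
        return "reject independence"
    return "fail to reject independence"

p = p_value_df2(0.2735)   # statistic reported in the example
print(round(p, 4), decide(p, alpha=0.05))
```

Under this assumption, the computed p value matches the reported 0.8722, which is well above 0.05.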
