Chi Square Test for Independence
Description | Chi Square Test for Independence determines whether two categorical variables are related or independent.
Why to use | To test the independence of, or association between, two categorical variables.
When to use | When the dataset contains at least two categorical variables.
When not to use | On continuous data.
Prerequisites | The data should be categorical.
Input | Two categorical variables.
Output |
Statistical Methods used | —
Limitations | It can be used only on categorical data.
Chi Square Test for Independence is located under Model Studio, in Hypothesis Test under Statistical Analysis, in the left task pane. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select its properties for analysis. Refer to Properties of Chi Square Test for Independence.
The Chi Square Test for Independence is a hypothesis test that compares two categorical variables to determine whether they are related. Its null hypothesis is that the two variables are independent. The test uses a contingency table (cross table) to analyze the data. In a cross table, the data is classified according to the two categorical variables: the categories of one variable appear in the rows, and the categories of the other variable appear in the columns. Each cell contains the count of cases for a specific pair of categories.
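A cross table of this kind can be built from two columns of category labels by counting how often each (row category, column category) pair occurs. The minimal Python sketch below uses a hypothetical helper, `cross_table`, and made-up records; it illustrates the idea and is not how Model Studio builds its tables internally.

```python
from collections import Counter


def cross_table(var1, var2):
    """Count how often each pair of categories occurs in two equal-length columns.

    Rows are the sorted categories of var1, columns the sorted categories of var2;
    counts[i][j] is the number of records with var1 == rows[i] and var2 == cols[j].
    """
    if len(var1) != len(var2):
        raise ValueError("both variables must have the same number of records")
    pair_counts = Counter(zip(var1, var2))
    rows = sorted(set(var1))
    cols = sorted(set(var2))
    # Counter returns 0 for pairs that never occur, so empty cells are filled correctly
    counts = [[pair_counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, counts


# Made-up records for illustration only (not the example dataset used later)
ethnicity = ["Asian", "Caucasian", "Asian", "African American", "Caucasian", "Caucasian"]
gender = ["Female", "Male", "Male", "Female", "Female", "Male"]

rows, cols, table = cross_table(ethnicity, gender)
print("", *cols, sep="\t")
for label, row in zip(rows, table):
    print(label, *row, sep="\t")
```

Sorting the category labels keeps the row and column order stable from run to run, which makes the table easier to compare with other outputs.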
Properties of Chi Square Test for Independence
The available properties of the Chi Square Test for Independence are shown in the figure below.
The table below describes the fields on the Properties pane of the Chi Square Test for Independence.
Field | Description | Remark
Task Name | The name of the task selected on the workbook canvas. | Click the text field to edit the task name as required.
Independent Variable 1 | Selects the first categorical variable for the independence test. | Only a categorical variable can be selected.
Independent Variable 2 | Selects the second categorical variable for the independence test. | Only a categorical variable can be selected.
Advanced: Alpha | Sets the level of significance for the test. | The default value is 0.05.
Node Configuration | Selects the AWS server instance used to execute a task in a workbook or workflow, giving you control over its execution. | For more details, refer to Worker Node Configuration.
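Alpha is the probability, accepted in advance, of rejecting independence when the two variables really are independent. It can equivalently be read as a threshold on the statistic: independence is rejected when the Chi Square statistic exceeds the critical value whose upper-tail probability equals alpha. The dependency-free Python sketch below illustrates this standard definition; the helpers `chi2_sf` and `critical_value` are hypothetical and are not the product's implementation.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square distribution with `dof` degrees of freedom.

    Uses the power series of the regularized lower incomplete gamma function
    P(dof/2, x/2); adequate for moderate x, not for extremely small tail probabilities.
    """
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def critical_value(alpha, dof):
    """Chi-square value whose upper-tail probability equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > alpha:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, dof) > alpha:
            lo = mid  # tail still too heavy: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2


# Thresholds for a table with 2 degrees of freedom (for example, 3 x 2 categories)
for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_value(alpha, dof=2), 4))  # 4.6052, 5.9915, 9.2103
```

A smaller alpha raises the threshold, so stronger evidence is needed before the variables are declared associated.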
Example of Chi Square Test for Independence
Consider a dataset of credit card balances for people of different genders, ages, education levels, and so on. A snippet of the input data is shown in the figure below.
The selected values for properties of the Chi Square Test for Independence are given in the table below.
Property | Value |
Independent Variable 1 | Ethnicity |
Independent Variable 2 | Gender |
Alpha | 0.05 |
The Result page of the Chi Square Test for Independence is displayed in the figure below.
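As the figure above shows, the Result page displays the Chi Square statistic and the Observed and Expected Frequency Tables.
The Observed Frequency and Expected Frequency Tables are cross tables for the selected categorical variables, Ethnicity and Gender. The Observed Frequency Table contains the actual count for each combination of categories, and the Expected Frequency Table contains the count that would be expected in each cell if the two variables were independent.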
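Under the null hypothesis of independence, the expected count for a cell is its row total multiplied by its column total, divided by the grand total. The short Python sketch below, using a hypothetical helper `expected_frequencies` and made-up 2 x 2 counts, shows how an Expected Frequency Table follows from an Observed Frequency Table.

```python
def expected_frequencies(observed):
    """Expected count of each cell if the two variables were independent:
    E[i][j] = (row i total) * (column j total) / (grand total)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[10, 20], [30, 40]]  # hypothetical 2 x 2 observed counts
expected = expected_frequencies(observed)
print(expected)  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the same row and column totals as the observed table; only the way the counts are spread across cells changes.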
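In this case, the p-value (0.8722) is greater than the significance level Alpha (0.05), so the null hypothesis of independence is not rejected. The small Chi Square statistic (0.2735) reflects that the observed frequencies are close to the expected frequencies. This indicates that there is no statistically significant association between Ethnicity and Gender; the data is consistent with the two variables being independent.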
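The decision rule can be sketched end to end: the statistic is the sum of (observed - expected)^2 / expected over all cells, it has (rows - 1) x (columns - 1) degrees of freedom, and independence is rejected only when the p-value is at most alpha. The dependency-free Python sketch below uses hypothetical helpers `chi2_sf` and `chi_square_independence` and applies no continuity correction; it illustrates the standard Pearson test, not the product's implementation. As a consistency check, assuming Ethnicity has three categories and Gender two (so df = 2), a statistic of 0.2735 corresponds to a p-value of about 0.8722.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square distribution with `dof` degrees of freedom,
    via the power series of the regularized lower incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_independence(observed, alpha=0.05):
    """Pearson chi-square test of independence on an r x c table of observed counts.

    Returns (statistic, degrees_of_freedom, p_value, reject_independence).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    if len(row_totals) < 2 or len(col_totals) < 2:
        raise ValueError("each variable needs at least two categories")
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column must contain at least one case")
    n = sum(row_totals)
    statistic = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count under independence
            statistic += (observed[i][j] - e) ** 2 / e
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    p_value = chi2_sf(statistic, dof)
    # Reject the null hypothesis of independence only when p <= alpha
    return statistic, dof, p_value, p_value <= alpha


stat, dof, p, reject = chi_square_independence([[10, 20], [30, 40]], alpha=0.05)
print(f"statistic={stat:.4f}, dof={dof}, p={p:.4f}, reject independence: {reject}")

# p-value for the statistic reported in the example, assuming df = 2
print(round(chi2_sf(0.2735, 2), 4))  # 0.8722
```

For comparison, `scipy.stats.chi2_contingency` computes the same Pearson statistic for tables larger than 2 x 2, but for 2 x 2 tables it applies Yates' continuity correction by default, so its result for the toy table above would differ from this sketch.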