Chi Square Goodness of Fit Test
Description | The Chi Square Goodness of Fit Test determines whether a categorical variable is likely to be derived from a specified distribution. This test is the same as Pearson’s Chi Square test.
Why to use | To check whether sample data drawn from a population is representative of that population.
When to use | For categorical variables | When not to use | For continuous variables
Prerequisites |
Input | One categorical variable | Output |
Statistical Methods used | | Limitations | It can be used only on categorical data.
Chi Square Goodness of Fit Test is located under Model Studio, in Hypothesis Test, in Statistical Analysis, in the left task pane. Drag and drop the algorithm onto the canvas to use it. Click the algorithm to view and select different properties for analysis. Refer to Properties of Chi Square Goodness of Fit Test.
The Chi Square Goodness of Fit Test is a hypothesis test. It tests whether the selected categorical variable is likely to be derived from the specified distribution. Suppose you have a dataset and a hypothesis about how its data points are distributed across the categories. The test gives you a way to check whether the observed data points actually fit that hypothesis, that is, whether they are really distributed the way you expect.
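The check described above is usually expressed through the Pearson Chi Square statistic. The formula below is the standard textbook form, not taken from the product itself. O and E are the observed and expected frequencies of each category. The symbols k (number of categories), n (total number of observations), and p_i (hypothesized proportion of category i) are introduced here for illustration. The degrees of freedom shown assume that no distribution parameters are estimated from the data.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n \, p_i,
\qquad \mathrm{df} = k - 1
```

A large value of the statistic means that the observed frequencies lie far from the expected ones, which counts as evidence against the hypothesized distribution.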
Properties of Chi Square Goodness of Fit Test
The available properties of the Chi Square Goodness of Fit Test are as shown in the figure given below.
The table below describes the different fields present on the Properties pane of the Chi Square Goodness of Fit Test.
Field | Description | Remark
Task Name | It is the name of the task selected on the workbook canvas. | You can click the text field to edit or modify the name of the task as required.
Feature | It allows you to select the categorical variable for the test. | Only one categorical variable can be selected.
Advanced: Alpha | It allows you to set the level of significance. | The default value is 0.05.
Advanced: Node Configuration | It allows you to select the instance of the AWS server to provide control on the execution of a task in a workbook or workflow. | For more details, refer to Worker Node Configuration.
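The Alpha property determines the critical value against which the calculated Chi Square value is compared. The following is a minimal sketch of that relationship using only the Python standard library. The helper names `chi2_sf` and `chi2_critical` are hypothetical and are not part of the product. The degrees of freedom are assumed to be the number of categories minus one.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a Chi Square variable with df degrees of freedom."""
    z = x / 2.0
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical(alpha, df, tol=1e-10):
    """Value x such that P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # grow the bracket until it contains the answer
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Default alpha of 0.05 with three categories (df = 2).
print(round(chi2_critical(0.05, 2), 4))  # 5.9915
```

Lowering alpha raises the critical value, so a larger difference between observed and expected frequencies is needed before the null hypothesis is rejected.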
Example of Chi Square Goodness of Fit Test
Consider an HR dataset containing features like Age, BusinessTravel, Daily Rate, Department, DistanceFromHome, Education, and so on. A snippet of the input data is shown in the figure given below.
The BusinessTravel feature is selected as the categorical variable for studying the Chi Square Goodness of Fit Test.
The part of the Result page containing charts for the Chi Square Goodness of Fit Test is displayed below.
On this part of the Result Page,
- Chart of Contribution to the Chi Square value by Category shows Combined Values depicting the contribution of each BusinessTravel category to the calculated Chi Square value.
- Chart of Observed and Expected Values compares the observed frequency of each BusinessTravel category with its expected frequency.
On this part of the Result Page,
- Null Hypothesis assumes that there is no difference between observed and expected values.
- Alternative Hypothesis assumes that there is a significant difference between observed values and expected values.
- Computation table for Chi Square gives the Observed Frequency (O) and Expected Frequency (E) of the BusinessTravel feature in the categories Travel_Rarely, Travel_Frequently, and Non-Travel. It also shows the values for (O-E), (O-E)², and (O-E)²/E.
- The Result table for Chi Square gives the Calculated Value (952.6082) and the Critical Value (5.9915) for Chi Square. It also gives the p value (0) and alpha (0.05).
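The columns of the Computation table can be reproduced with a short sketch. The category counts below are assumptions for illustration, because the source does not list them. The expected distribution is also an assumption: equal frequencies across the three categories. Under these assumptions, the sum of the last column rounds to 952.6082, the same value that appears in the Result table.

```python
# Assumed observed counts per BusinessTravel category (not given in the source).
observed = {
    "Travel_Rarely": 1043,
    "Travel_Frequently": 277,
    "Non-Travel": 150,
}

n = sum(observed.values())
# Assumption: the hypothesized distribution is uniform across categories.
expected = n / len(observed)

print(f"{'Category':<18}{'O':>6}{'E':>8}{'O-E':>8}{'(O-E)^2':>12}{'(O-E)^2/E':>11}")
chi_square = 0.0
for category, o in observed.items():
    diff = o - expected
    contribution = diff ** 2 / expected  # this category's share of the statistic
    chi_square += contribution
    print(f"{category:<18}{o:>6}{expected:>8.1f}{diff:>8.1f}{diff ** 2:>12.1f}{contribution:>11.4f}")

print(f"Chi Square = {chi_square:.4f}")  # Chi Square = 952.6082
```

The per-category contributions in the last column are the quantities plotted in the Chart of Contribution to the Chi Square value by Category.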
You observe that the p value is less than alpha. Thus, the Interpretation states that the null hypothesis is rejected. That is, the observed values do not come from the expected distribution, because there is a significant difference between the observed values and the expected values.
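The decision can be checked by hand for this example. The sketch below takes 952.6082 as the calculated Chi Square value and assumes 2 degrees of freedom (three categories minus one). For exactly 2 degrees of freedom, the p value and the critical value have simple closed forms. These shortcuts do not hold for other degrees of freedom.

```python
import math

alpha = 0.05
calculated = 952.6082  # calculated Chi Square value from the Result table
df = 2                 # assumption: three categories, so df = 3 - 1

# Closed forms that are valid only when df == 2.
p_value = math.exp(-calculated / 2.0)
critical = -2.0 * math.log(alpha)

reject_by_p = p_value < alpha
reject_by_critical = calculated > critical

print(f"p value  = {p_value:.4f}")   # p value  = 0.0000
print(f"critical = {critical:.4f}")  # critical = 5.9915
print("Reject the null hypothesis" if reject_by_p else "Fail to reject the null hypothesis")
```

Both comparisons lead to the same decision: the p value is far below alpha, and the calculated value is far above the critical value. The p value is not exactly zero, but it is small enough to display as 0 after rounding.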