This posting explains how to conduct Chi-square test of goodness of fit using Excel and SPSS. If you want a detailed explanation about the test itself, then, click this link Chi-square test of goodness of fit.
Chi-Square formula and brief overview
Here’s the formula for the Chi-Square (χ2):
Here’s a simpler version:
where O is the observed frequency of each category and E is the expected frequency of the category under the specific hypothesis about the population. In this posting, let’s assume the hypothesis that we are testing is that there are equal numbers of females and males in the population (and we collected gender information from a sample of 2,000 people).
As shown in the formula, the core components of Chi-Square (χ2) calculation are O and E.
Calculating the chi-square value using Excel
Step 1: O (Observed Frequency) Input
If you want to calculate the Chi-Square (χ2) value using Excel, you would first put the O as shown below.
For example, if there were 860 females and 1140 males in your sample, then, the frequency table would look like this:
Such an actual frequency from the sample is called the observed frequency (O).
Note, the full Excel screenshot for chi-square calculation is shown in Step 6 (see below). Until we get there, I will show only the relevant part of the Excel screen across the steps.
Step 2: E (Expected Frequency) Calculation
As the χ2 value indicates the degree of a discrepancy between the observed frequency of your sample and the expected frequency under an assumption about the population, the χ2 calculation naturally involves “observed frequency minus expected frequency (O – E),” which is the numerator of the formula.
This, of course, requires the creation of the expected frequency (E). That is, assuming a certain hypothesis about the population (e.g., equal gender distribution), we need to figure out the most expected frequency considering your sample size. In our example, since the sample size is 2,000, the most expected frequency under the equal-gender-distribution hypothesis is 1,000 females and 1,000 males. Such a most expected frequency under the hypothesis is called the “Expected Frequency (E)” In our example, assuming the equal-gender distribution in the population, the expected frequency table would be this:
Step 3: O – E
Calculating the difference between O and E (O – E) is the third step of χ2 calculation.
Step 4: (O − E)2
It is tempting to add these differences up or calculate the average of these differences across categories, as χ2 is an “overall” indicator of the discrepancy between the actual frequency distribution in your sample and the most expected frequency across categories. However, the sum of the differences (O – E) would always give you 0. This is because one category’s O – E would be positive and the other category’s O – E would be negative with the exact same magnitude (similar to the sum-of-zero issue in the standard deviation calculation).
To avoid the sum-of-zero issue, we square these differences for each category (as we do the same in the standard deviation calculation): (O−E)2.
Step 5: (O−E)2 / E
Then, we divide the squared difference scores of each category (O−E)2 by the category’s most expected value (E):
In our example, the numbers are the same for both female and male categories: 19600 / 1000. If you calculate the χ2 value using Excel, then, the relevant cells for the Female category in this step would be J4/D4 as shown below:
Step 6: ∑{(O−E)2 / E}
Finally, to create an overall measure of discrepancy between observed frequency and the expected frequency across categories, you sum up the {(O − E) 2 / E} across the categories. Then, the sum itself is the value of χ2.
That is, the formula of χ2 is:
SPSS Instructions: How to run the chi-square (χ2) test of goodness of fit
SPSS also wants you to provide the observed frequency first. But that is pretty much it. SPSS will do the Steps 2 ~ 6 above all by itself.
Step 1: O (Observed Frequency) Setup
Using the same example, if there were 860 females and 1140 males in your sample, then, the frequency table would eventually look like this:
Here are more detailed steps. First, in the first column (in data view), you would want to provide two numbers as below:
Then, in the Variable View (click the lower-left tab labeled Variable view), give an appropriate variable name (e.g., Gender) and define the measure as Nominal as the Gender variable is a nominal variable (rather than ordinal, interval, or ratio scale). Note, to define the measure, you would need to move the cursor to the first cell of the Measure column, and hover the cursor over the empty area on the right side of the cell. If you click while your cursor is over there, then, you can click the drop-down menu, which was hidden.
Then, move the cursor to the first cell of the Values column, and hover the cursor over the empty area on the right side of the cell. If you click while your cursor is over there, then, you can click the icon “…” which was hidden.
Then, you would see a pop-up window with which you can assign labels to the values of 1s and 2s that you have in the first column. For example, if your 1 means females and 2 means males, then, you can put “1” for the “Value:” and “Female” for “Label:”
Make sure to click Add after you define each value’s label.
Similarly, you can put “2” for the “Value:” and “Male” for “Label:” as shown below.
If you go to the Data View (click the lower-left tab labeled “Data View”) and click the “Value Labels” icon (see below),
then, you would see either the numeric value (i.e., 1 and 2) or their labels (i.e., female and male) of the variable (Gender).
Now you created the value column of the observed frequency table. Now, you need to create the frequency column that shows the observed frequency (O) of each category (i.e., female and male). See below:
You can define the frequency column’s name and measure in the Variable view, as shown below:
As a final step of the Observed-Frequency Input, each value in the first column should be weighted by the corresponding frequency in the second column. Specifically, Female should be weighted by 860 and Male should be weighted by 1140. YOu can do this weighting process using the last menu under the “Data” menu in the menu ribbon (see below).
If you select the Weight Cases menu, you would see the following pop-up window.
Click “Weight cases by,” select the “Frequency” variable from the left box, and click the right-arrow icon to move the Frequency variable from the left box to the “Frequency Variable” on the right.
If you hit OK, although nothing visible would happen, now the female and male values were weighted by their corresponding frequency in the frequency column (i.e., 860 and 1140, respectively). In other words, although not visible, there are 860 females and 1140 males in the first column.
This is how you can provide the observed frequency (O) in SPSS.
Step 2: Run Chi-Square Test of Goodness of Fit
Here is the path for the chi-square (χ2) test of goodness of fit.
Analyze – Nonparametric Test – Legacy Dialogs – Chi-Square
If you click the chi-square menu, then, you would see a pop-up window as shown below. You need to move the Gender variable from the left to the “Test Variable List” box.
Note, the default of the “Expected Values” box is “All categories equal.” That is, the default hypothesis being tested assumes that there are equal numbers of females and males in the population. Based on this assumption and the given sample size (N = 2, 000), SPSS calculates the expected frequency (i.e., 1,000 female and 1,000 male).
If you hit OK, then, the following outcome would show. Note, the Expected N shows 1,000 and 1,000 for female and male, respectively
Note the Observed N, Expected N, and Residual in the outcome corresponds to O, E, and O – E, respectively.
Here’s the Excel-based chi-square calculation, again. Both Excel and SPSS show the exact same chi-square (χ2) value of 39.2.
If you want to learn about how to report the results of the Chi-square (χ2) test, please refer to this post.
Hope this helped.