Excel/SPSS How To: Chi-Square Test of Goodness of Fit

This posting explains how to conduct Chi-square test of goodness of fit using Excel and SPSS. If you want a detailed explanation about the test itself, then, click this link Chi-square test of goodness of fit.

Chi-Square formula and brief overview

Here’s the formula for the Chi-Square (χ2):

Formula of chi-square

Here’s a simpler version:

    \[\chi^2 = \sum\frac{(O - E)^2}{E}}\]

where O is the observed frequency of each category and E is the expected frequency of the category under the specific hypothesis about the population. In this posting, let’s assume the hypothesis that we are testing is that there are equal numbers of females and males in the population (and we collected gender information from a sample of 2,000 people).

As shown in the formula, the core components of Chi-Square (χ2) calculation are O and E.

Calculating the chi-square value using Excel

Step 1: O (Observed Frequency) Input

If you want to calculate the Chi-Square (χ2) value using Excel, you would first put the O as shown below.

For example, if there were 860 females and 1140 males in your sample, then, the frequency table would look like this:

Step 1: Observed frequency table (O).

Such an actual frequency from the sample is called the observed frequency (O).

Note, the full Excel screenshot for chi-square calculation is shown in Step 6 (see below). Until we get there, I will show only the relevant part of the Excel screen across the steps.

Step 2: E (Expected Frequency) Calculation

As the χ2 value indicates the degree of a discrepancy between the observed frequency of your sample and the expected frequency under an assumption about the population, the χ2 calculation naturally involves “observed frequency minus expected frequency (O – E),” which is the numerator of the formula.

This, of course, requires the creation of the expected frequency (E). That is, assuming a certain hypothesis about the population (e.g., equal gender distribution), we need to figure out the most expected frequency considering your sample size. In our example, since the sample size is 2,000, the most expected frequency under the equal-gender-distribution hypothesis is 1,000 females and 1,000 males. Such a most expected frequency under the hypothesis is called the “Expected Frequency (E)” In our example, assuming the equal-gender distribution in the population, the expected frequency table would be this:

Step 2: Expected frequency table (E).

Step 3: O – E

Calculating the difference between O and E (O – E) is the third step of χ2 calculation.

Step 3: O – E

Step 4: (O − E)2

It is tempting to add these differences up or calculate the average of these differences across categories, as χ2 is an “overall” indicator of the discrepancy between the actual frequency distribution in your sample and the most expected frequency across categories. However, the sum of the differences (O – E) would always give you 0. This is because one category’s O – E would be positive and the other category’s O – E would be negative with the exact same magnitude (similar to the sum-of-zero issue in the standard deviation calculation).

To avoid the sum-of-zero issue, we square these differences for each category (as we do the same in the standard deviation calculation): (O−E)2.  

Step 4: (O−E)2

Step 5: (O−E)2 / E

Then, we divide the squared difference scores of each category (O−E)2 by the category’s most expected value (E):

    \[(O - E)^2 \over E\]

In our example, the numbers are the same for both female and male categories: 19600 / 1000. If you calculate the χ2 value using Excel, then, the relevant cells for the Female category in this step would be J4/D4 as shown below:

Step 5: (O−E)2 / E

Step 6: ∑{(O−E)2 / E}

Finally, to create an overall measure of discrepancy between observed frequency and the expected frequency across categories, you sum up the {(O − E) 2 / E} across the categories. Then, the sum itself is the value of χ2.

That is, the formula of χ2 is:

    \[\chi^2 = \sum\frac{(O - E)^2}{E}}\]

Chi-square value of 39.2 calculated by adding up (O−E)2 / E for female and male (Cell M4 and N4).

SPSS Instructions: How to run the chi-square (χ2) test of goodness of fit

SPSS also wants you to provide the observed frequency first. But that is pretty much it. SPSS will do the Steps 2 ~ 6 above all by itself.

Step 1: O (Observed Frequency) Setup

Using the same example, if there were 860 females and 1140 males in your sample, then, the frequency table would eventually look like this:

Step 1: Observed frequency table (O).

Here are more detailed steps. First, in the first column (in data view), you would want to provide two numbers as below:

Step 1_1: Create a variable with two numeric values (1 & 2) for female and male.

Then, in the Variable View (click the lower-left tab labeled Variable view), give an appropriate variable name (e.g., Gender) and define the measure as Nominal as the Gender variable is a nominal variable (rather than ordinal, interval, or ratio scale). Note, to define the measure, you would need to move the cursor to the first cell of the Measure column, and hover the cursor over the empty area on the right side of the cell. If you click while your cursor is over there, then, you can click the drop-down menu, which was hidden.

Step 1_2: Refining the first variable (Gender) in the Variable view by providing the variable name and defining the measure (Nominal).

Then, move the cursor to the first cell of the Values column, and hover the cursor over the empty area on the right side of the cell. If you click while your cursor is over there, then, you can click the icon “…” which was hidden.

Step 1_3: Refining the first variable (Gender) in the Variable view using the Values menu.

Then, you would see a pop-up window with which you can assign labels to the values of 1s and 2s that you have in the first column. For example, if your 1 means females and 2 means males, then, you can put “1” for the “Value:” and “Female” for “Label:”

Step 1_4_1: Value label definition.

Make sure to click Add after you define each value’s label.

Similarly, you can put “2” for the “Value:” and “Male” for “Label:” as shown below.

Step 1_4_2: Value label definition.

If you go to the Data View (click the lower-left tab labeled “Data View”) and click the “Value Labels” icon (see below),

Value labels icon

then, you would see either the numeric value (i.e., 1 and 2) or their labels (i.e., female and male) of the variable (Gender).

Seeing the values and their labels using the “Value Labels” icon.

Now you created the value column of the observed frequency table. Now, you need to create the frequency column that shows the observed frequency (O) of each category (i.e., female and male). See below:

Step 1_5: Adding the frequency column to the Observed-Frequency Table (O).

You can define the frequency column’s name and measure in the Variable view, as shown below:

Step 1_6: Refining the frequency column by providing the variable name and measure in the Variable View.

As a final step of the Observed-Frequency Input, each value in the first column should be weighted by the corresponding frequency in the second column. Specifically, Female should be weighted by 860 and Male should be weighted by 1140. YOu can do this weighting process using the last menu under the “Data” menu in the menu ribbon (see below).

Step 1_7_1: Weighting cases (values) in the first column using the Weight Cases menu.

If you select the Weight Cases menu, you would see the following pop-up window.

Step 1_7_2: Weight Cases pop-up window.

Click “Weight cases by,” select the “Frequency” variable from the left box, and click the right-arrow icon to move the Frequency variable from the left box to the “Frequency Variable” on the right.

Step 1_7_3: Weight cases (female and male in the first column) by the Frequency variable (with the values of 860 and 1140), respsecitvely.

If you hit OK, although nothing visible would happen, now the female and male values were weighted by their corresponding frequency in the frequency column (i.e., 860 and 1140, respectively). In other words, although not visible, there are 860 females and 1140 males in the first column.

This is how you can provide the observed frequency (O) in SPSS.

Step 2: Run Chi-Square Test of Goodness of Fit

Here is the path for the chi-square (χ2) test of goodness of fit.

Analyze – Nonparametric Test – Legacy Dialogs – Chi-Square

Step 2_1: Chi-Square menu

If you click the chi-square menu, then, you would see a pop-up window as shown below. You need to move the Gender variable from the left to the “Test Variable List” box.

Step 2_2: Chi-Square test setup

Note, the default of the “Expected Values” box is “All categories equal.” That is, the default hypothesis being tested assumes that there are equal numbers of females and males in the population. Based on this assumption and the given sample size (N = 2, 000), SPSS calculates the expected frequency (i.e., 1,000 female and 1,000 male).

If you hit OK, then, the following outcome would show. Note, the Expected N shows 1,000 and 1,000 for female and male, respectively

Step 2_3: SPSS outcome of the Chi-square (χ2) test of goodness of fit.

Note the Observed N, Expected N, and Residual in the outcome corresponds to O, E, and O – E, respectively.

Here’s the Excel-based chi-square calculation, again. Both Excel and SPSS show the exact same chi-square (χ2) value of 39.2.

Both SPSS and Excel show the exact same chi-square (χ2) value of 39.2.

If you want to learn about how to report the results of the Chi-square (χ2) test, please refer to this post.

Hope this helped.

Leave a Comment