A Short Intro
Many social science students, psychology majors included, hate statistics.
I was one of the statistics haters, too.
I couldn’t understand the material in my undergraduate stat classes or how stats might be useful later in my life.
However, when social science students interview for jobs, statistical skills are among the skills employers expect.
For example, as a social-science-major employee, you may be given customer satisfaction data that include lots of these words:
- Highly Satisfied / Somewhat Satisfied / Somewhat Unsatisfied / Highly Unsatisfied
or lots of numbers like
- 4 / 3 / 2 / 1 that correspond to the four satisfaction levels, respectively.
[Star Q] What are we supposed to do first when we get data (words or numbers)? (the answer is right below)
Frequency Analysis
The first thing to do is to put those words or numbers in order (alphabetical or ascending/descending) and write each one’s frequency right next to it, so that your boss can see how many times each word or number occurred.
This process is called a frequency analysis. The table that shows all the possible values of a variable in the first column and each value’s frequency in the next column is called a frequency table. (If you want to remind yourself of the basic concepts behind these terms, click this link.)
Example of Frequency Analysis
- Sample Data: The following raw data show 12 randomly-selected students’ grades from my 2012 statistics course. Although I’m using only 12 students’ grades, there were 148 students in this class. That is, this table was supposed to be very long, but I cut it short.
Student id # | The student’s grade |
1 | C |
2 | D |
3 | B |
4 | A |
5 | B |
6 | B |
7 | D |
8 | A |
9 | B |
10 | F |
11 | C |
12 | C |
- The Necessity of Frequency Table: The raw-data table above is not only boring; it also makes it hard to grasp the story contained in the data. It does not answer interesting questions such as how difficult the class was or how many students got an A or a B. Therefore, we need to reorganize the data into a value column and a frequency column (i.e., create a frequency table), as shown below for all 148 students.
Grades | Frequency | Percent |
A | 33 | 22% |
B | 55 | 37% |
C | 25 | 17% |
D | 6 | 4% |
Drop/F | 29 | 20% |
Sum | 148 | 100% |
- Usefulness of the Frequency Table: Now, you can tell that
- 22% of my students got A.
- 37% got B.
- 17% got C.
- In other words, 76% of the students (22% with A + 37% with B + 17% with C) passed my course.
- However, 24% (4% with D + 20% with Drop/F) could not pass.
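The pass and non-pass totals above can be double-checked with a few lines of Python. This is only a minimal sketch: the frequencies are copied from the table, and the variable names (frequencies, passed, not_passed) are my own choices for illustration.

```python
# Frequencies copied from the class frequency table.
frequencies = {"A": 33, "B": 55, "C": 25, "D": 6, "Drop/F": 29}

total = sum(frequencies.values())  # all 148 students

# Pass = A, B, or C; everything else (D, Drop/F) is non-passing.
passed = sum(frequencies[grade] for grade in ("A", "B", "C"))
not_passed = total - passed

pass_percent = round(100 * passed / total)
not_pass_percent = round(100 * not_passed / total)

print(f"Passed: {passed} of {total} ({pass_percent}%)")
print(f"Did not pass: {not_passed} of {total} ({not_pass_percent}%)")
```

Working from the counts rather than adding up the rounded percentages avoids small rounding surprises when a table has many rows.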
How to Make a Frequency Table With 3 Columns Manually
Step 1: Create an empty three-column table with the following column headings:
- Column 1 heading: Variable Name (e.g., Grades)
- Column 2 heading: Frequency
- Column 3 heading: Percent
Step 2: In Column 1 (whose heading is a variable name like Grades), list all the possible values of the variable in alphabetical or ascending/descending order. This will allow readers to find any value they are interested in. At the bottom, add a row with the heading “Total.”
As will be shown later, when you create a frequency table using software (e.g., Excel or SPSS), this process of “finding all the possible values” will be done automatically.
[Star Q] How many rows would we need?
[Star A] It depends on how many values the variable has (the more the values, the longer the table). When there are too many values, we may want to group the values and make the frequency table a grouped frequency table (see below – What If The Frequency Table Is Too Long?).
Step 3: In Column 2 (whose heading is frequency), make a tally mark (////) for each value as you go through the values in the raw data set. For example, in the raw data table (the left table below), the first student’s grade is C. Therefore, in the frequency table (the right table below), in the frequency column, put one tally mark next to C. Then, in the raw data table, move down to the next student’s grade, which is D. Then, in the frequency table, next to D, put one tally mark, and so on. Then, you will eventually have tally marks as shown in the red box below.
Then, convert the tally marks into numbers.
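If you would like to see the tallying idea of Step 3 in code, here is a minimal Python sketch using the 12 sample grades from the raw-data table. Counter is part of Python’s standard library; the variable names (grades, possible_values, tally) are just my own labels.

```python
from collections import Counter

# The 12 sample grades from the raw-data table, in student-id order.
grades = ["C", "D", "B", "A", "B", "B", "D", "A", "B", "F", "C", "C"]

# Counter goes through the raw data once and adds 1 to a value's
# count each time it appears, just like making tally marks.
tally = Counter(grades)

# List every possible value in order (Step 2), then show its tally
# marks and the number they convert to.
possible_values = ["A", "B", "C", "D", "F"]
for value in possible_values:
    count = tally[value]
    print(f"{value} | {'/' * count} | {count}")
print(f"Total | | {sum(tally.values())}")
```

Listing the possible values yourself, instead of only the ones Counter found, keeps a value in the table even when its frequency is 0.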
Step 4: Calculate the percent of each value in the third column using the formula below:
Percent = (Frequency of the value / Total frequency) × 100
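The percent calculation in Step 4 can be sketched in Python as follows. I’m assuming the usual definition of percent (a value’s frequency divided by the total, times 100) and using the frequencies you get by tallying the 12 sample grades; the function name percent_of is hypothetical.

```python
# Frequencies tallied from the 12 sample grades (Step 3).
frequency = {"A": 2, "B": 4, "C": 3, "D": 2, "F": 1}
total = sum(frequency.values())


def percent_of(count, total):
    """Share of the total that one value accounts for, in percent."""
    return 100 * count / total


percents = {value: percent_of(count, total) for value, count in frequency.items()}

print("Grades | Frequency | Percent")
for value, count in frequency.items():
    print(f"{value} | {count} | {percents[value]:.1f}%")
print(f"Total | {total} | {sum(percents.values()):.1f}%")
```

Keeping one decimal place is a choice I made for this small sample; with only 12 students, whole-number rounding would hide quite a bit.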
Step 5: Finally, you can represent the same information as the table in a graphical format (bar graph or histogram) as shown below, where
- X-axis shows all the possible values (from the first column of the frequency table)
- Y-axis shows the frequency of each value in terms of the bar’s height (from the second column of the frequency table).
[Star Q] Why graphs? What additional information does a frequency graph provide over the frequency table?
[Star A] Trend. Although the frequency graph provides the exact same information as the frequency table, one can see the trend (ups and downs) of the frequency across the values.
- A trend could be especially useful when the X-axis shows a meaningful continuum of a variable (e.g., time), such as a store’s sales across the months.
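To get a quick feel for the ups and downs without opening Excel or SPSS, a rough text bar chart is enough. The sketch below is only an illustration: the function name text_bar_chart and the width of 40 characters are my own assumptions, and the bars run sideways, so here the values sit on the vertical axis and the bar length (rather than the height) shows the frequency.

```python
# Frequencies copied from the class frequency table.
frequency = {"A": 33, "B": 55, "C": 25, "D": 6, "Drop/F": 29}


def text_bar_chart(freq, width=40):
    """Return one line per value; bar length is proportional to frequency."""
    tallest = max(freq.values())
    lines = []
    for value, count in freq.items():
        bar = "#" * round(width * count / tallest)
        lines.append(f"{value:>6} | {bar} {count}")
    return lines


for line in text_bar_chart(frequency):
    print(line)
```

Even in this crude form, you can see at a glance that B is the most common grade and D the rarest.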
Implications of Frequency Analysis
People cannot understand the story of the data when it is presented in raw form. To help them understand the data, you need to organize the data into a frequency table or graph.
Frequency analysis may seem basic. However, it has both practical and data-handling-related implications.
1. Practical Implications
The practical implication is that people who give you a dataset are often interested in specific values and their frequencies only.
For example, my department’s chair would want to know how many non-passing grades my class has, because too many of them would make her uncomfortable in front of the president. That is, she would be interested in the frequency of only D and Drop/F.
Bosses often ask, “How many Xs do you have?” or “What’s the percentage of Y?”
Therefore, as a data analyst, you should first organize all the values in order along with their frequency (and %) so that you can give the answers right away.
As another example, one data set that I handled as a PhD student was about aircraft incident types.
My advisor once asked, “How many times has incident type A occurred?”
I happened to have a frequency table of all the incident types and could answer the question, whereas my colleagues didn’t have such a table.
This helped me keep working as an RA over the summer, when one can work double and get paid double.
2. Data-Handling-Related Implications
Another important implication of the frequency table is that it can help you with data handling, that is, the process of eliminating unhelpful data.
The frequency table shows which values are extremely infrequent, including typos and missing responses, that you may want to remove.
By removing the rare or undesirable values, one can get a more representative dataset.
(Rare values do not have to be removed, but we often remove them so that we have a clean dataset.)
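Here is a minimal Python sketch of how a frequency table can flag rare values for review. Everything in it is hypothetical: the survey responses are made up (including a typo and a blank answer), and the 2% cutoff is an arbitrary choice of mine, not a rule.

```python
from collections import Counter

# Made-up survey responses: 98 valid answers, one typo, one blank.
responses = (
    ["Highly Satisfied"] * 40
    + ["Somewhat Satisfied"] * 35
    + ["Somewhat Unsatisfied"] * 15
    + ["Highly Unsatisfied"] * 8
    + ["Highly Satisfeid", ""]
)

table = Counter(responses)
total = len(responses)

# Hypothetical cutoff: flag any value that makes up less than 2% of the data.
rare_cutoff = 0.02
rare = {value: count for value, count in table.items() if count / total < rare_cutoff}

# Look at the flagged values before deleting anything.
for value, count in rare.items():
    print(f"Rare value {value!r}: {count} of {total}")

# Keep only the responses that were not flagged.
cleaned = [response for response in responses if response not in rare]
print(f"Kept {len(cleaned)} of {total} responses")
```

A cutoff like this will also flag legitimate values that simply happen to be rare, so the flagged list is something to inspect, not something to delete blindly.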
What If The Frequency Table Is Too Long?
Imagine a variable that has too many values, for example, the mileage of 100,000 used cars, or the titles of the favorite movies of people from 100 countries, resulting in 10,000 movie titles. Because a frequency table is supposed to show “all” the possible values of the variable in its first column, in these cases, the tables will be very long.
A too-long table is not only inconvenient to keep on paper but also hard to understand.
[Star Q] What should we do?
[Star A] Combine the values into groups. For example, group the mileage by 10,000 (e.g., less than 10,000; 10,000 or more but less than 20,000; etc.).
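Grouping mileage by 10,000 can be sketched in Python like this. The ten mileages are invented purely for illustration, and the names (mileages, bin_width, bins) are my own.

```python
from collections import Counter

# Ten made-up mileages standing in for a much longer list of used cars.
mileages = [4200, 9800, 12500, 18750, 19999, 20000, 27300, 35100, 39999, 8000]
bin_width = 10_000

# Integer division finds the lower edge of each car's group:
# 12,500 // 10,000 = 1, so that car falls in the group starting at 10,000.
bins = Counter((mileage // bin_width) * bin_width for mileage in mileages)

print("Mileage | Frequency")
for low in sorted(bins):
    print(f"{low:,} to less than {low + bin_width:,} | {bins[low]}")
print(f"Total | {sum(bins.values())}")
```

Note that a car with exactly 20,000 miles lands in the group that starts at 20,000, so no value falls between groups.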
A frequency table where the values are grouped to make the table short is called a grouped frequency table. Here’s an example based on the same grades data.
Based on the raw data (on the left), a frequency table is created (in the middle). Based on the frequency table, a grouped frequency table (on the right) is created by grouping the values of the middle table. Specifically, A, B, and C were grouped as Pass, and D and F were grouped as Fail. Although we lose some detail (e.g., how many As or Bs), a grouped frequency table is more succinct, and it is necessary when there are too many values.
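The same Pass/Fail grouping can be sketched in Python using the 12 sample grades from the raw-data table. The dictionary name group_of is my own; it simply records which group each grade belongs to.

```python
from collections import Counter

# The 12 sample grades from the raw-data table, in student-id order.
grades = ["C", "D", "B", "A", "B", "B", "D", "A", "B", "F", "C", "C"]

# Which group each grade belongs to: A, B, C pass; D and F fail.
group_of = {"A": "Pass", "B": "Pass", "C": "Pass", "D": "Fail", "F": "Fail"}

# Ungrouped frequency table (the "middle" table).
frequency = Counter(grades)

# Grouped frequency table (the "right" table): count groups instead of grades.
grouped = Counter(group_of[grade] for grade in grades)

print("Group | Frequency | Percent")
for group in ("Pass", "Fail"):
    percent = 100 * grouped[group] / len(grades)
    print(f"{group} | {grouped[group]} | {percent:.0f}%")
print(f"Total | {len(grades)} | 100%")
```

The grouped table has two rows instead of five, at the cost of no longer showing how many As or Bs there were.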
Video Tutorials
Generating a (Grouped) Frequency Table Using Excel
1. When you know what kinds of values are there in the dataset
*When you make a graph, make sure to first select the range of data on which you want to base the graph!
2. When you don’t know the kinds of values in the dataset
Generating a (Grouped) Frequency Table Using SPSS
Hope this helped.
– From your sidekick.