Correlation: What’s the relationship between your height and your parents’ height?

The following graph shows the relationship between the two.

Galton’s “Table of Correlation.” Reprinted from Pearson (1920)

The point of this figure is that the taller the parents, the taller the offspring (and vice versa).

We should note, though, that the offspring of even very tall parents tend, on average, to be slightly shorter than their parents, and the offspring of very short parents tend to be slightly taller (i.e., they regress toward the mean height of the population).

As a side note, Galton possessed a penchant for observations, some would even say an “obsession.” Galton measured everything he could, from wind direction to heights and fingerprints.

————————————-

We probably do not measure things as obsessively as Galton did, but we probably do believe in “hidden relationships” between many things.

Let’s say you are a boss who needs new employees. You want college graduates and decide to ask them to bring their GPA. In that scenario, would you consider high-GPA applicants or low-GPA applicants? Probably high-GPA applicants, right?

Why, though? Probably because you assume that GPA is related to their future work performance.

And, as you may expect, colleges and companies have similar assumptions, too.

————————————–

Now, as you may also expect, it would be really convenient for colleges and companies to have a single indicator that shows the strength of the relationship between a prior performance score (e.g., SAT score) and a future performance score (e.g., college GPA). And here comes the correlation coefficient. (Correlation is just a fancy term for a relationship.)

(If this sounds somewhat familiar, it is probably because I mentioned a similar convenience when I introduced the standardized score, or z score: I said that colleges and companies like to line us up based on our scores and figure out where we are located based on our z score. Likewise, they want a quick indicator of the relationship between any two things that they are interested in.)

I’d like to wrap this blog up by sharing two pieces of information.

1. The correlation coefficient between SAT and the first-semester-in-college GPA is 0.4 (Jennifer & Kobrin, 2008). So what? It means that ONLY 16% of the variation in first-semester GPA can be explained by SAT scores. Not a lot, right? (To get the 16%, you square the correlation and multiply it by 100 => 0.4 * 0.4 * 100 = 16%.)
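The arithmetic in the parentheses can be checked with a couple of lines of Python. This is just a sketch of the squaring step; the variable names are mine, and the only number taken from the post is r = 0.4.

```python
# Share of variation explained by a correlation: square r, then convert to percent.
r = 0.4  # SAT vs. first-semester GPA correlation quoted in the post

# Rounding hides the tiny floating-point error in 0.4 * 0.4.
variance_explained_pct = round(r ** 2 * 100, 1)
print(f"r = {r} -> about {variance_explained_pct}% of the variation explained")
```

Notice how quickly the percentage shrinks: a correlation that sounds moderate still leaves most of the variation unexplained.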

(By the way, actually measuring things to prove the relationship between things like Galton did may not necessarily be a bad idea as it can prevent us from believing in a false relationship).

2. Here’s the formula of the correlation coefficient (more specifically, this is Pearson’s correlation coefficient, which is denoted r).

If you consider that the early correlation folks wanted to describe the relationship between the parents’ height and the offspring’s height, this formula may make more sense than it does at first sight.
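Written out from the description in this post (a sum of z-score products in the numerator, N-1 in the denominator), the formula would look something like the following. The index i, one per parent-offspring pair, is added here for clarity, and the z scores are assumed to be computed with the sample standard deviation.

```latex
r = \frac{\sum_{i=1}^{N} Z_{x_i} \, Z_{y_i}}{N - 1}
```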

If you look at the numerator, we are dealing with two standardized scores (Zx and Zy), which is the essence of the formula, as the denominator is just N-1. Zx represents the standardized scores of the parents’ height. As you may expect, some of the parents would be taller than the average parent and have positive scores. Other parents would be shorter than the average parent and have negative scores. (If heights are roughly normally distributed, about 68% of the parents’ z scores would fall between -1 and +1, about 95% between -2 and +2, and about 99.7% between -3 and +3, following the 68-95-99.7 rule.)

Similarly, Zy represents the standardized scores of the offspring’s height, which follow the same 68-95-99.7 rule.

Now, for each parent-offspring pair, we multiply the parents’ z score by the offspring’s z score, and then add up those products across all pairs. I hope this makes intuitive sense. That is, to represent the “relationship” between the parents’ and offspring’s heights, we are multiplying them (instead of adding, subtracting, or dividing them). It might be related to the way we describe the collaboration between two musicians as [musician A X musician B].

So what? What happens if the correlation formula has this Zx * Zy component?
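To make the standardizing step concrete, here is a minimal Python sketch. The heights are made-up numbers (not Galton’s data), the helper name z_scores is my own, and I am assuming the sample standard deviation (the N-1 version) so that it matches the denominator of the formula.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the sample SD."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (divides by N - 1)
    return [(v - m) / s for v in values]

# Hypothetical parent heights in inches (illustrative only).
parent_heights = [64.0, 66.0, 68.0, 70.0, 72.0]
parent_z = z_scores(parent_heights)

# Shorter-than-average parents get negative scores, taller ones positive.
print([round(z, 2) for z in parent_z])
```

After standardizing, the scores average to zero and have a standard deviation of one, which is what lets us compare heights measured on different scales.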

a) If the parents’ height is above the average, then their z score would be +. If their offspring’s height is above the average, then the offspring’s z score would be +. And as you know, + multiplied by + would be +.

b) If the parents’ height is below the average, then their z score would be -. If their offspring’s height is below the average, then the offspring’s z score would be -. Again, - multiplied by - would be +.

In sum, when the parents’ height and the offspring’s height go hand in hand (both are above the average or both are below the average, as shown below), most of the products would be +, and so the correlation would be +.

In contrast,

c) If the parents’ height is above the average (therefore their z score would be +) but their offspring’s height is below the average (therefore the offspring’s z score would be -), then + multiplied by - would be -.

d) If the parents’ height is below the average (therefore their z score would be -) but their offspring’s height is above the average (therefore the offspring’s z score would be +), then - multiplied by + would be -.

In sum, when the parents’ scores and the offspring’s scores go in opposite directions (as shown below), most of the products would be -, and so the correlation would be -.

So, Pearson’s correlation coefficient formula captures the direction of the relationship and represents it by its sign (+ means a positive relationship; - means a negative relationship).
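If you would like to see the sign logic play out with numbers, the following Python sketch computes r as the sum of the z-score products divided by N-1, as described in this post. The heights are hypothetical, and the function names (z_scores, pearson_r) are mine rather than anything from a particular library.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample SD (N - 1 version)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """Sum of the paired z-score products, divided by N - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    products = [a * b for a, b in zip(zx, zy)]  # one product per pair
    return sum(products) / (len(xs) - 1)

# Hypothetical heights in inches (illustrative only, not Galton's data).
parents = [64.0, 66.0, 68.0, 70.0, 72.0]
offspring_same = [65.0, 66.5, 68.0, 69.0, 71.0]      # tall parents, tall offspring
offspring_opposite = [71.0, 69.0, 68.0, 66.5, 65.0]  # tall parents, short offspring

r_pos = pearson_r(parents, offspring_same)
r_neg = pearson_r(parents, offspring_opposite)
print(f"hand in hand: r = {r_pos:.2f}")
print(f"opposite directions: r = {r_neg:.2f}")
```

The first data set mostly pairs + with + and - with -, so r comes out positive; the second pairs + with -, so r comes out negative.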

Now, I hope this formula makes more sense than before.

-Statistics sidekick
