What is the difference between Pearson and Spearman correlation?

Pearson’s correlation coefficient measures the linear relationship between two variables, whereas Spearman’s (rank) correlation coefficient assesses their monotonic relationship. Each coefficient has its legitimate use, depending on the use case.

\[ \fbox{ $\textbf{Pearson} \hspace{0.5cm} r \left( \textbf{X}, \textbf{Y} \right) = \frac {\text{Cov} \left( \textbf{X}, \textbf{Y}\right)} {\sqrt{\text{Var} \left(\textbf{X}\right)} \sqrt{\text{Var}\left(\textbf{Y}\right)}} $} \]

\[ \fbox{ $\textbf{Spearman} \hspace{0.5cm} r_s\left( \textbf{X}, \textbf{Y} \right) = \frac {\text{Cov} \big( \text{R} (\textbf{X}), \text{R} (\textbf{Y}) \big)} { \sqrt{\text{Var} \big( \text{R} (\textbf{X}) \big)} \sqrt{\text{Var} \big( \text{R} (\textbf{Y}) \big)}} $ } \]
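
To make the two definitions concrete, here is a minimal, standalone JavaScript sketch (the helper names `cov`, `pearson`, `ranks`, and `spearman` are made up for illustration and are not part of the interactive demo further below). Spearman’s coefficient is simply Pearson’s coefficient applied to the ranks of the data:

Code
// illustrative sketch -- helper names are not a library API
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

// sample covariance with 1/N normalization (see the definition below)
const cov = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));
};

// Pearson: covariance normalized by the standard deviations
const pearson = (xs, ys) => cov(xs, ys) / Math.sqrt(cov(xs, xs) * cov(ys, ys));

// rank transform: smallest value gets rank 1 (ties not handled, for brevity)
const ranks = xs => {
  const sorted = [...xs].sort((a, b) => a - b);
  return xs.map(x => sorted.indexOf(x) + 1);
};

// Spearman: Pearson applied to the ranks
const spearman = (xs, ys) => pearson(ranks(xs), ranks(ys));

// monotonic but non-linear relationship: Spearman is exactly 1, Pearson is not
const x = [1, 2, 3, 4, 5];
const y = x.map(v => v ** 3);
console.log(pearson(x, y));   // ~0.94
console.log(spearman(x, y));  // 1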

Covariance Explained

What is it?

Covariance is a measure of the joint variability of two variables, i.e., it tells us how strongly two variables vary together. Mathematically, it is defined as follows

\[ \begin{align} \text{Cov} \left( \textbf{X}, \textbf{Y} \right) &= \mathbb{E} \Big[ \left(\textbf{X} - \mathbb{E}\big[\textbf{X}\big] \right) \left(\textbf{Y} - \mathbb{E}\big[\textbf{Y}\big] \right) \Big]\\ & \stackrel{\text{sample}}{=} \frac {1} {N} \sum_{i=1}^N \left( x_i - {\mu}_{\textbf{X}} \right) \left( y_i - {\mu}_{\textbf{Y}} \right) \end{align} \]

So what does that mean in plain words? Suppose that \(\text{Cov}\left(\textbf{X}, \textbf{Y}\right) > 0\): this simply means that an above-average value of one variable (\(x_i > \mu_{\textbf{X}}\)) is expected to be associated with an above-average value of the other variable (\(y_i > \mu_{\textbf{Y}}\)). Thus, covariance encodes information about the direction of the relationship between the two variables.
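
As a quick numerical illustration of this sign interpretation, here is a standalone sketch; `cov` is the same illustrative 1/N helper as in the earlier snippet, repeated so the snippet is self-contained:

Code
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const cov = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));
};

const x  = [1, 2, 3, 4, 5];
const y1 = [2, 4, 5, 4, 7];  // tends to lie above its mean when x does
const y2 = [8, 7, 5, 3, 2];  // tends to lie below its mean when x is above its mean

console.log(cov(x, y1));  //  2.0 -> positive direction
console.log(cov(x, y2));  // -3.2 -> negative direction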

Covariance is unbounded, which can be seen by simply scaling one of the variables. Note that such scaling also increases the variance of the scaled variable.

Covariance gives some indication of the linear dependence between two random variables: its sign tells us whether the two variables are positively or negatively correlated. However, since covariance is unbounded, its value alone does not tell us how strong the linear relationship is.
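
The following standalone sketch illustrates both points: scaling one variable by a constant \(c\) scales the covariance by \(c\) (and the variance of that variable by \(c^2\)), while Pearson’s normalized coefficient is unchanged. As before, `cov` and `pearson` are illustrative helpers, repeated so the snippet is self-contained:

Code
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const cov = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));
};
const pearson = (xs, ys) => cov(xs, ys) / Math.sqrt(cov(xs, xs) * cov(ys, ys));

const x = [1, 2, 3, 4, 5];
const y = [2, 4, 5, 4, 7];
const xScaled = x.map(v => 1000 * v);

console.log(cov(x, y), cov(xScaled, y));          // 2.0 vs 2000: covariance is unbounded
console.log(cov(x, x), cov(xScaled, xScaled));    // 2.0 vs 2,000,000: variance grows as well
console.log(pearson(x, y), pearson(xScaled, y));  // identical (~0.87): normalization removes the scale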

Where does it come from?

Covariance arguably makes the most sense for a multivariate normal distribution \(\mathcal{N}\big(\boldsymbol{\mu}, \boldsymbol{\Sigma}\big)\). Thus, one might think of covariance as having originally been defined only for a multivariate normal distribution¹.

Let’s take a closer look at the distributional parameters of \(\mathcal{N}\big(\boldsymbol{\mu}, \boldsymbol{\Sigma}\big)\)

  • mean vector \(\boldsymbol{\mu}\) represents the center of the distribution.

  • covariance matrix \(\boldsymbol{\Sigma}\) represents the shape of the distribution.
    Its entries are defined as follows

    \[ \Sigma_{ij} = \text{Cov} \big(\textbf{X}_i, \textbf{X}_j \big), \]

    i.e., for a two-dimensional multivariate normal distribution (as in the interactive visualization below)

    \[ \boldsymbol{\Sigma} = \begin{bmatrix} \text{Cov} \big( \textbf{X}, \textbf{X}\big) & \text{Cov} \big( \textbf{X}, \textbf{Y}\big) \\ \text{Cov} \big( \textbf{Y}, \textbf{X}\big) & \text{Cov} \big( \textbf{Y}, \textbf{Y}\big) \end{bmatrix} = \begin{bmatrix} \text{Var} \big( \textbf{X}\big) & \text{Cov} \big( \textbf{X}, \textbf{Y}\big) \\ \text{Cov} \big( \textbf{X}, \textbf{Y}\big) & \text{Var} \big( \textbf{Y}\big) \end{bmatrix} \]

The following interactive demo should give a more intuitive understanding of how the covariance matrix and the mean vector impact the resulting distribution.

Interactive Visualization of Multivariate Gaussian

Code
viewof param = Inputs.form(
[
  Inputs.range([-4, 4], {label: tex`\mu_{\textbf{X}}`, step: 0.1}),
  Inputs.range([-4, 4], {label: tex`\mu_{\textbf{Y}}`, step: 0.1}),
  Inputs.range([1, 2], {label: tex`\text{Var}\big(\textbf{X}\big)`, step: 0.1}),
  Inputs.range([-1, 1], {label: tex`\text{Cov}\big(\textbf{X}, \textbf{Y}\big)`, step: 0.1}),
  Inputs.range([1, 2], {label: tex`\text{Var}\big(\textbf{Y}\big)`, step: 0.1}),
])
Code
math = require("mathjs")

meanVector = [param[0],param[1]]
covarianceMatrix = [
  [param[2], param[3]],
  [param[3], param[4]]
]

// compute eigenvalues and eigenvectors of covarianceMatrix
math_ans = math.eigs(covarianceMatrix)
// math.eigs returns the eigenpairs sorted by ascending eigenvalue;
// each entry of math_ans.eigenvectors is a {value, vector} pair

first_eigenvector = math_ans.eigenvectors[0].vector
second_eigenvector = math_ans.eigenvectors[1].vector

eigenvectors = [
  { "x": first_eigenvector[0], "y": first_eigenvector[1] },
  { "x": second_eigenvector[0], "y": second_eigenvector[1] }
]
// take the eigenvalues from the same eigenpairs so that
// eigenvalues[i] always belongs to eigenvectors[i]
eigenvalues = [math_ans.eigenvectors[0].value, math_ans.eigenvectors[1].value]

tex`
{
\begin{aligned}
\quad \quad
&\boldsymbol{\mu} = 
\begin{pmatrix} 
  ${meanVector[0]} \\ 
  ${meanVector[1]} 
\end{pmatrix} \quad \quad
\boldsymbol{\Sigma} = 
\begin{pmatrix} 
  ${covarianceMatrix[0][0]} & ${covarianceMatrix[0][1]} \\ 
  ${covarianceMatrix[1][0]} & ${covarianceMatrix[1][1]} 
\end{pmatrix}
\end{aligned}
}
`
Code
d3 = require("d3")
import {boxMuller} from "@sw1227/box-muller-transform"
import {choleskyDecomposition} from "@sw1227/cholesky-decomposition"

// taken from https://observablehq.com/@sw1227/multivariate-normal-distribution
function multivariateNormal(mean, covArray) {
  const n = mean.length;
  const cov = math.matrix(covArray);
  return {
    // Probability Density Function
    pdf: x => {
      const c = 1 / (math.sqrt(2*math.PI)**n * math.sqrt(math.det(cov)));
      return c * math.exp(
        -(1/2) * math.multiply(
          math.subtract(math.matrix(x), math.matrix(mean)),
          math.inv(cov),
          math.subtract(math.matrix(x), math.matrix(mean))
        )
      );
    },
    // Differential entropy
    entropy: 0.5*math.log(math.det(cov)) + 0.5*n*(1 + math.log(2*math.PI)),
    // Generate n_samples samples using the Cholesky decomposition of cov:
    // x = mean + L z, where L L^T = cov and z ~ N(0, I)
    sample: n_samples => {
      const L = choleskyDecomposition(cov);
      return Array(n_samples).fill().map(_ => {
        const z = boxMuller(n);
        const array = math.add(math.matrix(mean), math.multiply(L, math.matrix(z))).toArray();
        return {"x": array[0], "y": array[1]};
      });
    },
  };
}

scale = 2
eigenVecLine1 = [
  {"x": meanVector[0], "y": meanVector[1], "text": ""},
  {"x": meanVector[0] + scale*eigenvalues[0]*eigenvectors[0]["x"], "y": meanVector[1] + scale*eigenvalues[0]*eigenvectors[0]["y"], 
   "text": "v1"},
]
eigenVecLine2 = [
  {"x": meanVector[0], "y": meanVector[1], "text": ""},
  {"x": meanVector[0] + scale*eigenvalues[1]*eigenvectors[1]["x"], "y": meanVector[1] + scale*eigenvalues[1]*eigenvectors[1]["y"], 
   "text": "v1"},
]

// adapted from: https://observablehq.com/@observablehq/building-scatterplots-using-plot
Plot.plot({
  grid: true, // by default Plot has "grid: false", to show a grid we set grid to 'true'
  marks: [
    // Plot.frame(), // draws the frame mark around the Plot
    Plot.ruleY([0]),
    Plot.ruleX([0]),
    

    Plot.dot(multivariateNormal(meanVector, covarianceMatrix).sample(1000), 
             {x: "x",
              y: "y",
              fill: 50,
              r: 2.5,
              stroke: "black"
             }),
    //https://observablehq.com/@observablehq/plot-connected-scatterplot
    Plot.line(eigenVecLine1, {x: "x", y:"y", marker: false, curve: "catmull-rom", stroke: "red"}),
    Plot.line(eigenVecLine2, {x: "x", y:"y", marker: false, curve: "catmull-rom", stroke: "orange"}),
    // Plot.text(eigenVecLine1, {x: "x", y:"y", marker: false, curve: "catmull-rom", color: "red", text: "text", dy: -8}),
  ],
  color: {
    scheme: "Blues",  
    domain: [0,100], // this is the equivalent of the vmin / vmax in matplotlib
  },
  x: {
    domain: [-10, 10], // 
    label: "X",
    ticks: d3.range(-10,10, 2), // equivalent of np.arange
  },
  y: {
    domain : [-10, 10],
    label: "Y",
    ticks: d3.range(-10, 10, 2), // equivalent of np.arange
  },
  width: 550,
  height: 550
})

Note that the red and orange lines represent the eigenvectors of the covariance matrix, scaled by their corresponding eigenvalues. Interestingly, these vectors also correspond to the principal components of the distribution.
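
More precisely, because the covariance matrix is symmetric and positive semi-definite, it can be decomposed into its eigenvectors and eigenvalues,

\[ \boldsymbol{\Sigma} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^{\top}, \]

where \(\mathbf{Q}\) denotes the matrix whose columns are the orthonormal eigenvectors (the principal axes of the distribution) and \(\boldsymbol{\Lambda}\) the diagonal matrix of the corresponding eigenvalues (the variances along those axes).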

Useful Properties

  • covariance is invariant under shifting of the variables, i.e.,

    \[ \text{Cov} \left( \textbf{X}, \textbf{Y} \right) = \text{Cov} \left( \textbf{X} + a, \textbf{Y} + b\right) \]

    Otherwise the covariance matrix would be dependent on the mean vector.

  • covariance is linear in each of its arguments, so constant factors and sums can be factored out, i.e.,

    \[ \text{Cov} \left( a\textbf{X} + \textbf{Y}, \textbf{Z} \right) = a\text{Cov} \left( \textbf{X}, \textbf{Z} \right) + \text{Cov} \left( \textbf{Y}, \textbf{Z} \right) \]

    As mentioned earlier, scaling one of the variables scales the covariance accordingly.

  • the square of the covariance is less than or equal to the product of the variances (by the Cauchy–Schwarz inequality), i.e.,

    \[ \Big(\text{Cov} \big( \textbf{X}, \textbf{Y}\big)\Big)^2 \le \text{Var} \big( \textbf{X} \big) \text{Var} \big( \textbf{Y}\big) \]

  • the absolute covariance is upper-bounded by the maximum of the variances, i.e.,

    \[ \lvert \text{Cov} \big( \textbf{X}, \textbf{Y}\big)\rvert \le \text{max} \Big(\text{Var} \big( \textbf{X} \big), \text{Var} \big( \textbf{Y}\big) \Big) \]

    which follows directly from the previous property, since the geometric mean \(\sqrt{\text{Var} \big( \textbf{X} \big) \text{Var} \big( \textbf{Y}\big)}\) is at most the larger of the two variances. The sketch below checks these properties numerically.
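
As a sanity check, these properties can be verified on arbitrary data. The standalone sketch below reuses the illustrative 1/N `cov` helper from the earlier snippets; the sample versions of these identities and bounds hold exactly, up to floating-point error:

Code
const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;
const cov = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));
};

const X = [1.2, 3.4, 2.2, 5.1, 4.0];
const Y = [0.5, 2.9, 1.7, 4.2, 3.1];
const Z = [2.0, 1.1, 3.3, 0.7, 2.5];
const a = 3, b = -2;

// shift invariance: Cov(X + a, Y + b) equals Cov(X, Y)
console.log(cov(X.map(v => v + a), Y.map(v => v + b)), cov(X, Y));

// linearity: Cov(aX + Y, Z) equals a*Cov(X, Z) + Cov(Y, Z)
const aXplusY = X.map((v, i) => a * v + Y[i]);
console.log(cov(aXplusY, Z), a * cov(X, Z) + cov(Y, Z));

// Cauchy-Schwarz bound and the max-of-variances bound
console.log(cov(X, Y) ** 2 <= cov(X, X) * cov(Y, Y));               // true
console.log(Math.abs(cov(X, Y)) <= Math.max(cov(X, X), cov(Y, Y))); // true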

Footnotes

  1. This statement is not true. I couldn’t find out who introduced the term covariance, or when, but it very likely wasn’t in the context of a multivariate Gaussian. However, seeing it that way may help to put covariance into context (IMHO).↩︎