What is the difference between Pearson and Spearman correlation?
Pearson’s correlation coefficient measures the linear relationship, whereas Spearman’s (rank) correlation coefficient assesses the monotonic relationship between two variables. Each coefficient has its justification depending on the use case.
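To make the distinction concrete, here is a minimal sketch in plain JavaScript. All helpers (mean, cov, pearson, ranks, spearman) are hand-rolled for illustration rather than taken from any statistics library: for data that is monotonic but non-linear, Spearman’s coefficient is exactly 1 while Pearson’s is clearly below 1.

```js
// Minimal sketch: Pearson vs. Spearman on a monotonic, non-linear relationship.
// All helpers are hand-rolled for illustration; no statistics library is assumed.

// Arithmetic mean.
const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;

// Sample covariance (divides by n - 1).
const cov = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return xs.reduce((s, x, i) => s + (x - mx) * (ys[i] - my), 0) / (xs.length - 1);
};

// Pearson correlation: covariance normalized by both standard deviations.
const pearson = (xs, ys) => cov(xs, ys) / Math.sqrt(cov(xs, xs) * cov(ys, ys));

// Ranks (1-based); ties are not handled in this sketch.
const ranks = xs => xs.map(x => [...xs].sort((a, b) => a - b).indexOf(x) + 1);

// Spearman correlation: Pearson correlation of the ranks.
const spearman = (xs, ys) => pearson(ranks(xs), ranks(ys));

// A monotonic but non-linear relationship: y = exp(x).
const x = [1, 2, 3, 4, 5, 6, 7, 8];
const y = x.map(v => Math.exp(v));

console.log("Pearson: ", pearson(x, y).toFixed(3));  // well below 1: not linear
console.log("Spearman:", spearman(x, y).toFixed(3)); // 1.000: perfectly monotonic
```

In other words, Spearman only looks at the ordering of the values, which makes it the better choice when we expect a monotonic but not necessarily linear relationship, or when a few extreme values would dominate Pearson’s estimate.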
Covariance is a measure of the joint variability of two variables, i.e., it tells us how strongly two variables vary together. Mathematically, it is defined as follows:

\[
\text{Cov}\left(\textbf{X}, \textbf{Y}\right) = \mathbb{E}\left[\left(\textbf{X} - \mu_{\textbf{X}}\right)\left(\textbf{Y} - \mu_{\textbf{Y}}\right)\right],
\]

where \(\mu_{\textbf{X}} = \mathbb{E}\left[\textbf{X}\right]\) and \(\mu_{\textbf{Y}} = \mathbb{E}\left[\textbf{Y}\right]\).
So what does that mean in plain words? Suppose that \(\text{Cov}\left(\textbf{X}, \textbf{Y}\right) > 0\): Then, it simply means that a greater than average value for one variable (\(x_i > \mu_{\textbf{X}}\)) is expected to be associated with a greater than average value for the other variable (\(y_i > \mu_{\textbf{Y}}\)). Thus, covariance encodes information about the direction of the joint distribution.
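For reference, expanding the product in the definition above gives an equivalent form in terms of raw moments (a standard identity, no additional assumptions):

\[
\begin{aligned}
\text{Cov}\left(\textbf{X}, \textbf{Y}\right)
&= \mathbb{E}\left[\left(\textbf{X} - \mu_{\textbf{X}}\right)\left(\textbf{Y} - \mu_{\textbf{Y}}\right)\right] \\
&= \mathbb{E}\left[\textbf{X}\textbf{Y}\right] - \mu_{\textbf{X}}\mathbb{E}\left[\textbf{Y}\right] - \mu_{\textbf{Y}}\mathbb{E}\left[\textbf{X}\right] + \mu_{\textbf{X}}\mu_{\textbf{Y}} \\
&= \mathbb{E}\left[\textbf{X}\textbf{Y}\right] - \mu_{\textbf{X}}\mu_{\textbf{Y}}.
\end{aligned}
\]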
Covariance is unbounded, which can be seen by simply scaling one of the variables: multiplying \(\textbf{X}\) by a factor \(a\) multiplies the covariance by \(a\) as well. Note that scaling also increases the variance of the scaled variable (by a factor of \(a^2\)).
Covariance gives some indication of the linear dependence between two random variables, i.e., its sign tells us whether the two variables are positively or negatively correlated. However, as covariance is unbounded, its value doesn’t help us estimate the strength of the linear relationship.
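A quick numerical sanity check of both points (a sketch reusing the same hand-rolled helpers as in the first snippet; the toy data is made up): scaling one variable by 10 scales the covariance by 10, while Pearson’s correlation, which normalizes by both standard deviations, stays exactly the same.

```js
// Sketch: covariance is unbounded under scaling, Pearson correlation is not.
// (Same hand-rolled helpers as in the first snippet.)
const mean = xs => xs.reduce((s, v) => s + v, 0) / xs.length;
const cov = (xs, ys) => {
  const mx = mean(xs), my = mean(ys);
  return xs.reduce((s, x, i) => s + (x - mx) * (ys[i] - my), 0) / (xs.length - 1);
};
const pearson = (xs, ys) => cov(xs, ys) / Math.sqrt(cov(xs, xs) * cov(ys, ys));

const x = [1, 2, 3, 4, 5];
const y = [2, 1, 4, 3, 5];          // positively associated with x
const xScaled = x.map(v => 10 * v); // same data, different units

console.log(cov(x, y), cov(xScaled, y));         // 2 vs. 20: covariance scales with the data
console.log(pearson(x, y), pearson(xScaled, y)); // 0.8 vs. 0.8: correlation is scale-invariant
```

This normalization is exactly what turns the unbounded covariance into Pearson’s coefficient, whose value always lies in \([-1, 1]\).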
Where does it come from?
Covariance actually makes most sense for a multivariate normal distribution \(\mathcal{N}\big(\boldsymbol{\mu}, \boldsymbol{\Sigma}\big)\). Thus, one might think of covariance as originally being defined only for a multivariate normal distribution¹.
Let’s take a closer look at the distributional parameters of \(\mathcal{N}\big(\boldsymbol{\mu}, \boldsymbol{\Sigma}\big)\)
mean vector \(\boldsymbol{\mu}\) represents the center of the distribution.
covariance matrix \(\boldsymbol{\Sigma}\) represents the shape of the distribution.
Its entries are defined as follows:

\[
\Sigma_{ij} = \text{Cov}\left(\textbf{X}_i, \textbf{X}_j\right),
\]

where \(\textbf{X}_i\) denotes the \(i\)-th component of the random vector.
To see what \(\boldsymbol{\Sigma}\) does to the shape of the distribution, the following Observable JS cells sample 1,000 points from a bivariate normal distribution and draw the eigenvectors of its covariance matrix on top of the scatterplot:

```js
d3 = require("d3")
math = require("mathjs") // needed by multivariateNormal below
import {boxMuller} from "@sw1227/box-muller-transform"
import {choleskyDecomposition} from "@sw1227/cholesky-decomposition"

// taken from https://observablehq.com/@sw1227/multivariate-normal-distribution
function multivariateNormal(mean, covArray) {
  const n = mean.length;
  const cov = math.matrix(covArray);
  return {
    // Probability Density Function
    pdf: x => {
      const c = 1 / (math.sqrt(2 * math.PI) ** n * math.sqrt(math.det(cov)));
      return c * math.exp(
        -(1 / 2) * math.multiply(
          math.subtract(math.matrix(x), math.matrix(mean)),
          math.inv(cov),
          math.subtract(math.matrix(x), math.matrix(mean))
        )
      );
    },
    // Differential entropy
    entropy: 0.5 * math.log(math.det(cov)) + 0.5 * n * (1 + math.log(2 * math.PI)),
    // Generate n samples using Cholesky decomposition: mean + L * z
    sample: n_samples => Array(n_samples).fill().map(_ => {
      const L = choleskyDecomposition(cov);
      const z = boxMuller(n);
      const array = math.add(math.matrix(mean), math.multiply(L, math.matrix(z))).toArray();
      return {"x": array[0], "y": array[1]};
    }),
  };
}

scale = 2

// meanVector, covarianceMatrix, eigenvalues and eigenvectors come from earlier cells
eigenVecLine1 = [
  {"x": meanVector[0], "y": meanVector[1], "text": ""},
  {"x": meanVector[0] + scale * eigenvalues[0] * eigenvectors[0]["x"],
   "y": meanVector[1] + scale * eigenvalues[0] * eigenvectors[0]["y"],
   "text": "v1"},
]

eigenVecLine2 = [
  {"x": meanVector[0], "y": meanVector[1], "text": ""},
  {"x": meanVector[0] + scale * eigenvalues[1] * eigenvectors[1]["x"],
   "y": meanVector[1] + scale * eigenvalues[1] * eigenvectors[1]["y"],
   "text": "v2"},
]

// adapted from: https://observablehq.com/@observablehq/building-scatterplots-using-plot
Plot.plot({
  grid: true, // by default Plot has "grid: false"; to show a grid we set grid to true
  marks: [
    // Plot.frame(), // draws the frame mark around the Plot
    Plot.ruleY([0]),
    Plot.ruleX([0]),
    Plot.dot(multivariateNormal(meanVector, covarianceMatrix).sample(1000), {
      x: "x", y: "y", fill: 50, r: 2.5, stroke: "black"
    }),
    // https://observablehq.com/@observablehq/plot-connected-scatterplot
    Plot.line(eigenVecLine1, {x: "x", y: "y", marker: false, curve: "catmull-rom", stroke: "red"}),
    Plot.line(eigenVecLine2, {x: "x", y: "y", marker: false, curve: "catmull-rom", stroke: "orange"}),
    // Plot.text(eigenVecLine1, {x: "x", y: "y", color: "red", text: "text", dy: -8}),
  ],
  color: {
    scheme: "Blues",
    domain: [0, 100], // this is the equivalent of the vmin / vmax in matplotlib
  },
  x: {
    domain: [-10, 10],
    // label: "X",
    ticks: d3.range(-10, 10, 2), // equivalent of np.arange
  },
  y: {
    domain: [-10, 10],
    label: "Y",
    ticks: d3.range(-10, 10, 2), // equivalent of np.arange
  },
  width: 550,
  height: 550
})
```
Note that the red and orange lines represent the eigenvectors of the covariance matrix. Interestingly, these vectors also represent the principal components of the distribution.
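To connect the eigenvectors to principal components numerically, here is a small self-contained sketch. The 2×2 matrix Sigma below is an assumed example, not necessarily the covarianceMatrix used for the plot; for a symmetric 2×2 matrix the eigendecomposition can be written in closed form, and the eigenvector with the larger eigenvalue is the first principal component, i.e., the direction of greatest variance.

```js
// Sketch: eigendecomposition of a symmetric 2x2 covariance matrix in closed form.
// The matrix below is an assumed example, not necessarily the one used in the plot above.
const Sigma = [[4, 2],
               [2, 3]]; // [[Var(X), Cov(X,Y)], [Cov(X,Y), Var(Y)]]

const [a, b] = Sigma[0];
const c = Sigma[1][1];

// Eigenvalues of [[a, b], [b, c]] via the characteristic polynomial.
const tr = a + c;
const det = a * c - b * b;
const disc = Math.sqrt((tr * tr) / 4 - det);
const lambda1 = tr / 2 + disc; // variance along the first principal axis
const lambda2 = tr / 2 - disc; // variance along the second principal axis

// Corresponding eigenvectors, from (Sigma - lambda * I) v = 0 (assumes b !== 0).
const v1 = [b, lambda1 - a]; // first principal component (direction of greatest spread)
const v2 = [b, lambda2 - a]; // second principal component, orthogonal to v1

console.log(lambda1.toFixed(3), v1); // ≈ 5.562, v1 ≈ [2, 1.56]
console.log(lambda2.toFixed(3), v2); // ≈ 1.438, v2 ≈ [2, -2.56]
console.log((v1[0] * v2[0] + v1[1] * v2[1]).toFixed(6)); // ≈ 0: the two directions are orthogonal
```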
Useful Properties
Covariance is invariant under shifts of the variables, i.e.,

\[
\text{Cov}\left(\textbf{X} + a, \textbf{Y} + b\right) = \text{Cov}\left(\textbf{X}, \textbf{Y}\right),
\]

which follows directly from the definition of covariance, as the short derivation below shows.
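Plugging the shifted variables into the definition from above, the shift moves the means by the same amount, so the deviations cancel:

\[
\begin{aligned}
\text{Cov}\left(\textbf{X} + a, \textbf{Y} + b\right)
&= \mathbb{E}\left[\left(\textbf{X} + a - \left(\mu_{\textbf{X}} + a\right)\right)\left(\textbf{Y} + b - \left(\mu_{\textbf{Y}} + b\right)\right)\right] \\
&= \mathbb{E}\left[\left(\textbf{X} - \mu_{\textbf{X}}\right)\left(\textbf{Y} - \mu_{\textbf{Y}}\right)\right]
= \text{Cov}\left(\textbf{X}, \textbf{Y}\right).
\end{aligned}
\]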
Footnotes
1. This statement is not true. I couldn’t find out who introduced the term covariance, or when, but it is very likely that it wasn’t in the context of a multivariate Gaussian. However, seeing it that way may help to put covariance into context (IMHO).↩︎