Pearson Correlation and the Dot Product
Given two vectors X and Y, where
- X = (x1, x2, .., xn) and Y = (y1, y2, .., yn)
The Dot Product is defined as:
- dp = X . Y = x1*y1 + x2*y2 + .. + xn*yn
The Pearson Correlation (PCor) is generally defined as:
PCor = Mean ((X - mX) * (Y - mY)) / (sX * sY)
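where mX and mY are the means of X and Y, and sX and sY are their standard deviations (population form, i.e. dividing by n, so that it matches the Mean in the numerator). The subtraction and the product are taken element by element.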
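
As an illustration, the definition above can be sketched in plain Python. The function name `pearson_definition` and the sample data are assumptions made for this example only; the standard deviations use the population form (dividing by n) to match the Mean in the numerator.

```python
import math

def pearson_definition(x, y):
    """Pearson correlation computed directly from the definition (hypothetical helper)."""
    n = len(x)
    m_x = sum(x) / n
    m_y = sum(y) / n
    # Population standard deviations: divide by n, not n - 1
    s_x = math.sqrt(sum((a - m_x) ** 2 for a in x) / n)
    s_y = math.sqrt(sum((b - m_y) ** 2 for b in y) / n)
    # Mean of the element-wise products of the deviations
    mean_prod = sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / n
    return mean_prod / (s_x * s_y)

# Illustrative data, not taken from the original text
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r_def = pearson_definition(x, y)
print(r_def)
```

For this sample data the deviations multiply to a sum of 6, and the sums of squared deviations are 10 and 6, so the result equals 6 / sqrt(60).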
Pearson's Correlation can be directly related to the Dot Product ONLY IF every vector is first centered to 0, by subtracting its mean:
- X' = X - mX
- Y' = Y - mY
After that transformation it is possible to write:
PCor = X' . Y' / (||X'||*||Y'||)
where ||X|| is the norm of the vector, i.e. sqrt(x1^2 + x2^2 + .. + xn^2)
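
The relation above can be checked numerically. The following is a minimal sketch, assuming small hypothetical helpers (`dot`, `norm`, `center`, `pearson_via_dot`) and made-up sample data. It also computes the same ratio without centering, which generally gives a different number and shows why the centering step is required.

```python
import math

def dot(u, v):
    """Dot product: u1*v1 + u2*v2 + .. + un*vn."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    """Norm of a vector: sqrt(u1^2 + u2^2 + .. + un^2)."""
    return math.sqrt(dot(u, u))

def center(u):
    """Subtract the mean so the vector is centered to 0."""
    m = sum(u) / len(u)
    return [a - m for a in u]

def pearson_via_dot(x, y):
    """Pearson correlation as X' . Y' / (||X'|| * ||Y'||)."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / (norm(xc) * norm(yc))

# Illustrative data, not taken from the original text
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r_dot = pearson_via_dot(x, y)
# Same ratio on the uncentered vectors: not the Pearson Correlation
ratio_uncentered = dot(x, y) / (norm(x) * norm(y))
print(r_dot, ratio_uncentered)
```

The factors of n in the mean and in the two standard deviations cancel, which is why the centered ratio matches the definition exactly.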
Therefore, after the transformation, the Pearson Correlation can be regarded as a normalized Dot Product: the Dot Product of the centered vectors divided by the product of their norms, which is the cosine of the angle between X' and Y'.
This normalization gives every pair of vectors the same "leverage", regardless of their norms (which can be regarded as their "magnitudes").
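
The equal "leverage" can be seen by rescaling one of the vectors. In this sketch (helper names and data are assumptions for illustration), multiplying X by 100 multiplies the raw Dot Product by 100 but leaves the Pearson Correlation unchanged.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pearson(x, y):
    """Pearson correlation via centered vectors and their norms."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    return dot(xc, yc) / (math.sqrt(dot(xc, xc)) * math.sqrt(dot(yc, yc)))

# Illustrative data, not taken from the original text
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
x_big = [100.0 * a for a in x]  # same shape, much larger magnitude

dp_small, dp_big = dot(x, y), dot(x_big, y)
r_small, r_big = pearson(x, y), pearson(x_big, y)
print(dp_small, dp_big)  # the raw Dot Product scales with the magnitude
print(r_small, r_big)    # the correlation does not
```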
Because the Pearson Correlation requires estimating the mean and standard deviation of each vector, the result is unreliable when only a limited number of points is available.
In the following paper the minimal number of points required to satisfactorily estimate Pearson's Correlation is estimated as 5.
Accuracy of Correlation Coefficient with Limited Number of Points
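
The sensitivity to the number of points can be illustrated with a small simulation. This sketch does not reproduce the analysis of the paper above. It only assumes two independent standard normal variables (true correlation 0) and compares how widely the estimated correlation scatters for few versus many points. The function names, the number of trials and the seed are arbitrary choices for the example.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation via centered vectors and their norms."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    num = sum(a * b for a, b in zip(xc, yc))
    den = math.sqrt(sum(a * a for a in xc)) * math.sqrt(sum(b * b for b in yc))
    return num / den

def spread_of_estimates(n_points, n_trials=2000, seed=0):
    """Standard deviation of the estimated correlation over repeated samples.

    Assumption: X and Y are independent standard normal draws, so the
    true correlation is 0 and any nonzero estimate is sampling noise.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
        estimates.append(pearson(x, y))
    return statistics.pstdev(estimates)

spread_small = spread_of_estimates(5)
spread_large = spread_of_estimates(50)
print(spread_small, spread_large)
```

With only 5 points the estimates scatter much more widely around 0 than with 50 points, even though the underlying variables are unrelated in both cases.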
For more details on Pearson Correlation, and its relation to the Dot Product, refer to: Wikipedia, Pearson's Correlation