The terms covariance and correlation are closely related in probability theory and statistics. Both describe how two random variables vary together relative to their expected values. But what is the difference between the terms?
- Standard Deviation
- Differences between Covariance and Correlation
Contributed by: Deepak Gupta
LinkedIn Profile: https://www.linkedin.com/in/deepak-gupta-786375123/
In statistics, we frequently come across the two terms covariance and correlation, and they are often used interchangeably. The two ideas are similar, but they are not the same. Both are used to describe the linear relationship between two random variables and to measure how they depend on each other, yet they differ in what they tell us.
Covariance describes how two variables vary together, whereas correlation describes how strongly, and in which direction, they are linearly related. Neither measure implies that a change in one variable causes the change in the other.
In this article, we will define covariance and correlation, compare covariance vs correlation, and look at where each is applied.
Before going into the details, let us first try to understand variance and standard deviation.
Variance is the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of numbers is spread out from its average value.
Standard deviation is a measure of the amount of variation or dispersion of a set of values. It is the square root of the variance, so it is expressed in the same units as the data. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. It essentially measures the absolute variability of a random variable.
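To make the two definitions concrete, here is a minimal sketch in plain Python. The function names and the `heights_cm` data are made up for illustration, and the population form (dividing by n) is assumed; a sample estimate would divide by n - 1 instead.

```python
import math


def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))


# Made-up heights in centimetres, used only for illustration.
heights_cm = [150, 160, 170, 180, 190]

print(variance(heights_cm))            # 200.0 (in cm squared)
print(standard_deviation(heights_cm))  # roughly 14.14 (in cm)
```

Note that the variance comes out in squared units, while the standard deviation is back in centimetres, which is why the latter is usually easier to interpret.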
Covariance signifies the direction of the linear relationship between two variables. By direction we mean whether the variables tend to move in the same direction or in opposite directions. (A higher value of one variable may be associated with a higher or a lower value of the other variable.)
Covariance can take any real value from -∞ to +∞. It is also important to note that covariance only measures how two variables change together, not the dependency of one variable on the other.
The covariance between two variables is obtained by summing the products of their deviations from their respective means and dividing by the number of observations (or by one less than that number for a sample estimate).
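The formula itself does not appear in this text, so the following is the standard textbook form rather than a reproduction of the original. The symbols are assumptions made here: x_i and y_i are the paired observations, x̄ and ȳ are their means, and n is the number of pairs.

```latex
\operatorname{Cov}(X, Y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

For a sample estimate, the factor 1/n is commonly replaced by 1/(n - 1).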
The upper and lower limits for the covariance depend on the variances of the variables involved: its absolute value cannot exceed the product of the two standard deviations. These variances, in turn, change with the scaling of the variables, so even a change in the units of measurement changes the covariance. Thus, covariance is useful for finding the direction of the relationship between two variables, but not its magnitude. Below are the plots which help us understand how the covariance between two variables would look in different directions.
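The dependence on units is easy to see with a small experiment. The sketch below uses made-up height and weight data and a hand-written `covariance` helper (population form, dividing by n); none of these names come from the original article.

```python
def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Made-up data: the same heights expressed in centimetres and in metres.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]
weights_kg = [50, 58, 66, 74, 82]

cov_cm = covariance(heights_cm, weights_kg)
cov_m = covariance(heights_m, weights_kg)

print(cov_cm)  # 160.0 (cm * kg)
print(cov_m)   # about 1.6 (m * kg): same data, 100 times smaller

# Reversing one variable flips the sign, i.e. the direction of the relationship.
print(covariance(heights_cm, weights_kg[::-1]))  # -160.0
```

The sign is stable under a change of units, but the magnitude is not, which is why the raw number cannot tell us how strong the relationship is.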
Correlation analysis is a method of statistical evaluation used to study the strength of the relationship between two numerically measured, continuous variables.
It shows not only the kind of relationship (in terms of direction) but also how strong that relationship is. Correlation values are standardized and always lie between -1 and +1, whereas covariance values are not standardized and cannot be used to compare how strong or weak a relationship is, because their magnitude has no direct significance.
To determine whether the covariance of the two variables is large or small, we need to assess it relative to the standard deviations of the two variables.
To do so, we normalize the covariance by dividing it by the product of the standard deviations of the two variables, which gives the correlation between the two variables.
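Written out, this normalization is the usual definition of the Pearson correlation coefficient. The notation is an assumption made here, not taken from the original: ρ denotes the coefficient, and σ_X and σ_Y denote the standard deviations of the two variables.

```latex
\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
```

Because the units of the covariance (units of X times units of Y) cancel against the units of the two standard deviations, the result carries no units.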
The main result of a correlation is called the correlation coefficient.
The correlation coefficient is a dimensionless metric and its value ranges from -1 to +1.
The closer it is to +1 or -1, the more closely the two variables are related.
If there is no relationship at all between two variables, that is, if they are independent, the correlation coefficient is 0. The converse does not hold: a coefficient of 0 only tells us that there is no linear relationship. Other functional relationships between the variables could still exist.
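A small example shows why a coefficient of 0 does not rule out a relationship. In the sketch below, y is completely determined by x through y = x², yet the correlation is 0 because the relationship is not linear. The `pearson` helper and the data are illustrative assumptions, not part of the original article.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)


# x values symmetric around zero; y depends on x exactly, but not linearly.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson(xs, ys)
print(r)  # 0.0, even though y is a deterministic function of x
```

The positive and negative halves of the parabola cancel each other out in the covariance sum, so a linear measure sees nothing.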
When the correlation coefficient is positive, an increase in one variable tends to be accompanied by an increase in the other. When the correlation coefficient is negative, the two variables tend to change in opposite directions.
When there is no linear relationship, a change in one variable tells us nothing about the direction of change in the other.
Covariance and correlation are related to each other, in the sense that covariance indicates the direction of the relationship between two variables, while correlation indicates both the direction and the strength of that relationship.
Differences between Covariance and Correlation
Both covariance and correlation are computed over all the paired observations of two variables, not from a single value. The differences between them are summarized in tabular form for quick reference. Let us look at Covariance vs Correlation.
|Covariance|Correlation|
|---|---|
|Covariance indicates the extent to which two random variables change in tandem.|Correlation represents how strongly two random variables are linearly related to each other.|
|Covariance is the unscaled measure from which correlation is derived.|Correlation is the scaled form of covariance.|
|Covariance indicates the direction of the linear relationship between variables.|Correlation measures both the strength and the direction of the linear relationship between two variables.|
|Covariance can vary between -∞ and +∞.|Correlation ranges between -1 and +1.|
|Covariance is affected by a change in scale. If all the values of one variable are multiplied by a constant and all the values of the other variable are multiplied by the same or a different constant, the covariance changes.|Correlation is not influenced by a change in scale.|
|Covariance takes its units from the product of the units of the two variables.|Correlation is dimensionless, i.e. it is a unit-free measure of the relationship between variables.|
|Covariance of two dependent variables measures, in real units (e.g. cm, kg, liters), how much they co-vary on average.|Correlation of two dependent variables expresses that co-variation as a proportion of the largest value it could take given the variables' standard deviations.|
|Covariance is zero for independent variables, because their movements have no systematic tendency to occur together. Zero covariance on its own does not imply independence.|Independent movements do not contribute to the total correlation. Therefore, completely independent variables have a zero correlation.|
Both Correlation and Covariance are very closely related to each other and yet they differ a lot.
When it comes to choosing between covariance and correlation, the latter is usually the first choice, as it remains unaffected by changes in units, location, and scale, and can also be used to compare two pairs of variables. Since it is limited to a range of -1 to +1, it is useful for drawing comparisons between variables across domains. However, an important limitation is that both concepts measure only linear relationships.
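The invariance to location and scale can be checked directly. The sketch below converts made-up temperatures from Celsius to Fahrenheit (a shift plus a positive rescaling) and compares both measures before and after. The helper functions, variable names, and data are assumptions made for illustration only.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # Cov(X, X) is the variance of X
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Made-up daily temperatures and units sold.
temps_c = [10, 15, 20, 25, 30]
units_sold = [12, 18, 21, 30, 33]

# Celsius to Fahrenheit: a change of both location (+32) and scale (x 9/5).
temps_f = [c * 9 / 5 + 32 for c in temps_c]

cov_c = covariance(temps_c, units_sold)
cov_f = covariance(temps_f, units_sold)
r_c = pearson(temps_c, units_sold)
r_f = pearson(temps_f, units_sold)

print(cov_c, cov_f)  # the covariance grows by the scale factor 9/5
print(r_c, r_f)      # the correlation is the same up to rounding error
```

One caveat: rescaling by a negative constant would flip the sign of the correlation while leaving its magnitude unchanged.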