Correlation can be used to study associations among the variables in a data set. Correlation analysis has both advantages and disadvantages. Below, we discuss methods of quantifying correlation.
Correlation is a measure of the relationship between two or more variables and of their interdependence. It is expressed as a single number, denoted by the letter r, that indicates the degree to which the variables are associated.
Correlation coefficients express linear relationships in numerical form. They describe how the variables vary with respect to each other.
Correlation coefficients take values between -1 and +1. A value of zero indicates zero correlation, in which case no linear relationship exists between the variables. A value of r equal to +1 indicates perfect positive correlation (the variables always move together in the same direction: when one increases, the other increases too). A value of r equal to -1 indicates perfect negative correlation (the variables always move together but in opposite directions: when one increases, the other decreases).
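The three cases above can be checked numerically. The following is a minimal sketch that assumes r refers to Pearson's coefficient; the function name `pearson_r` and the small data sets are invented for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each observation from its variable's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises in step with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls as x rises
r_none = pearson_r(x, [2, 1, 3, 1, 2])   # symmetric pattern, no linear trend

print(r_up, r_down, r_none)
```

The first two results are +1 and -1, while the third is zero up to floating-point rounding. Real data rarely produces values this clean.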
As a rule of thumb, a correlation coefficient between 0.1 and 0.3 (in absolute value) indicates a small association between the variables, a value between 0.3 and 0.5 a medium association, and a value between 0.5 and 1.0 a relatively large one. The following figures can serve as illustrations.
Correlation coefficients come in several types, such as Pearson’s, Spearman’s, and Kendall’s coefficients, and each type is interpreted somewhat differently.
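To see how the three types can differ on the same data, here is a rough sketch written from the standard textbook definitions: Spearman's coefficient as Pearson's coefficient applied to ranks, and Kendall's coefficient in its simplest "tau-a" form, which makes no correction for ties. All function names and the data are illustrative assumptions; a statistics library would normally be used instead.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


def kendall_tau_a(xs, ys):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# y always increases with x, but along a curve rather than a straight line.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
print(f"Pearson {r:.3f}, Spearman {rho:.3f}, Kendall {tau:.3f}")
```

On this data the two rank-based coefficients equal 1 because the ordering of the two variables agrees perfectly, whereas Pearson's coefficient falls below 1 because the relationship is not a straight line.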
One may ask where the numbers discussed above come from. Before answering, we need to introduce some definitions of measures that can be applied to a data set. Although this discussion is technical, it is useful for the analysis because these measures yield information about the data.
In statistics, it is common to work with samples drawn from a population, which brings in two more concepts: sample and population. A population is any collection of individual entities, such as people, plants, or animals, from which data can be collected and used to make inferences. A sample is a subset of the population: a group of entities carefully selected from the larger group. Conclusions reached about a sample can also hold for the population, provided the sample represents it well. It is therefore of prime importance to obtain a sample that is representative of the whole population.
1. A utility company would like to draw conclusions about the average weekly electricity consumption of single-family residential units in a country and about how that consumption changes when family income changes. Here, the population is the weekly electricity consumption of all single-family residential units in the country, and the sample is the consumption of a chosen group of those units.
2. A climatologist wants to assess the mean length of time that elapses before a given precipitation pattern recurs. Here, the population is the time to recurrence for all occurrences of that precipitation pattern, and the sample is the times to recurrence for a specific group of those occurrences.
3. A dietician would like to trace how the average weight of adult men in a country changes with age. Here, the population is all men residing in the country who fall within the age range under study, and the sample is a selected group of men from that age range.
4. The citizens of a country are asked for their opinions of a specific celebrity. Here, the population is all citizens of the country, and the sample is a group of citizens, e.g. 500 people, randomly selected from different regions of the country.
Possible pitfalls
The strategy for selecting the sample has to be unbiased, which means that every entity in the population must have an equal chance of being chosen for the sample. One way to achieve this is random sampling. Its benefit is that nothing influences which individuals are chosen from the population, so there is no systematic preference for specific entities or specific conclusions; everything is left to chance.
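A simple random sample can be drawn with Python's standard `random` module, as in the following minimal sketch. The household identifiers, the population size of 1,000, the sample size of 50, and the seed are all invented for illustration.

```python
import random

# Hypothetical sampling frame: an identifier for every unit in the population.
population = [f"household-{i:04d}" for i in range(1, 1001)]

# A seeded generator makes the draw repeatable; the seed value is arbitrary.
rng = random.Random(42)

# Draw 50 distinct units; every unit has the same chance of being chosen.
sample = rng.sample(population, k=50)

print(len(sample), "units selected, for example:", sample[:3])
```

`sample()` draws without replacement, so no household can appear twice. In practice the hard part is usually obtaining a complete list of the population to draw from, not the draw itself.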
The examples outlined above show that only limited conclusions can be drawn about the quantity of interest in a population, because there is always some uncertainty and inaccuracy when conclusions about a population are based on a sample. Since a sample contains fewer items than the population, a certain amount of information is inevitably lost.
It should be remembered that a sample taken from a population is just one of the many possible samples that could be taken from it. If several researchers examine the same population, each using a different sample, they may well reach different conclusions.
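This variability between samples can be simulated. The sketch below builds a synthetic population loosely modeled on the electricity example; the normal distribution, its mean of 300 and spread of 50, the population size, the sample size, and the number of researchers are all assumptions made purely for illustration.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed, chosen arbitrarily, for repeatability

# Synthetic population: weekly electricity use (kWh) of 10,000 households.
population = [rng.gauss(300, 50) for _ in range(10_000)]
population_mean = statistics.mean(population)

# Five researchers each draw their own random sample of 100 households
# and estimate the population mean from it.
sample_means = [
    statistics.mean(rng.sample(population, k=100)) for _ in range(5)
]

print(f"population mean: {population_mean:.1f}")
for number, mean in enumerate(sample_means, start=1):
    error = mean - population_mean
    print(f"researcher {number}: sample mean {mean:.1f} (error {error:+.1f})")
```

Each researcher obtains a slightly different estimate, and none matches the population mean exactly, although with random sampling the estimates tend to cluster around it.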