In Italy we are witnessing a number of deaths due to coronavirus (or associated with coronavirus) that is far higher than in other countries. Several factors could help explain this phenomenon, especially the role of the observers (who does the counting).

But there are also some statistical issues worth analyzing that might partially explain it:

  • regression to the mean
  • the law of large numbers

Lombardia, a region of Italy with 10 million citizens, accounts for almost 90% of Italian deaths due to coronavirus. Italy as a whole has 60 million citizens.

Let’s consider the data in Lombardia:

  • Milan is the biggest province of Lombardia, with about 3,000,000 citizens
  • Lodi is one of the smallest provinces of Lombardia, with about 200,000 citizens

Surprisingly, we are observing a far higher number of deaths in Lodi than in Milan, even though we would expect more deaths in Milan for several reasons, especially because Milan is more interconnected than Lodi.

  • In Milan, fewer than 100 deaths have been observed
  • In Lodi, 220 deaths have been observed

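This means that the fatality rate, calculated on the province population, is:

  • less than 0.0033% in Milan’s province (fewer than 100 deaths out of about 3,000,000 citizens)
  • about 0.11% in Lodi’s province (220 deaths out of about 200,000 citizens)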
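As a quick check of the arithmetic, the rates can be recomputed from the figures quoted above. This is only a sketch: the helper `deaths_per_100k` is a name introduced here for illustration, and Milan's count is taken as 100, the upper bound given in the text.

```python
# Figures quoted in the text. Milan's death count is an upper bound
# ("fewer than 100"), so its rate here is an upper bound too.
provinces = {
    "Milan": {"deaths": 100, "population": 3_000_000},
    "Lodi": {"deaths": 220, "population": 200_000},
}


def deaths_per_100k(deaths, population):
    """Deaths per 100,000 residents (hypothetical helper for this example)."""
    return deaths / population * 100_000


rates = {name: deaths_per_100k(**data) for name, data in provinces.items()}

for name, rate in rates.items():
    # A rate per 100,000 divided by 1,000 gives a percentage of the population.
    print(f"{name}: {rate:.1f} deaths per 100,000 ({rate / 1000:.4f}% of the population)")

print(f"Lodi / Milan ratio: {rates['Lodi'] / rates['Milan']:.0f}x")
```

On these numbers Lodi's rate is at least 30 times Milan's, which is the gap the rest of this post tries to explain.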

We should also consider, for example, that the death rate in Rome (the capital of Italy) is almost zero.

Regression to the mean is a common statistical phenomenon that can mislead us when we observe the world. Learning to recognize when regression to the mean is at play can help us avoid misinterpreting data and seeing patterns that don’t exist.

Regression to the mean occurs whenever a nonrandom sample is selected from a population and two imperfectly correlated variables are measured, such as two consecutive blood pressure measurements. The less correlated the two variables, the larger the effect of regression to the mean. Also, the more extreme the value from the population mean, the more room there is to regress to the mean. It occurs whenever a group is selected with extreme values for one variable and another variable is then measured.
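A small simulation can make this concrete. The sketch below assumes a simple model that is not from any real dataset: each person has a stable underlying value (loosely, a "true" blood pressure) and every measurement adds independent random noise. The numbers 120 and 10, the top-10% cutoff and the variable names are all illustrative assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

N = 10_000

# Assumed model: stable underlying value plus independent noise per measurement,
# which makes the two measurements imperfectly correlated.
true_values = [random.gauss(120, 10) for _ in range(N)]
first = [t + random.gauss(0, 10) for t in true_values]
second = [t + random.gauss(0, 10) for t in true_values]

# Nonrandom sample: keep only people whose FIRST measurement is in the top 10%.
cutoff = sorted(first)[int(0.9 * N)]
selected = [i for i in range(N) if first[i] >= cutoff]

population_mean = statistics.mean(first)
mean_first = statistics.mean(first[i] for i in selected)
mean_second = statistics.mean(second[i] for i in selected)

print(f"population mean:            {population_mean:.1f}")
print(f"selected group, 1st reading: {mean_first:.1f}")
print(f"selected group, 2nd reading: {mean_second:.1f}")
```

Nothing happens to the selected group between the two readings, yet its second average falls back toward the population mean, because part of what made the first readings extreme was noise.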

Francis Galton documented the phenomenon in 1886. Galton measured the height of 930 adult children and their parents and calculated the average height of the parents. He noted that when the average height of the parents was greater than the mean of the population, the children tended to be shorter than their parents. Likewise, when the average height of the parents was shorter than the population mean, the children tended to be taller than their parents. Galton called this phenomenon regression towards mediocrity, and it is now known as regression to the mean.

Ignorance of this phenomenon is widespread. Pilot instructors noted that when a trainee pilot was praised for a good landing they invariably made a subsequent poor landing. This was misinterpreted as praise lulling pilots into complacency when the real explanation was regression towards the mean.

All healthcare professionals need to be aware of regression to the mean, as it has wide-ranging effects.

The law of large numbers, on the other hand, is well illustrated by the following example.

A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50% of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50%, sometimes lower.

For a period of 1 year, each hospital recorded the days on which more than 60% of the babies born were boys. Which hospital do you think recorded more such days?

  1. The larger hospital
  2. The smaller hospital
  3. About the same (that is, within 5% of each other)

56% of subjects chose option 3, while options 1 and 2 were each chosen by 22% of subjects.

However, according to sampling theory, the larger hospital is much more likely than the smaller one to report a sex ratio close to 50% on any given day, so the correct answer is the smaller hospital.
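The answer can be checked with an exact binomial calculation. The sketch below makes simplifying assumptions that go beyond the puzzle's wording: each hospital has exactly 15 or 45 births every day (the puzzle says "about"), each birth is independently a boy with probability 0.5, and a year has 365 days. The function name is introduced here for illustration.

```python
from math import comb


def prob_more_than_60pct_boys(births, p_boy=0.5):
    """Exact binomial probability that boys are more than 60% of one day's births."""
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(births + 1)
        if 5 * k > 3 * births  # same condition as k / births > 0.6, in integers
    )


p_small = prob_more_than_60pct_boys(15)  # needs at least 10 boys out of 15
p_large = prob_more_than_60pct_boys(45)  # needs at least 28 boys out of 45

for label, p in (("smaller", p_small), ("larger", p_large)):
    print(f"{label} hospital: P = {p:.3f}, about {365 * p:.0f} such days per year")
```

Under these assumptions the smaller hospital crosses the 60% threshold on roughly 15% of days, while the larger hospital does so noticeably less often: the smaller the daily sample, the more it fluctuates around 50%.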