Population Risk Factors and their Relation to COVID-19's Impact

Timothy Carroll
Jul 30, 2020

Background:

As the COVID-19 pandemic spread across the world and affected more and more lives, researchers, doctors, the media, and laypeople alike tried to learn about it. Its transmission, the severity of its symptoms, and its mortality rate have all been under scrutiny. I believe the most frequently asked question would be:

“How would COVID-19 affect my loved ones and me if we were to be infected?”

And so the “at-risk population” became prevalent in the conversation about how to handle this virus. The elderly, those with preexisting lung conditions, smokers, and the immunocompromised all became hyperaware of the elevated threat they faced. With all of this in mind, I set out to discover whether the risk factors of a population (here, a county) were related to its infection and mortality rates.

Data:

To begin investigating this topic, I needed information. I first collected data on the COVID-19 virus from usafacts.org. Using three datasets, I collected each county's population, its confirmed cases, and its COVID-19-attributed fatalities. My data on risk factors also came from usafacts.org. This dataframe contained Community Health Status Indicators (CHSI) intended to help combat obesity, heart disease, and cancer. Its features included the percentages of obesity, lack of exercise, tobacco usage, and fruit and vegetable consumption.
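The merge step can be sketched as a join on a shared county key. This is a minimal illustration using only the Python standard library, not the code behind this post: the county identifiers, field names, and every number below are made-up placeholders, and the assumption is that each dataset can be keyed by the same county identifier.

```python
# Hypothetical per-county records; identifiers and values are placeholders,
# not figures from usafacts.org.
population = {"county_a": 50000, "county_b": 200000, "county_c": 25000}
confirmed_cases = {"county_a": 1200, "county_b": 3100}
deaths = {"county_a": 20, "county_b": 45, "county_c": 5}
risk_factors = {
    "county_a": {"obesity_pct": 30.5, "no_exercise_pct": 27.8},
    "county_b": {"obesity_pct": 26.6, "no_exercise_pct": 23.2},
}


def merge_counties(population, cases, deaths, risk_factors):
    """Inner-join the four sources, keeping only counties present in all of them."""
    shared = set(population) & set(cases) & set(deaths) & set(risk_factors)
    merged = {}
    for county in sorted(shared):
        row = {
            "population": population[county],
            "cases": cases[county],
            "deaths": deaths[county],
        }
        row.update(risk_factors[county])  # add the risk-factor columns
        merged[county] = row
    return merged


counties = merge_counties(population, confirmed_cases, deaths, risk_factors)
```

An inner join silently drops any county missing from one of the sources (here the placeholder "county_c"), so it is worth checking how many counties survive the merge.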

After merging my risk factors with my COVID-19 data, I decided to use only counties with more than 1,000 cases of the virus (leaving 504 counties). Shown here are all of those most infected counties in the United States, with darker reds indicating higher mortality rates.

Colored counties have more than 1,000 cases, with darker reds denoting higher mortality rates.
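The filtering step and the two rates used throughout the post can be sketched as follows. The definitions here are assumptions made for illustration: mortality rate is taken as deaths divided by confirmed cases, and infection rate as confirmed cases divided by population. The field names and sample values are hypothetical.

```python
CASE_THRESHOLD = 1000  # keep counties with more than 1,000 confirmed cases


def add_rates(county):
    """Return a copy of a county record with the two derived rates added."""
    out = dict(county)
    # Assumed definitions, chosen for this sketch:
    out["infection_rate"] = county["cases"] / county["population"]
    out["mortality_rate"] = county["deaths"] / county["cases"]
    return out


def most_infected(counties, threshold=CASE_THRESHOLD):
    """Keep only counties above the case threshold and attach their rates."""
    return {
        name: add_rates(row)
        for name, row in counties.items()
        if row["cases"] > threshold
    }


# Made-up sample records for illustration only.
sample = {
    "county_a": {"population": 100000, "cases": 2000, "deaths": 50},
    "county_b": {"population": 80000, "cases": 900, "deaths": 10},
}
filtered = most_infected(sample)
```

If mortality were instead defined per capita (deaths divided by population), only the one line in add_rates would change, but the resulting correlations could differ.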

My next step was to test for Pearson correlations between some of the risk factors and the mortality rate of the coronavirus. Interestingly, I found slight inverse correlations between the mortality rate and both the obesity percentage and the percentage of people who eat few fruits and vegetables (-.18 and -.25 respectively). This suggests that within the 500 or so most infected areas of the country, counties with higher obesity percentages tended to have lower mortality rates. The same applies to counties where a higher percentage of people eat few fruits and vegetables.
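For readers unfamiliar with it, the Pearson coefficient is the covariance of two variables divided by the product of their standard deviations, which gives a value between -1 and 1. The function below is a from-scratch sketch of that formula rather than the code used for this post; a dataframe library's built-in correlation method should give the same number. The input lists are toy values.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy values: as the first list rises, the second falls in lockstep.
perfect_inverse = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
# Toy values with a strong but imperfect positive relationship.
strong_positive = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

A coefficient near -1 or 1 means the points sit close to a straight line; values such as -.18 and -.25 indicate a slight downward trend with a lot of scatter around it.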

Analysis:

This data may seem to go against the conventional wisdom that healthy, well-nourished people are better able to recover from illness, but I have a hypothesis as to why this may be. The lower mortality rate in counties with higher obesity may be explained by location. There have been many suggestions that warm weather and UV exposure can kill the COVID-19 virus, and a large percentage of the highest-obesity counties in the data are located in the southern (warmer) states. As a result, the virus may not have spread as aggressively there, thereby not inundating the healthcare system as heavily. The same applies to the percentage of people who eat few fruits and vegetables: the higher half of these data points are overwhelmingly located in southern states. To further this point, refer to the visualization of US mortality rates shown above. The highly infected counties in the northern states have higher mortality rates.
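One way to check the claim that the higher half of a risk factor sits mostly in southern states is to split the counties at the median and count how many of the top half fall in a chosen set of states. The sketch below assumes each county record carries a "state" field, and the set of states passed in is a placeholder: which states count as "southern" is a judgment call that this post does not pin down. All values are made up.

```python
from statistics import median


def share_in_region(rows, factor, region_states):
    """Fraction of counties above the median of `factor` that lie in `region_states`."""
    cutoff = median(row[factor] for row in rows)
    top_half = [row for row in rows if row[factor] > cutoff]
    if not top_half:
        return 0.0
    in_region = sum(row["state"] in region_states for row in top_half)
    return in_region / len(top_half)


# Made-up records; the state set is an illustrative placeholder.
rows = [
    {"state": "NY", "obesity_pct": 20.0},
    {"state": "MI", "obesity_pct": 25.0},
    {"state": "TX", "obesity_pct": 30.0},
    {"state": "GA", "obesity_pct": 35.0},
]
southern_share = share_in_region(rows, "obesity_pct", {"TX", "GA", "FL"})
```

A share well above the region's share of all counties in the dataset would support the location hypothesis; a share close to it would not.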

Seeing the difference in the mortality rates of northern and southern states, I then wanted to compare their infection rates. Shown here is that visualization.

Colored counties have more than 1,000 cases, with darker reds denoting higher infection rates.

The infection rates (per population) across the country are seemingly more uniform than the mortality rates. What could be contributing to this? To investigate, I used my data to test for correlations between the infection rate and health factors, which also yielded some interesting results. Percentage of obesity, percentage of those with diabetes, and lack of exercise all had positive correlation coefficients (.19, .23, and .36 respectively) with the infection rate! Additionally, many of the counties in the top 50% for these factors were located in southern states. This could help to explain the difference between the geographic distributions of mortality rates and infection rates.

FYI: the six outlier counties in the obesity graph are all located in southern states

This data is much more representative of widely accepted medical truths than the mortality rate correlations were. The less obese and more physically active a county is, the lower its infection rate tends to be (stay healthy!).

Findings:

Using a population’s reported risk factors to predict COVID-19 severity is not entirely effective. Although relationships can be drawn with obesity, exercise levels, and consumption of fruits and vegetables, the correlations aren’t strong enough to be beyond dispute. The data did, however, point to a possible correlation between climate and mortality rates. Variations in population density, healthcare infrastructure, and how local governments handled quarantine procedures all complicate any research into the subject. Hopefully, as more research is completed on the virus and more accurate, time-tested data is distributed, we can draw clearer conclusions about just what makes the virus deadly and be better prepared for an outbreak.
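To put the strength of these correlations in perspective, it helps to square them: in a simple straight-line fit, r squared is the share of the variation in one variable that the other accounts for. The coefficients below are the ones reported in this post; the labels are shorthand introduced for this sketch.

```python
# Pearson coefficients as reported in this post.
reported_r = {
    "obesity vs. mortality rate": -0.18,
    "few fruits/vegetables vs. mortality rate": -0.25,
    "obesity vs. infection rate": 0.19,
    "diabetes vs. infection rate": 0.23,
    "lack of exercise vs. infection rate": 0.36,
}

# Squaring r gives the share of variance explained by a simple linear fit.
variance_explained = {label: r ** 2 for label, r in reported_r.items()}

for label, r2 in variance_explained.items():
    print(f"{label}: r = {reported_r[label]:+.2f}, r^2 = {r2:.3f}")
```

Even the strongest relationship here, lack of exercise versus infection rate, accounts for only about 13% of the county-to-county variation, and the others for roughly 3% to 6%, which is why none of these factors works well as a predictor on its own.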

Sources:

usafacts.org: COVID-19 spread map

usafacts.org: CHSI datasets

Washington Post: Warm weather curb coronavirus- what the experts say

Visualizations: GitHub and code used

My Website : Here
