Data Understanding

Data Set Info

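There are 33417 observations and 34 columns. Five columns are of object (text) data type (iso_code, continent, location and date, plus tests_units), and the remaining 29 are numerical (float64). Each row records one location on one date. Looking at the column names, the data set provides daily and cumulative counts of COVID cases, deaths and tests (also per million or per thousand people) for each location. It also gives the location's continent and a stringency_index (0 to 100). The remaining columns are location-level indicators: population and population density, median age, the share of the population aged 65 and older and aged 70 and older, GDP per capita, extreme poverty, cardiovascular death rate, diabetes prevalence, female and male smoking rates, handwashing facilities, hospital beds per thousand and life expectancy. Note that the age columns give the share of the population in each age bracket, not cases or deaths by age.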

RangeIndex: 33417 entries, 0 to 33416
Data columns (total 34 columns):
iso_code                           33353 non-null object
continent                          33141 non-null object
location                           33417 non-null object
date                               33417 non-null object
total_cases                        33062 non-null float64
new_cases                          33062 non-null float64
total_deaths                       33062 non-null float64
new_deaths                         33062 non-null float64
total_cases_per_million            32998 non-null float64
new_cases_per_million              32998 non-null float64
total_deaths_per_million           32998 non-null float64
new_deaths_per_million             32998 non-null float64
new_tests                          10401 non-null float64
total_tests                        10647 non-null float64
total_tests_per_thousand           10647 non-null float64
new_tests_per_thousand             10401 non-null float64
new_tests_smoothed                 11520 non-null float64
new_tests_smoothed_per_thousand    11520 non-null float64
tests_units                        12288 non-null object
stringency_index                   27130 non-null float64
population                         33353 non-null float64
population_density                 31910 non-null float64
median_age                         30074 non-null float64
aged_65_older                      29638 non-null float64
aged_70_older                      29919 non-null float64
gdp_per_capita                     29708 non-null float64
extreme_poverty                    19865 non-null float64
cardiovasc_death_rate              30083 non-null float64
diabetes_prevalence                31104 non-null float64
female_smokers                     23877 non-null float64
male_smokers                       23591 non-null float64
handwashing_facilities             13764 non-null float64
hospital_beds_per_thousand         27353 non-null float64
life_expectancy                    32951 non-null float64
dtypes: float64(29), object(5)
memory usage: 8.7+ MB
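The listing above has the layout of pandas' DataFrame.info() output. The sketch below shows one way to reproduce the column-type split and the share of missing values per column. The helper names and the CSV file name in the usage note are hypothetical, not taken from the original analysis.

```python
import pandas as pd


def dtype_breakdown(df: pd.DataFrame) -> dict:
    """Count columns per dtype, e.g. {'float64': 29, 'object': 5} for the dtypes listed above."""
    return df.dtypes.astype(str).value_counts().to_dict()


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing (NaN) values in each column, largest first."""
    return df.isna().mean().sort_values(ascending=False)
```

Usage, assuming the data sits in a file named covid_data.csv (hypothetical): df = pd.read_csv("covid_data.csv"), then df.info(), dtype_breakdown(df) and missing_share(df).head(10). Going by the non-null counts above, the testing columns and handwashing_facilities would top that list. For example, about 69% of rows have no new_tests value.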

Description and Summary

The table below shows the description and summary (count, mean, standard deviation, minimum, quartiles and maximum) of the numerical variables:

variable                        count   mean          std           min           25%           50%           75%           max
total_cases                     33062   5.091939e+04  5.180225e+05  0             21            446           5066.5        1.670892e+07
new_cases                       33062   1010.762809   9309.139517   -29726        0             5             102           284710
total_deaths                    33062   2655.291634   25233.329557  0             0             9             107           660123
new_deaths                      33062   39.93243      347.73264     -1918         0             0             2             10512
total_cases_per_million         32998   1103.657007   2674.940362   0             8.5215        155.458       936.628       38138.741
new_cases_per_million           32998   17.858746     62.928423     -437.881      0             0.773         10.572        4944.376
total_deaths_per_million        32998   40.909829     123.250689    0             0             2.043         21.692        1237.551
new_deaths_per_million          32998   0.533204      3.006846      -41.023       0             0             0.14          200.04
new_tests                       10401   16320.258341  59168.42075   -3743         805           2766          9307          929838
total_tests                     10647   7.689958e+05  3.022411e+06  1             25851         110514        432470        5.063568e+07
total_tests_per_thousand        10647   30.980448     55.964699     0             1.437         8.105         38.056        638.167
new_tests_per_thousand          10401   0.572316      1.104416      -0.398        0.049         0.221         0.693         20.611
new_tests_smoothed              11520   15589.503906  54168.666654  0             903           3115          9528.25       801014
new_tests_smoothed_per_thousand 11520   0.551412      0.979232      0             0.051         0.239         0.691         15.456
stringency_index                27130   58.327987     29.773501     0             37.96         67.59         81.94         100
population                      33353   9.443562e+07  6.370159e+08  809           1.701583e+06  8.655541e+06  3.236600e+07  7.794799e+09
population_density              31910   368.561392    1680.06349    0.137         39.497        90.672        222.873       19347.5
median_age                      30074   31.634754     9.012636      15.1          24.4          31.8          39.8          48.2
aged_65_older                   29638   9.450372      6.375376      1.144         3.607         7.104         14.864        27.049
aged_70_older                   29919   5.990319      4.36211       0.526         2.162         4.458         9.72          18.493
gdp_per_capita                  29708   21546.066343  20697.420278  661.24        6171.884      15183.616     33132.32      116935.6
extreme_poverty                 19865   11.489011     18.736936     0.1           0.5           1.7           15            77.6
cardiovasc_death_rate           30083   249.517591    117.957827    79.37         153.493       235.954       318.949       724.417
diabetes_prevalence             31104   8.039533      4.116805      0.99          5.31          7.11          10.08         23.36
female_smokers                  23877   10.990606     10.504692     0.1           1.9           6.434         19.6          44
male_smokers                    23591   32.629508     13.328649     7.7           21.4          31.4          40.9          78.1
handwashing_facilities          13764   53.24601      31.456423     1.188         22.863        55.182        83.741        98.999
hospital_beds_per_thousand      27353   3.14698       2.549325      0.1           1.38          2.54          4.21          13.8
life_expectancy                 32951   74.244388     7.31646       53.28         70.39         75.86         80.1          86.75
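Some minima in the summary table above are negative. This affects new_cases, new_deaths, their per-million versions, new_tests and new_tests_per_thousand, and a daily count should not go below zero. These values likely reflect corrections to earlier reports, but that is an assumption to confirm against the data source. The sketch below prints the summary with one row per variable and lists the rows with negative daily values. The helper names are hypothetical, and it assumes the "new_" prefix marks the daily columns, as in this data set.

```python
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """count/mean/std/min/quartiles/max with one row per numeric column."""
    # describe() summarises only numeric columns by default; .T puts variables in rows.
    return df.describe().T


def negative_daily_rows(df: pd.DataFrame, prefix: str = "new_") -> pd.DataFrame:
    """Rows where any numeric column starting with `prefix` is negative."""
    daily_cols = [c for c in df.select_dtypes("number").columns if c.startswith(prefix)]
    # NaN < 0 evaluates to False, so missing values are never flagged.
    mask = (df[daily_cols] < 0).any(axis=1)
    return df.loc[mask, ["location", "date", *daily_cols]]
```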

GDP vs Total Deaths

To get an initial understanding, we created an lmplot to look for a relationship between GDP per capita and total deaths. Total deaths appear higher for poorer countries, but we cannot be certain: there is also a spike close to 60,000 on the GDP axis, which corresponds to countries with higher GDP. Still, a significant portion of deaths falls in the lower-GDP range. This might indicate a disparity in life expectancy between richer and poorer countries, although this plot alone cannot establish that.

Figure: Lmplot for GDP vs Total Deaths
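Two details matter when reproducing this kind of plot. First, total_deaths is cumulative and repeated on every date, so plotting all daily rows gives each country many points. Second, 276 rows have no continent; these are presumably aggregate entries such as a world total rather than single countries. The sketch below is an assumption, not the original analysis. It keeps each country's most recent row, drops rows without a continent, and fits the straight line that seaborn's lmplot draws by default, here computed with numpy.polyfit. The helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def latest_per_country(df: pd.DataFrame) -> pd.DataFrame:
    """One row per location (its most recent date), excluding rows without a continent."""
    countries = df.dropna(subset=["continent"]).copy()
    countries["date"] = pd.to_datetime(countries["date"])
    # Sort by date so that tail(1) within each location keeps its last report.
    countries = countries.sort_values("date")
    return countries.groupby("location").tail(1)


def gdp_death_trend(latest: pd.DataFrame) -> tuple[float, float]:
    """Least-squares line: total_deaths ~ slope * gdp_per_capita + intercept."""
    data = latest.dropna(subset=["gdp_per_capita", "total_deaths"])
    slope, intercept = np.polyfit(data["gdp_per_capita"], data["total_deaths"], deg=1)
    return float(slope), float(intercept)
```

Usage: latest = latest_per_country(df), then gdp_death_trend(latest). With seaborn installed, sns.lmplot(data=latest, x="gdp_per_capita", y="total_deaths") draws the scatter with the same kind of fitted line. A negative slope would match the "more deaths in poorer countries" reading, but one line says little when a few large countries dominate total_deaths.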