Data Understanding
Data Set Info
The data set has 33,417 observations and 34 columns: 5 are categorical (object dtype) and the remaining 29 are numerical (float64). Judging by the column names, it records the daily number of COVID cases, deaths and tests for each location, along with the continent the location belongs to. It also carries background indicators for each location, such as population, age structure (median_age, aged_65_older, aged_70_older), GDP per capita, life expectancy, cardiovascular death rate and diabetes prevalence.
RangeIndex: 33417 entries, 0 to 33416
Data columns (total 34 columns):
iso_code 33353 non-null object
continent 33141 non-null object
location 33417 non-null object
date 33417 non-null object
total_cases 33062 non-null float64
new_cases 33062 non-null float64
total_deaths 33062 non-null float64
new_deaths 33062 non-null float64
total_cases_per_million 32998 non-null float64
new_cases_per_million 32998 non-null float64
total_deaths_per_million 32998 non-null float64
new_deaths_per_million 32998 non-null float64
new_tests 10401 non-null float64
total_tests 10647 non-null float64
total_tests_per_thousand 10647 non-null float64
new_tests_per_thousand 10401 non-null float64
new_tests_smoothed 11520 non-null float64
new_tests_smoothed_per_thousand 11520 non-null float64
tests_units 12288 non-null object
stringency_index 27130 non-null float64
population 33353 non-null float64
population_density 31910 non-null float64
median_age 30074 non-null float64
aged_65_older 29638 non-null float64
aged_70_older 29919 non-null float64
gdp_per_capita 29708 non-null float64
extreme_poverty 19865 non-null float64
cardiovasc_death_rate 30083 non-null float64
diabetes_prevalence 31104 non-null float64
female_smokers 23877 non-null float64
male_smokers 23591 non-null float64
handwashing_facilities 13764 non-null float64
hospital_beds_per_thousand 27353 non-null float64
life_expectancy 32951 non-null float64
dtypes: float64(29), object(5)
memory usage: 8.7+ MB
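A summary like the one above is what pandas prints from `DataFrame.info()`. The snippet below is a minimal sketch of that step, not the original notebook code: the inline CSV rows are made up for illustration and stand in for the real file, whose name is not given here.

```python
import io

import pandas as pd

# Made-up rows standing in for the real data file (assumption: the real
# file is a CSV with the column names listed in the summary above).
sample_csv = io.StringIO(
    "iso_code,continent,location,date,total_cases,new_cases\n"
    "AAA,Asia,Country A,2020-03-01,1,1\n"
    "AAA,Asia,Country A,2020-03-02,1,0\n"
    ",,Somewhere,2020-03-01,705,0\n"
)
df = pd.read_csv(sample_csv)

# Prints the index range, per-column non-null counts, dtypes and memory usage.
df.info()

# Count categorical vs. numerical columns, as quoted in the text.
n_categorical = df.select_dtypes(exclude="number").shape[1]
n_numeric = df.select_dtypes(include="number").shape[1]

# Missing values per column (total rows minus the non-null count).
missing = df.isna().sum()

# The date column is read as text; convert it before any time-based analysis.
df["date"] = pd.to_datetime(df["date"])
```

Running the same calls on the full data set should reproduce the 5 categorical and 29 numerical columns reported above.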
Description and Summary
The table below summarizes the numerical (continuous) variables:
statistic | total_cases | new_cases | total_deaths | new_deaths | total_cases_per_million | new_cases_per_million | total_deaths_per_million | new_deaths_per_million | new_tests | total_tests | total_tests_per_thousand | new_tests_per_thousand | new_tests_smoothed | new_tests_smoothed_per_thousand | stringency_index | population | population_density | median_age | aged_65_older | aged_70_older | gdp_per_capita | extreme_poverty | cardiovasc_death_rate | diabetes_prevalence | female_smokers | male_smokers | handwashing_facilities | hospital_beds_per_thousand | life_expectancy |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3.306200e+04 | 33062.000000 | 33062.000000 | 33062.00000 | 32998.000000 | 32998.000000 | 32998.000000 | 32998.000000 | 10401.000000 | 1.064700e+04 | 10647.000000 | 10401.000000 | 11520.000000 | 11520.000000 | 27130.000000 | 3.335300e+04 | 31910.000000 | 30074.000000 | 29638.000000 | 29919.000000 | 29708.000000 | 19865.000000 | 30083.000000 | 31104.000000 | 23877.000000 | 23591.000000 | 13764.000000 | 27353.000000 | 32951.000000 |
mean | 5.091939e+04 | 1010.762809 | 2655.291634 | 39.93243 | 1103.657007 | 17.858746 | 40.909829 | 0.533204 | 16320.258341 | 7.689958e+05 | 30.980448 | 0.572316 | 15589.503906 | 0.551412 | 58.327987 | 9.443562e+07 | 368.561392 | 31.634754 | 9.450372 | 5.990319 | 21546.066343 | 11.489011 | 249.517591 | 8.039533 | 10.990606 | 32.629508 | 53.246010 | 3.146980 | 74.244388 |
std | 5.180225e+05 | 9309.139517 | 25233.329557 | 347.73264 | 2674.940362 | 62.928423 | 123.250689 | 3.006846 | 59168.420750 | 3.022411e+06 | 55.964699 | 1.104416 | 54168.666654 | 0.979232 | 29.773501 | 6.370159e+08 | 1680.063490 | 9.012636 | 6.375376 | 4.362110 | 20697.420278 | 18.736936 | 117.957827 | 4.116805 | 10.504692 | 13.328649 | 31.456423 | 2.549325 | 7.316460 |
min | 0.000000e+00 | -29726.000000 | 0.000000 | -1918.00000 | 0.000000 | -437.881000 | 0.000000 | -41.023000 | -3743.000000 | 1.000000e+00 | 0.000000 | -0.398000 | 0.000000 | 0.000000 | 0.000000 | 8.090000e+02 | 0.137000 | 15.100000 | 1.144000 | 0.526000 | 661.240000 | 0.100000 | 79.370000 | 0.990000 | 0.100000 | 7.700000 | 1.188000 | 0.100000 | 53.280000 |
25% | 2.100000e+01 | 0.000000 | 0.000000 | 0.00000 | 8.521500 | 0.000000 | 0.000000 | 0.000000 | 805.000000 | 2.585100e+04 | 1.437000 | 0.049000 | 903.000000 | 0.051000 | 37.960000 | 1.701583e+06 | 39.497000 | 24.400000 | 3.607000 | 2.162000 | 6171.884000 | 0.500000 | 153.493000 | 5.310000 | 1.900000 | 21.400000 | 22.863000 | 1.380000 | 70.390000 |
50% | 4.460000e+02 | 5.000000 | 9.000000 | 0.00000 | 155.458000 | 0.773000 | 2.043000 | 0.000000 | 2766.000000 | 1.105140e+05 | 8.105000 | 0.221000 | 3115.000000 | 0.239000 | 67.590000 | 8.655541e+06 | 90.672000 | 31.800000 | 7.104000 | 4.458000 | 15183.616000 | 1.700000 | 235.954000 | 7.110000 | 6.434000 | 31.400000 | 55.182000 | 2.540000 | 75.860000 |
75% | 5.066500e+03 | 102.000000 | 107.000000 | 2.00000 | 936.628000 | 10.572000 | 21.692000 | 0.140000 | 9307.000000 | 4.324700e+05 | 38.056000 | 0.693000 | 9528.250000 | 0.691000 | 81.940000 | 3.236600e+07 | 222.873000 | 39.800000 | 14.864000 | 9.720000 | 33132.320000 | 15.000000 | 318.949000 | 10.080000 | 19.600000 | 40.900000 | 83.741000 | 4.210000 | 80.100000 |
max | 1.670892e+07 | 284710.000000 | 660123.000000 | 10512.00000 | 38138.741000 | 4944.376000 | 1237.551000 | 200.040000 | 929838.000000 | 5.063568e+07 | 638.167000 | 20.611000 | 801014.000000 | 15.456000 | 100.000000 | 7.794799e+09 | 19347.500000 | 48.200000 | 27.049000 | 18.493000 | 116935.600000 | 77.600000 | 724.417000 | 23.360000 | 44.000000 | 78.100000 | 98.999000 | 13.800000 | 86.750000 |
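The summary shows negative minimums for several daily columns (for example, -29726 for new_cases and -1918 for new_deaths), which is worth checking before modelling. The sketch below shows how such a table can be produced with `DataFrame.describe()` and how the negative entries can be counted. The small frame is invented for illustration; treating the negatives as retroactive corrections would be an assumption that this data alone does not confirm.

```python
import pandas as pd

# Invented rows; only the column names come from the data set.
df = pd.DataFrame(
    {
        "location": ["A", "A", "B", "B"],
        "new_cases": [5.0, -2.0, 0.0, 7.0],
        "new_deaths": [0.0, 1.0, None, -1.0],
    }
)

# describe() covers numeric columns by default and skips missing values,
# which is why "count" differs from column to column.
summary = df.describe()
print(summary)

# Number of negative entries per numeric column (NaN compares as False).
negative_counts = (df.select_dtypes(include="number") < 0).sum()
print(negative_counts)
```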
GDP vs Total Deaths
As a first look, we drew an lmplot of GDP per capita against total deaths. Total deaths appear to be higher for poorer countries, but we cannot be sure: there is a spike close to a GDP per capita of 60,000, which corresponds to richer countries. Even so, a significant portion of deaths falls in the lower GDP range, which might indicate a disparity in life expectancy between richer and poorer countries.
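A plot of this kind can be sketched as follows. It is not stated whether the original plot used every daily row or one row per country; this sketch assumes the latter, keeping each location's most recent row so that countries with long time series do not dominate the fit. The helper names `latest_per_country` and `plot_gdp_vs_deaths` are hypothetical, and the sample rows are invented.

```python
import pandas as pd


def latest_per_country(df: pd.DataFrame) -> pd.DataFrame:
    """Keep each location's most recent row that has both plotted values."""
    cols = ["location", "date", "gdp_per_capita", "total_deaths"]
    clean = df[cols].dropna(subset=["gdp_per_capita", "total_deaths"])
    clean = clean.assign(date=pd.to_datetime(clean["date"]))
    latest = clean.sort_values("date").groupby("location").tail(1)
    return latest.sort_values("location").reset_index(drop=True)


def plot_gdp_vs_deaths(latest: pd.DataFrame):
    """Scatter plot with a fitted regression line (needs seaborn installed)."""
    import seaborn as sns  # imported lazily so the data step runs without it

    return sns.lmplot(data=latest, x="gdp_per_capita", y="total_deaths")


# Invented rows; only the column names come from the data set.
sample = pd.DataFrame(
    {
        "location": ["A", "A", "B", "B", "C"],
        "date": ["2020-07-01", "2020-07-02", "2020-07-01",
                 "2020-07-02", "2020-07-02"],
        "gdp_per_capita": [1000.0, 1000.0, 50000.0, 50000.0, None],
        "total_deaths": [10.0, 12.0, 3.0, None, 8.0],
    }
)
latest = latest_per_country(sample)
print(latest)
```

Calling `plot_gdp_vs_deaths(latest)` draws the figure. Because raw death counts scale with population, plotting total_deaths_per_million on the y-axis instead may give a fairer comparison between countries.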