Approach

Below is the approach we took for this project:

  • Data understanding and EDA
    • Data type, data range constraints
    • Looking for missing data
    • Looking for inconsistencies in data
    • Visualizing the above to get better insight into uniformity and inconsistencies
    • Understanding and comparing distributions through histogram, CDF, PMF and KDE plots
    • Visualizing relationships between the variables (see the sketch below)
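
As a rough illustration of these checks, the sketch below assumes the data sits in a pandas DataFrame loaded from a CSV; the file name and the "gdp" column used for the distribution plots are placeholders rather than the project's actual names.

```python
# A minimal EDA sketch; "data.csv" and "gdp" are placeholder names.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv("data.csv")  # placeholder path

# Data types, value ranges and summary statistics
print(df.dtypes)
print(df.describe(include="all"))

# Missing data per column
print(df.isna().sum().sort_values(ascending=False))

# Distribution of a numeric column: histogram with KDE, then the empirical CDF
sns.histplot(df["gdp"], kde=True)
plt.show()
sns.ecdfplot(df["gdp"])
plt.show()

# Pairwise relationships between the numeric variables
sns.pairplot(df.select_dtypes("number"))
plt.show()
```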


  • Data Preparation
    • Handling missing data with imputation or deletion (preferably imputation)
    • Converting data types where needed
    • Scaling and normalizing the data
    • Pre-processing and feature engineering: encoding categorical variables, handling numerical and date variables
    • Feature selection
      • Removing redundant features
      • Checking correlations via a correlation matrix and plot to understand which variables to select
      • Dimensionality reduction with PCA, if necessary (see the sketch below)
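
The sketch below shows one way these preparation steps could be wired together with scikit-learn, continuing from the df of the EDA sketch; the numeric_cols and categorical_cols lists are placeholders for the project's actual features.

```python
# A preparation sketch continuing from df above; the column lists are placeholders.
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_cols = ["gdp", "population"]    # placeholder numeric features
categorical_cols = ["country"]          # placeholder categorical features

# Impute then scale numeric features; impute then one-hot encode categorical ones
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric_cols),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical_cols),
])

# Correlation matrix and heatmap to spot redundant numeric features
sns.heatmap(df[numeric_cols].corr(), annot=True, cmap="coolwarm")
plt.show()

# Optional dimensionality reduction: keep enough components for 95% of the variance
X_prepared = preprocess.fit_transform(df)
if hasattr(X_prepared, "toarray"):      # one-hot encoding may yield a sparse matrix
    X_prepared = X_prepared.toarray()
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_prepared)
print(X_reduced.shape)
```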


  • Machine Learning (Supervised)
    • Supervised learning models to understand the effect of GDP and predict deaths (see the sketch below)
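
A minimal modelling sketch, assuming a classification target in line with the evaluation step below; the "death_category" target column and the reuse of the preprocess transformer from the previous sketch are illustrative assumptions, not the project's actual setup.

```python
# A modelling sketch; "death_category" and the reuse of `preprocess` are assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = df.drop(columns=["death_category"])   # placeholder target column
y = df["death_category"]

model = Pipeline([
    ("prep", preprocess),                  # ColumnTransformer from the preparation step
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)

# Coefficients of the scaled features hint at the direction and size of GDP's effect
print(model.named_steps["clf"].coef_)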


  • Evaluation
    • Splitting the data into train and test sets
    • Scores, classification reports and confusion matrices to understand the accuracy of classification predictions
    • Exploring the same metrics with cross-validation
    • Hyperparameter tuning, with grid search planned for this tuning (see the sketch below)
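
The sketch below ties these evaluation steps together, continuing the placeholder names from the previous sketches; the parameter grid is illustrative rather than the project's actual search space.

```python
# An evaluation sketch continuing the placeholder names above.
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hold-out evaluation: scores, classification report and confusion matrix
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# Cross-validation for a more stable estimate of performance
print(cross_val_score(model, X, y, cv=5).mean())

# Hyperparameter tuning via grid search over the classifier's regularisation strength
grid = GridSearchCV(model, param_grid={"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
```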