Conclusion

We tried a regression model based on location details to predict total deaths and obtained an accuracy of 70.5%. However, the evaluation paints a different picture of how well suited the data set is to linear regression. Our decision tree model initially showed signs of overfitting, but once tuned it reached a higher accuracy of 91.8% on the test set. This is helpful for taking proactive measures at locations that may become hotspots, based on the prior trends we captured for them. Beyond the data set we worked with, a continuous data feed should improve the accuracy of our models over time. The data sets we are working with are volatile and can change quickly when infections spike. With more data, the models could predict future outcomes more reliably and help us identify future hotspots, whether by age, GDP, or other variables.
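The overfitting check mentioned above can be illustrated with a small sketch. It assumes that "accuracy" for a regression model means the coefficient of determination (R^2), which the text does not state explicitly. The function names `r_squared` and `overfitting_gap` and all numbers are hypothetical and chosen only for illustration; they are not results from this study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def overfitting_gap(train_score, test_score):
    """Difference between training and test scores.

    A large positive gap suggests the model has memorised the training
    data rather than learned a pattern that generalises.
    """
    return train_score - test_score


# Made-up values for illustration only (not taken from the data set).
train_true = [10.0, 20.0, 30.0, 40.0]
train_pred = [10.0, 20.0, 30.0, 40.0]   # perfect fit on the training data
test_true = [15.0, 25.0, 35.0]
test_pred = [25.0, 25.0, 25.0]          # no better than predicting the mean

train_r2 = r_squared(train_true, train_pred)
test_r2 = r_squared(test_true, test_pred)
print(train_r2, test_r2, overfitting_gap(train_r2, test_r2))
```

A model that scores far higher on the training set than on the test set, as in this toy case, is a candidate for tuning, for example by limiting tree depth or requiring more samples per leaf.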