Analyzing Trends and Determinants of Leading Causes of Death in the USA: A Data-Driven Approach

Saddam Hossain, Mohammed Nazmul Islam Miah, MD Sohel Rana, Md Sazzad Hossain, Proshanta Kumar Bhowmik, Md Khalilor Rahman, Rabeya Akter
The leading causes of death, together with their trends and determinants, largely define the health landscape of the United States. Causes such as heart disease, cancer, chronic lower respiratory diseases, HIV/AIDS, accidents, and stroke have been major public health concerns for many decades. Each condition reflects broader societal and individual health challenges, including lifestyle choices, environmental factors, genetic predispositions, and healthcare accessibility. This research project used a data-driven approach to explore these trends and to understand the patterns and determinants underlying mortality statistics. Using an expanded dataset, the study examined the leading causes of death; how they vary by demographic factors, including age, sex, and race/ethnicity; and the social, environmental, and behavioral determinants of those patterns. The data were retrieved from Kaggle, specifically the dataset "NCHS - Leading Causes of Death: United States", which covers the major causes of death in the United States from 1999 to 2016. The dataset is organized to support trend analysis and includes the variables Cause of Death (for example, heart disease and cancer), Year, State, Age-adjusted Death Rate, and Number of Deaths. Additional demographic variables, such as Sex and Race/Ethnicity, allowed analysis of finer subgroups, which was useful for highlighting disparities in health outcomes. Three machine learning models, Linear Regression, Random Forest, and XGBoost, were evaluated using Mean Squared Error (MSE) and R-squared (R2). XGBoost substantially outperformed the other two models on both MSE and R2, which indicates that, on this dataset, XGBoost gave the most accurate and reliable predictions of the three.
In this respect, advanced machine learning models applied to mortality trends provide deeper insight into the underlying determinants. Large datasets comprising demographic, socioeconomic, and health-related variables can be analyzed for patterns and correlations that may not be apparent with traditional statistical methods. Model predictions can indicate future mortality trends by highlighting high-risk populations and locations. Data-driven models therefore have significant implications for public health: they provide insight into the trends and determinants of mortality and point to possible interventions.
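The two evaluation metrics named in the abstract, MSE and R2, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the placeholder model names `model_a` and `model_b`, and all numeric values below are hypothetical and chosen only to show how the metrics are computed and how models could be ranked by them.

```python
from typing import Dict, List, Sequence, Tuple


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when observed values are constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed values (e.g. age-adjusted death rates); not real data.
observed: List[float] = [100.0, 200.0, 300.0, 400.0]

# Hypothetical predictions from two placeholder models; not results from the study.
predictions: Dict[str, List[float]] = {
    "model_a": [110.0, 190.0, 310.0, 390.0],
    "model_b": [150.0, 150.0, 350.0, 350.0],
}

# Score each model on both metrics.
scores: Dict[str, Tuple[float, float]] = {
    name: (mean_squared_error(observed, pred), r_squared(observed, pred))
    for name, pred in predictions.items()
}

# Lower MSE is better; higher R-squared is better.
best_model = min(scores, key=lambda name: scores[name][0])

for name, (mse, r2) in scores.items():
    print(f"{name}: MSE={mse:.2f}, R2={r2:.3f}")
print("Lowest MSE:", best_model)
```

A lower MSE means smaller average squared error, while an R2 closer to 1 means the model explains more of the variance in the observed values, so a model that leads on both metrics, as the abstract reports for XGBoost, is preferred on both counts.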