The purpose of this project is to explore relationships among mental and physical health issues, and predictors of anxiety and depression among college students. While mental health is sometimes hard to measure, there are standardized methods developed to help psychologists evaluate the presence and severity of certain mental health indicators. In this project, we will focus mainly on depression and anxiety disorders in relation to overall mental health.
Logistic regression models will be used to model the probability that a student has a depression diagnosis or clinically significant anxiety. A variable from the NCHA survey asks participants whether they have been diagnosed with depression (acha_depression). The survey has no equivalent diagnosis question for anxiety, so GAD-7 scores >= 10 are coded as having anxiety and lower scores are coded as not. A score of 10 is the standard GAD-7 cutoff at which clinical intervention is recommended.
Our physical and mental health are not distinct from each other. It is important to view health from an overall perspective, because all areas of our lives cross over into each other. College students are a vulnerable population for mental health issues. When we understand which factors are related to their prevalence, we can work to create better prevention strategies and know who to focus our attention on. Depression and anxiety are not purely mental, but present themselves through physical problems like fatigue, high heart rate, inflammation, and other biological symptoms.
Data Description
The data set comes from a study of social media’s effects on college students’ mental health (Braghieri et al., 2021). It includes variables from the PHQ-9 Depression Screening Survey and the GAD-7 Anxiety Screening Survey, both of which are highly reliable and valid screening instruments. It also includes survey questions from the American College Health Association (ACHA)’s National College Health Assessment (NCHA). The data set was originally used to evaluate the validity of the NCHA by comparing its outcomes with the PHQ-9 and the GAD-7, both of which were highly correlated with the NCHA's poor-mental-health indicators. There are 509 observations.
Rows: 509
Columns: 89
$ RecordedDate <chr> "1/29/2022 15:37", "1/29/2022 15:4…
$ year_1 <dbl> 2000, 1997, 2000, 2001, 2000, 2000…
$ state_1 <chr> "Virginia", "California", "Marylan…
$ surveys <chr> "15", "15", "1", "7", "5", "30", "…
$ general_health <chr> "Good", "Excellent", "Very Good", …
$ phq9_interest <chr> "More than half of the days", "Not…
$ phq9_depressed <chr> "Several days", "Not at all", "Sev…
$ phq9_sleep <chr> "Several days", "Not at all", "Not…
$ phq9_tired <chr> "Several days", "Several days", "S…
$ phq9_appetite <chr> "Several days", "Several days", "N…
$ phq9_failure <chr> "More than half of the days", "Not…
$ phq9_concentrating <chr> "Several days", "Not at all", "Mor…
$ phq9_speed <chr> "Not at all", "Not at all", "Not a…
$ phq9_selfharm <chr> "Not at all", "Not at all", "Sever…
$ gad7_anxious <chr> "Several days", "Not at all", "Sev…
$ gad7_control <chr> "Several days", "Not at all", "Not…
$ gad7_worrying <chr> "Several days", "Not at all", "Not…
$ gad7_relaxing <chr> "Several days", "Not at all", "Not…
$ gad7_restless <chr> "Several days", "Not at all", "Not…
$ gad7_annoyed <chr> "Several days", "Not at all", "Mor…
$ gad7_afraid <chr> "Several days", "Not at all", "Sev…
$ acha_12months_times_hopeless <chr> "5-6 times", "Never", "1-2 times",…
$ acha_12months_times_overwhelmed <chr> "5-6 times", "11 or more times", "…
$ acha_12months_times_exhausted <chr> "5-6 times", "5-6 times", "7-8 tim…
$ acha_12months_times_sad <chr> "5-6 times", "3-4 times", "1-2 tim…
$ acha_12months_times_depressed <chr> "1-2 times", "Never", "1-2 times",…
$ acha_12months_times_considerSuicide <chr> "Never", "Never", "1-2 times", "3-…
$ acha_12months_times_attemptSuicide <chr> "Never", "Never", "Never", "Never"…
$ acha_12months_any_allergy <chr> "No", "No", "No", "Yes", "No", "No…
$ acha_12months_any_anorexia <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_anxiety <chr> "Yes", "No", "No", "Yes", "Yes", "…
$ acha_12months_any_asthma <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_bulimia <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_fatigure <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_depression <chr> "Yes", "No", "No", "Yes", "Yes", "…
$ acha_12months_any_diabetes <chr> "No", "No", "No", "No", "Yes", "No…
$ acha_12months_any_endometriosi <chr> "No", "No", "No", "Yes", "No", "No…
$ acha_12months_any_herpes <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_hpv <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_hepatitis <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_blood <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_cholesterol <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_HIV <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_stressInjury <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_seasonal <chr> "Yes", "No", "No", "No", "No", "No…
$ acha_12months_any_substance <chr> "No", "Yes", "No", "No", "No", "No…
$ acha_12months_any_back <chr> "No", "Yes", "No", "No", "No", "Ye…
$ acha_12months_any_fracture <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_bronchitis <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_chlamydia <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_ear <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_gonorrhea <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_mononucleosis <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_pelvic <chr> "No", "No", "No", "No", "No", "No"…
$ acha_12months_any_sinus <chr> "No", "No", "No", "No", "Yes", "Ye…
$ acha_12months_any_strep <chr> "No", "No", "No", "Yes", "No", "No…
$ acha_12months_any_tuberculosis <chr> "No", "No", "No", "No", "No", "No"…
$ acha_services_dianosed <chr> "No", NA, "No", "Yes", "No", NA, "…
$ acha_services_therapy <chr> "Yes", NA, "No", "Yes", "Yes", NA,…
$ acha_services_medication <chr> "Yes", NA, "No", "Yes", "Yes", NA,…
$ acha_depression <chr> "Yes", "No", "No", "Yes", "Yes", "…
$ sex <chr> "Female", "Female", "Female", "Fem…
$ fulltime <chr> "Yes", "Yes", "Yes", "Yes", "Yes",…
$ international <chr> "No", "No", "No", "No", "No", "No"…
$ race <fct> white, asian, asian, white, white,…
$ phq9_interest1 <dbl> 2, 0, 1, 2, 2, 1, 1, 1, 1, 1, 0, 1…
$ phq9_depressed1 <dbl> 1, 0, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1…
$ phq9_sleep1 <dbl> 1, 0, 0, 3, 2, 1, 2, 3, 3, 0, 1, 0…
$ phq9_tired1 <dbl> 1, 1, 1, 3, 2, 2, 2, 1, 3, 2, 1, 2…
$ phq9_appetite1 <dbl> 1, 1, 0, 3, 1, 1, 1, 3, 0, 1, 1, 1…
$ phq9_failure1 <dbl> 2, 0, 0, 1, 2, 2, 1, 2, 1, 0, 1, 0…
$ phq9_concentrating1 <dbl> 1, 0, 2, 3, 1, 2, 1, 1, 1, 0, 1, 0…
$ phq9_speed1 <dbl> 0, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0, 0…
$ phq9_selfharm1 <dbl> 0, 0, 1, 1, 2, 0, 0, 1, 0, 0, 0, 0…
$ phq9_score <dbl> 9, 2, 6, 19, 14, 10, 10, 15, 11, 5…
$ phq9_severity <chr> "Mild", "None-minimal", "Mild", "M…
$ gad7_anxious1 <dbl> 1, 0, 1, 1, 1, 1, 2, 2, 1, 3, 2, 3…
$ gad7_control1 <dbl> 1, 0, 0, 0, 1, 2, 2, 1, 1, 3, 1, 3…
$ gad7_worrying1 <dbl> 1, 0, 0, 1, 1, 2, 2, 2, 1, 3, 2, 3…
$ gad7_relaxing1 <dbl> 1, 0, 0, 2, 1, 1, 2, 1, 2, 3, 2, 3…
$ gad7_restless1 <dbl> 1, 0, 0, 2, 1, 0, 2, 1, 2, 2, 1, 1…
$ gad7_annoyed1 <dbl> 1, 0, 2, 2, 2, 3, 2, 2, 0, 3, 1, 3…
$ gad7_afraid1 <dbl> 1, 0, 1, 0, 1, 0, 2, 1, 0, 1, 0, 1…
$ gad7_score <dbl> 7, 0, 4, 8, 8, 9, 14, 10, 7, 18, 9…
$ gad7_severity <chr> "Mild Anxiety", "Minimal Anxiety",…
$ gad7_anxiety <chr> "No", "No", "No", "No", "No", "No"…
$ acha_services_diagnosed1 <chr> "No", NA, NA, "Yes", "No", NA, "No…
$ acha_services_therapy1 <chr> "Yes", NA, NA, "Yes", "Yes", NA, "…
$ acha_services_medication1 <chr> "Yes", NA, NA, "Yes", "Yes", NA, "…
The PHQ-9 is a standardized survey used to screen and diagnose depression. Participants are asked how often they have been bothered by nine specific problems over the past two weeks and respond with one of the four answers:
The nine prompts consist of the following:
Surveys are then scored and indicate levels of depression based on this scale:
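PHQ-9 scoring can be sketched as follows. This is a minimal Python illustration, not the project's R code. It assumes the standard 0-3 item points and the standard severity cutoffs (5, 10, 15, 20). The labels "None-minimal" and "Mild" appear in the data, but the higher-band labels and the fourth answer wording, "Nearly every day", are assumptions.

```python
# Item points for each PHQ-9 answer (standard 0-3 coding).
PHQ9_POINTS = {
    "Not at all": 0,
    "Several days": 1,
    "More than half of the days": 2,
    "Nearly every day": 3,  # assumed wording; not visible in the data excerpt
}

# Standard severity bands, checked from the highest lower bound down.
# Labels other than "None-minimal" and "Mild" are assumptions.
PHQ9_BANDS = [
    (20, "Severe"),
    (15, "Moderately Severe"),
    (10, "Moderate"),
    (5, "Mild"),
    (0, "None-minimal"),
]


def phq9_score(responses):
    """Sum the nine item points (total range 0-27)."""
    if len(responses) != 9:
        raise ValueError("the PHQ-9 has exactly 9 items")
    return sum(PHQ9_POINTS[answer] for answer in responses)


def phq9_severity(score):
    """Return the label of the highest band whose lower bound the score reaches."""
    for lower, label in PHQ9_BANDS:
        if score >= lower:
            return label
    raise ValueError("score must be non-negative")


# First respondent in the data excerpt, items in order interest ... selfharm.
first = ["More than half of the days", "Several days", "Several days",
         "Several days", "Several days", "More than half of the days",
         "Several days", "Not at all", "Not at all"]
print(phq9_score(first), phq9_severity(phq9_score(first)))  # 9 Mild
```

The result matches the stored phq9_score (9) and phq9_severity ("Mild") for that respondent.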
Similar to the PHQ-9, the GAD-7 is another standardized survey, but it is used to screen for generalized anxiety disorder. Participants are asked how often they have been bothered by seven specific problems over the past two weeks and respond with one of the four answers:
The seven prompts consist of the following:
Surveys are then scored and indicate levels of anxiety based on this scale:
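GAD-7 scoring, together with the binary outcome used in the anxiety models (score >= 10), can be sketched the same way. This Python sketch is illustrative, not the project's R code. It assumes the standard 0-3 item points and the standard cutoffs (5, 10, 15). "Minimal Anxiety" and "Mild Anxiety" appear in the data, while the two higher labels are assumptions.

```python
# Item points for each GAD-7 answer (standard 0-3 coding).
GAD7_POINTS = {
    "Not at all": 0,
    "Several days": 1,
    "More than half of the days": 2,
    "Nearly every day": 3,  # assumed wording; not visible in the data excerpt
}

# Standard severity bands; the two highest labels are assumptions.
GAD7_BANDS = [
    (15, "Severe Anxiety"),
    (10, "Moderate Anxiety"),
    (5, "Mild Anxiety"),
    (0, "Minimal Anxiety"),
]

CLINICAL_CUTOFF = 10  # scores at or above this are coded "Yes"


def gad7_score(responses):
    """Sum the seven item points (total range 0-21)."""
    if len(responses) != 7:
        raise ValueError("the GAD-7 has exactly 7 items")
    return sum(GAD7_POINTS[answer] for answer in responses)


def gad7_severity(score):
    """Return the label of the highest band whose lower bound the score reaches."""
    for lower, label in GAD7_BANDS:
        if score >= lower:
            return label
    raise ValueError("score must be non-negative")


def gad7_anxiety(score):
    """Binary outcome for the logistic models: "Yes" when score >= 10."""
    return "Yes" if score >= CLINICAL_CUTOFF else "No"


# First respondent in the data excerpt answered "Several days" to all 7 items.
score = gad7_score(["Several days"] * 7)
print(score, gad7_severity(score), gad7_anxiety(score))  # 7 Mild Anxiety No
```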
The National College Health Assessment is a semi-annual survey that the ACHA administers to college students. The current data set uses the NCHA survey questions but not data collected by the ACHA.
The data includes the following prompts from the survey:
Depression Symptoms:
Possible Responses:
General Health Indicators: Reported any of the following in the past 12 months:
Possible Responses:
Diagnosed with depression? (Yes/No)
If yes,
Possible Responses:
Many variables that were not useful for the research questions were removed from the data set. Most of these were timestamps from clicks in the survey and other web browser information.
Other variables were created or transformed for ease of use. For example, race was condensed into a single variable covering all races, instead of six separate two-level (Yes/No) variables.
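The race condensation can be sketched as a function over the per-race Yes/No flags. This is a hypothetical Python illustration: the six original indicator columns are not shown, so the flag names are assumptions. Grouping respondents who chose more than one race as "other" follows the description of Figure 6. Sending a single race outside the four named categories to "other" is also an assumption.

```python
# Hypothetical category names: only white, asian, black, hispanic, and other
# appear in the cleaned data; the original indicator columns are not shown.
KEPT_CATEGORIES = {"white", "asian", "black", "hispanic"}


def condense_race(flags):
    """Collapse per-race True/False flags into one race label.

    More than one selected race, or a single race outside KEPT_CATEGORIES,
    becomes "other"; no selection returns None (missing).
    """
    selected = [race for race, checked in flags.items() if checked]
    if not selected:
        return None
    if len(selected) > 1:
        return "other"
    return selected[0] if selected[0] in KEPT_CATEGORIES else "other"


print(condense_race({"white": True, "asian": False}))  # white
print(condense_race({"white": True, "asian": True}))   # other
```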
Additionally, variables were created to represent the overall depression and anxiety scores from the PHQ-9 and GAD-7 screening results.
Discussion
Figure 1
Figure 1 shows the distribution of depression severity based on the PHQ-9. Most students fall in the Mild category, with fewer having more severe scores.
Figure 2
Figure 2 reports the number of students professionally diagnosed with depression. These values will be used for the logistic regression analysis. Most students are not diagnosed with depression.
Figure 3
Figure 3 shows the distribution of anxiety severity based on the GAD-7. The results are similar to the distribution of depression, with most students showing symptoms consistent with mild anxiety and fewer facing severe symptoms.
Figure 4
To perform a binary logistic regression, we divided the anxiety severity measures into “Yes” and “No”, similar to the depression diagnoses. The standard cutoff for clinical intervention is a score of >= 10 (Moderate and Severe Anxiety). Most students fall below this cutoff.
Figure 5
Figure 5 shows how many male and female students are included in the dataset. There are approximately 350 females and 150 males.
Figure 6
Most students identify as white, with almost 300 students reporting so. All students who identified as more than one race are classified as “other”.
Figure 7
Most students were born in 2000, with a long tail of older students also represented in the sample. Note: the data were collected in January 2022, so most students born in 2000 were about 21 years old at the time.
Figure 8
This graph shows the number of students who responded “yes” to experiencing each of the symptoms/disorders listed in the past 12 months. Anxiety, depression, allergies, and back pain are among the most common.
These measures of depression and anxiety differ from the PHQ-9 and GAD-7 scales: they come from independent, self-reported questions in the NCHA questionnaire.
Corresponding Graphs
There are many health-related variables within the data set. To identify the best model, we created three separate preliminary models to better understand each predictor’s significance. The first model uses variables related to physical health issues:
The second uses behavioral health issues: mental health conditions (anxiety, bulimia, seasonal affective disorder, etc.) and sexually transmitted diseases (genital herpes, chlamydia, gonorrhea). The predictors were split this way so that the initial models had similar numbers of predictors and gave a general picture of each predictor's significance.
All models also include sex, race, and age regardless of their significance, because we are interested in demographic effects.
Note: Tuberculosis was not included in the models because there were too few cases for the model to fit properly.
The third model combines all predictors from the first two models to explore additional possible significant variables.
Finally, our refined model is composed of every predictor that was significant at the 0.1 level in the first three models. More information on the refined model can be found in the tabs “Refined Model Interpretation,” “Goodness of Fit & Adequacy,” “Predictive Performance,” and “ROC.”
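The selection rule can be sketched as a set union over the three preliminary models. This Python sketch is illustrative. It assumes each model is summarized as one p-value per predictor, where a factor such as race uses the smallest p-value among its dummy coefficients. The demographics are always kept, as stated above. The p-values in the example are copied from the depression model outputs.

```python
ALPHA = 0.1
FORCED = {"sex", "race", "year_1"}  # demographics kept regardless of significance


def refined_predictors(models, alpha=ALPHA):
    """Predictors with p < alpha in any preliminary model, plus forced demographics.

    Each model is a dict mapping a predictor name to one p-value; for a factor
    this is assumed to be the smallest p-value among its dummy coefficients.
    """
    keep = set(FORCED)
    for pvalues in models:
        keep.update(term for term, p in pvalues.items() if p < alpha)
    return keep


# A few p-values copied from the depression model outputs.
model1 = {"sex": 0.33394, "race": 2.99e-07, "acha_12months_any_asthma": 0.04401,
          "acha_12months_any_allergy": 0.98372}
model2 = {"gad7_score": 0.009347, "acha_12months_any_hpv": 0.982378}
model3 = {"acha_12months_any_ear": 0.031823}
print(sorted(refined_predictors([model1, model2, model3])))
```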
We fit a binary logistic regression model predicting the presence of a depression diagnosis (acha_depression_01) from the following variables: sex, race, age (birth year, year_1), general health, GAD-7 score, and any experience in the last 12 months of asthma, diabetes, repetitive stress injury, anxiety, depression, ear infection, or stress fracture.
The data were split into a training set (about 80%, 408 students) and a test set (about 20%, 101 students), stratified on the outcome. Because the classes were imbalanced, the training set was upsampled to better train the model. The original training counts were 272 “No” and 136 “Yes”, and after upsampling each class had 272 cases (544 rows). The seed was set to 2626 so the split is reproducible. Details on data splitting and upsampling can be found in the source code.
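A stratified split followed by upsampling can be sketched in Python as follows. This is an illustration under stated assumptions, not the source code the text refers to. Python's random module will not reproduce the R split for seed 2626. Each class's training share is rounded up. With 339 "No" and 170 "Yes" in the full data (the training counts plus the test-set column totals in the confusion matrix), this rounding happens to reproduce the reported 272/136 training counts.

```python
import random
from collections import defaultdict


def _group_by(rows, label):
    groups = defaultdict(list)
    for row in rows:
        groups[row[label]].append(row)
    return groups


def stratified_split(rows, label, train_pct=80, seed=2626):
    """Split each outcome class separately so both sets keep the class balance."""
    rng = random.Random(seed)
    train, test = [], []
    for _, group in sorted(_group_by(rows, label).items()):
        group = group[:]
        rng.shuffle(group)
        cut = (len(group) * train_pct + 99) // 100  # round the training share up
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


def upsample(rows, label, seed=2626):
    """Resample smaller classes with replacement until each matches the largest."""
    rng = random.Random(seed)
    groups = _group_by(rows, label)
    target = max(len(g) for g in groups.values())
    balanced = []
    for _, group in sorted(groups.items()):
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


rows = ([{"id": i, "y": 0} for i in range(339)]
        + [{"id": 339 + i, "y": 1} for i in range(170)])
train, test = stratified_split(rows, "y")
up = upsample(train, "y")
print(len(train), len(test), len(up))  # 408 101 544
```

Only the training set is upsampled. The test set keeps its original class balance, so test metrics reflect the real prevalence.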
The following predictors were significant in the refined model:
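race (Asian): OR = 0.21 (0.21 - 1 = -0.79, about 79% lower odds than White students)
race (Black): OR = 0.16 (0.16 - 1 = -0.84, about 84% lower odds than White students)
race (Hispanic): OR = 0.39 (0.39 - 1 = -0.61, about 61% lower odds than White students)
race (other): OR = 0.32 (0.32 - 1 = -0.68, about 68% lower odds than White students)
birth year (year_1): OR = 0.90 (0.90 - 1 = -0.10, about 10% lower odds per one-year-later birth year, so younger students have lower odds)
GAD-7 score: OR = 0.92 (0.92 - 1 = -0.08, about 8% lower odds per one-point increase)
anxiety (12 months): OR = 5.02 (5.02 - 1 = 4.02, about 402% higher odds, or roughly 5 times the odds)
depression (12 months): OR = 30.64 (30.64 - 1 = 29.64, about 2964% higher odds, or roughly 31 times the odds)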
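The conversions in the list above (coefficient to odds ratio, then odds ratio to percent change in odds) can be checked with a short Python sketch using coefficients copied from the refined model output. Because year_1 is a birth year, the sketch inverts its OR to express the effect of being one year older.

```python
import math


def odds_ratio(coef):
    """Exponentiate a logistic-regression coefficient (log-odds) to get an odds ratio."""
    return math.exp(coef)


def pct_change_in_odds(or_value):
    """Percent change in the odds implied by an odds ratio: (OR - 1) x 100."""
    return (or_value - 1.0) * 100.0


print(round(odds_ratio(-1.54596), 2))                  # 0.21 (race: Asian)
print(round(pct_change_in_odds(odds_ratio(3.42231))))  # 2964 (depression, 12 months)
# year_1 is birth year, so one year OLDER corresponds to 1 / OR.
print(round(1 / odds_ratio(-0.10503), 2))              # 1.11
```

An OR of 5 therefore means a 400% increase in the odds, not 500%.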
Using a 0.5 probability cutoff on the test data, the model achieved accuracy of 89.1%, with 88.2% sensitivity and 89.6% specificity. The ROC curve yielded an AUC of 0.931, showing excellent discrimination between students diagnosed and not diagnosed with depression.
Below is the first model including physical health measures and demographics.
Call:
glm(formula = acha_depression_01 ~ sex + race + year_1 + general_health +
acha_12months_any_allergy + acha_12months_any_asthma + acha_12months_any_fatigure +
acha_12months_any_diabetes + acha_12months_any_endometriosi +
acha_12months_any_hepatitis + acha_12months_any_blood + acha_12months_any_cholesterol +
acha_12months_any_stressInjury + acha_12months_any_back +
acha_12months_any_fracture + acha_12months_any_bronchitis +
acha_12months_any_ear + acha_12months_any_pelvic + acha_12months_any_sinus +
acha_12months_any_strep, family = binomial, data = depressiontrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 126.98147 44.76870 2.836 0.00456 **
sexMale -0.22559 0.23348 -0.966 0.33394
raceasian -1.68283 0.32841 -5.124 2.99e-07 ***
raceblack -1.05259 0.42668 -2.467 0.01363 *
racehispanic -0.65869 0.33196 -1.984 0.04723 *
raceother -0.84856 0.41597 -2.040 0.04136 *
year_1 -0.06411 0.02238 -2.864 0.00418 **
general_healthFair 1.64958 0.69837 2.362 0.01817 *
general_healthGood 1.48784 0.66733 2.230 0.02578 *
general_healthPoor 2.63316 0.81546 3.229 0.00124 **
general_healthVery Good 1.01236 0.67622 1.497 0.13437
acha_12months_any_allergyYes -0.00434 0.21276 -0.020 0.98372
acha_12months_any_asthmaYes 0.58870 0.29230 2.014 0.04401 *
acha_12months_any_fatigureYes 0.49772 0.42142 1.181 0.23758
acha_12months_any_diabetesYes 2.57676 1.10248 2.337 0.01943 *
acha_12months_any_endometriosiYes 0.29557 0.86797 0.341 0.73346
acha_12months_any_hepatitisYes 1.07398 1.71512 0.626 0.53120
acha_12months_any_bloodYes -0.34108 0.47794 -0.714 0.47545
acha_12months_any_cholesterolYes -0.08588 0.49403 -0.174 0.86199
acha_12months_any_stressInjuryYes -1.17170 0.57794 -2.027 0.04262 *
acha_12months_any_backYes 0.16042 0.21315 0.753 0.45169
acha_12months_any_fractureYes 1.80524 0.86465 2.088 0.03681 *
acha_12months_any_bronchitisYes -0.69843 0.66608 -1.049 0.29437
acha_12months_any_earYes 0.54421 0.47354 1.149 0.25045
acha_12months_any_pelvicYes -1.09028 1.79123 -0.609 0.54274
acha_12months_any_sinusYes 0.06461 0.26599 0.243 0.80808
acha_12months_any_strepYes -0.13324 0.41545 -0.321 0.74842
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 754.14 on 543 degrees of freedom
Residual deviance: 623.97 on 517 degrees of freedom
AIC: 677.97
Number of Fisher Scoring iterations: 5
Race, age, general health, asthma(12 months), diabetes(12 months), stress injury(12 months), and fracture(12 months) were significant and will be included in the refined model.
Below is the second model including measures of behavioral health.
Call:
glm(formula = acha_depression_01 ~ sex + race + year_1 + general_health +
gad7_score + acha_12months_any_anorexia + acha_12months_any_anxiety +
acha_12months_any_bulimia + acha_12months_any_depression +
acha_12months_any_herpes + acha_12months_any_hpv + acha_12months_any_seasonal +
acha_12months_any_substance + acha_12months_any_chlamydia +
acha_12months_any_gonorrhea + acha_12months_any_mononucleosis,
family = binomial, data = depressiontrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 221.96285 57.29311 3.874 0.000107 ***
sexMale 0.24620 0.35017 0.703 0.481997
raceasian -1.65736 0.47473 -3.491 0.000481 ***
raceblack -1.77323 0.52812 -3.358 0.000786 ***
racehispanic -0.73561 0.45063 -1.632 0.102593
raceother -1.03105 0.58188 -1.772 0.076404 .
year_1 -0.11238 0.02868 -3.919 8.89e-05 ***
general_healthFair 1.06186 0.91223 1.164 0.244415
general_healthGood 0.41479 0.88230 0.470 0.638262
general_healthPoor 1.65319 1.05377 1.569 0.116687
general_healthVery Good 0.21631 0.88988 0.243 0.807946
gad7_score -0.08296 0.03192 -2.599 0.009347 **
acha_12months_any_anorexiaYes 0.27516 0.50049 0.550 0.582474
acha_12months_any_anxietyYes 1.99686 0.40406 4.942 7.73e-07 ***
acha_12months_any_bulimiaYes -0.56483 0.66626 -0.848 0.396564
acha_12months_any_depressionYes 3.36324 0.32900 10.223 < 2e-16 ***
acha_12months_any_herpesYes 0.47879 1.53048 0.313 0.754405
acha_12months_any_hpvYes 16.77736 759.55930 0.022 0.982378
acha_12months_any_seasonalYes 0.18015 0.36353 0.496 0.620204
acha_12months_any_substanceYes 0.22091 0.57167 0.386 0.699179
acha_12months_any_chlamydiaYes 0.64445 1.59335 0.404 0.685875
acha_12months_any_gonorrheaYes 18.32084 1007.88316 0.018 0.985497
acha_12months_any_mononucleosisYes -1.34819 1.71384 -0.787 0.431489
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 754.14 on 543 degrees of freedom
Residual deviance: 346.62 on 521 degrees of freedom
AIC: 392.62
Number of Fisher Scoring iterations: 15
Race, age, GAD-7 score, anxiety (12 months), and depression (12 months) were significant and will be included in the refined model.
Below is the third model that includes all predictors from the first and second models.
Call:
glm(formula = acha_depression_01 ~ sex + race + year_1 + general_health +
gad7_score + acha_12months_any_allergy + acha_12months_any_asthma +
acha_12months_any_fatigure + acha_12months_any_diabetes +
acha_12months_any_endometriosi + acha_12months_any_hepatitis +
acha_12months_any_blood + acha_12months_any_cholesterol +
acha_12months_any_stressInjury + acha_12months_any_back +
acha_12months_any_fracture + acha_12months_any_bronchitis +
acha_12months_any_ear + acha_12months_any_pelvic + acha_12months_any_sinus +
acha_12months_any_strep + gad7_score + acha_12months_any_anorexia +
acha_12months_any_anxiety + acha_12months_any_bulimia + acha_12months_any_depression +
acha_12months_any_herpes + acha_12months_any_hpv + acha_12months_any_seasonal +
acha_12months_any_substance + acha_12months_any_chlamydia +
acha_12months_any_gonorrhea + acha_12months_any_mononucleosis,
family = binomial, data = depressiontrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 246.35783 63.38758 3.887 0.000102 ***
sexMale 0.54214 0.39094 1.387 0.165512
raceasian -2.19224 0.52982 -4.138 3.51e-05 ***
raceblack -1.86375 0.62424 -2.986 0.002830 **
racehispanic -1.09800 0.50507 -2.174 0.029708 *
raceother -0.54818 0.64341 -0.852 0.394223
year_1 -0.12463 0.03175 -3.925 8.67e-05 ***
general_healthFair 1.22254 0.98782 1.238 0.215860
general_healthGood 0.54685 0.93503 0.585 0.558654
general_healthPoor 1.85089 1.13245 1.634 0.102171
general_healthVery Good 0.32820 0.94364 0.348 0.727990
gad7_score -0.08230 0.03507 -2.347 0.018942 *
acha_12months_any_allergyYes -0.27616 0.32660 -0.846 0.397804
acha_12months_any_asthmaYes 0.03589 0.42549 0.084 0.932772
acha_12months_any_fatigureYes -0.69749 0.59623 -1.170 0.242068
acha_12months_any_diabetesYes 1.92802 1.41992 1.358 0.174516
acha_12months_any_endometriosiYes 1.00118 1.39138 0.720 0.471795
acha_12months_any_hepatitisYes 2.07348 2.44280 0.849 0.395986
acha_12months_any_bloodYes -0.82205 0.68180 -1.206 0.227935
acha_12months_any_cholesterolYes -0.92131 0.73127 -1.260 0.207713
acha_12months_any_stressInjuryYes -0.80547 0.77057 -1.045 0.295890
acha_12months_any_backYes -0.17051 0.32182 -0.530 0.596222
acha_12months_any_fractureYes 1.34395 1.70088 0.790 0.429442
acha_12months_any_bronchitisYes -0.94609 0.86303 -1.096 0.272974
acha_12months_any_earYes 1.45513 0.67787 2.147 0.031823 *
acha_12months_any_pelvicYes -1.55127 2.57557 -0.602 0.546973
acha_12months_any_sinusYes -0.62055 0.39666 -1.564 0.117710
acha_12months_any_strepYes 0.18657 0.62337 0.299 0.764714
acha_12months_any_anorexiaYes -0.09129 0.51714 -0.177 0.859885
acha_12months_any_anxietyYes 2.23228 0.46439 4.807 1.53e-06 ***
acha_12months_any_bulimiaYes -0.14823 0.72926 -0.203 0.838931
acha_12months_any_depressionYes 3.66500 0.36537 10.031 < 2e-16 ***
acha_12months_any_herpesYes 0.96951 2.16636 0.448 0.654491
acha_12months_any_hpvYes 17.33741 814.72722 0.021 0.983022
acha_12months_any_seasonalYes 0.50099 0.40260 1.244 0.213354
acha_12months_any_substanceYes 0.06820 0.62984 0.108 0.913772
acha_12months_any_chlamydiaYes 0.03102 1.69548 0.018 0.985403
acha_12months_any_gonorrheaYes 18.21768 1029.57824 0.018 0.985883
acha_12months_any_mononucleosisYes -2.66055 3.82107 -0.696 0.486251
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 754.14 on 543 degrees of freedom
Residual deviance: 320.99 on 505 degrees of freedom
AIC: 398.99
Number of Fisher Scoring iterations: 15
Race, age, GAD-7 score, ear infection(12 months), anxiety(12 months), and depression(12 months) were significant and will be included in the refined model.
Call:
glm(formula = acha_depression_01 ~ sex + race + year_1 + general_health +
gad7_score + acha_12months_any_asthma + acha_12months_any_diabetes +
acha_12months_any_stressInjury + acha_12months_any_anxiety +
acha_12months_any_depression + acha_12months_any_ear + acha_12months_any_fracture,
family = binomial, data = depressiontrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 207.76368 56.24814 3.694 0.000221 ***
sexMale 0.06712 0.33227 0.202 0.839905
raceasian -1.54596 0.43904 -3.521 0.000430 ***
raceblack -1.85949 0.51716 -3.596 0.000324 ***
racehispanic -0.95081 0.45747 -2.078 0.037671 *
raceother -1.15429 0.53924 -2.141 0.032307 *
year_1 -0.10503 0.02814 -3.733 0.000190 ***
general_healthFair 0.77356 0.88960 0.870 0.384544
general_healthGood 0.45918 0.84731 0.542 0.587872
general_healthPoor 1.49798 1.00733 1.487 0.136995
general_healthVery Good 0.25129 0.85218 0.295 0.768084
gad7_score -0.08537 0.03129 -2.729 0.006360 **
acha_12months_any_asthmaYes -0.06061 0.38222 -0.159 0.874003
acha_12months_any_diabetesYes 1.59667 1.27043 1.257 0.208828
acha_12months_any_stressInjuryYes -1.04472 0.70555 -1.481 0.138681
acha_12months_any_anxietyYes 1.61268 0.38020 4.242 2.22e-05 ***
acha_12months_any_depressionYes 3.42231 0.32147 10.646 < 2e-16 ***
acha_12months_any_earYes 0.88965 0.59078 1.506 0.132095
acha_12months_any_fractureYes 1.15933 1.33747 0.867 0.386047
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 754.14 on 543 degrees of freedom
Residual deviance: 367.23 on 525 degrees of freedom
AIC: 405.23
Number of Fisher Scoring iterations: 6
Below are ORs and confidence intervals for all predictors. Specific interpretations can be found in the “Refined Model Interpretation” tab.
OR 2.5 % 97.5 %
(Intercept) 1.700669e+90 1.885469e+43 3.549057e+139
sexMale 1.069427e+00 5.589548e-01 2.064332e+00
raceasian 2.131081e-01 8.817085e-02 4.956930e-01
raceblack 1.557524e-01 5.532005e-02 4.235715e-01
racehispanic 3.864273e-01 1.572086e-01 9.489300e-01
raceother 3.152801e-01 1.093355e-01 9.117146e-01
year_1 9.002962e-01 8.505711e-01 9.503264e-01
general_healthFair 2.167460e+00 4.352621e-01 1.540431e+01
general_healthGood 1.582770e+00 3.466592e-01 1.045415e+01
general_healthPoor 4.472644e+00 6.950730e-01 3.864480e+01
general_healthVery Good 1.285687e+00 2.782444e-01 8.552239e+00
gad7_score 9.181710e-01 8.624518e-01 9.753187e-01
acha_12months_any_asthmaYes 9.411896e-01 4.478308e-01 2.012452e+00
acha_12months_any_diabetesYes 4.936547e+00 6.033066e-01 1.194615e+02
acha_12months_any_stressInjuryYes 3.517900e-01 8.403254e-02 1.358559e+00
acha_12months_any_anxietyYes 5.016241e+00 2.410885e+00 1.075529e+01
acha_12months_any_depressionYes 3.064015e+01 1.671384e+01 5.917981e+01
acha_12months_any_earYes 2.434283e+00 7.725868e-01 7.906490e+00
acha_12months_any_fractureYes 3.187789e+00 3.300838e-01 5.092231e+01
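For a quick check of the table above, a Wald interval exp(estimate -/+ 1.96 x SE) can be computed from the coefficient output. This Python sketch is an approximation and is not how the table was necessarily produced. The table's intervals are most likely profile-likelihood intervals (what R's confint() returns for a glm), so the Wald values are close but not identical.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def wald_or_ci(coef, se, z=Z_975):
    """Wald 95% confidence interval for an odds ratio: exp(coef -/+ z * se)."""
    return math.exp(coef - z * se), math.exp(coef + z * se)


# raceasian in the refined depression model: estimate -1.54596, SE 0.43904.
low, high = wald_or_ci(-1.54596, 0.43904)
print(f"{low:.3f} {high:.3f}")  # about 0.090 and 0.504 (table: 0.088 to 0.496)
```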
Analysis of Deviance Table
Model 1: acha_depression_01 ~ 1
Model 2: acha_depression_01 ~ sex + race + year_1 + general_health + gad7_score +
acha_12months_any_asthma + acha_12months_any_diabetes + acha_12months_any_stressInjury +
acha_12months_any_anxiety + acha_12months_any_depression +
acha_12months_any_ear + acha_12months_any_fracture
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 543 754.14
2 525 367.23 18 386.91 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood-ratio (analysis of deviance) test compares the refined model with the intercept-only null model: adding the predictors reduces the deviance by 386.91 on 18 degrees of freedom (p < 2.2 x 10^-16). We can therefore conclude that the included variables provide substantial explanatory power for predicting depression diagnoses.
Pseudo-R^2
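The test statistic is the drop in deviance (754.14 - 367.23), compared with a chi-square distribution on 18 degrees of freedom. For an even number of degrees of freedom the upper-tail probability has a closed form, e^(-x/2) times the sum of (x/2)^k / k! for k = 0 to df/2 - 1. That makes a stdlib-only Python sketch possible, as an illustration rather than the report's R code.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x) for an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k  # (x/2)^k / k!, built incrementally
        total += term
    return math.exp(-half) * total


deviance_drop = 754.14 - 367.23  # null deviance minus residual deviance
p_value = chi2_sf_even_df(deviance_drop, 18)
print(round(deviance_drop, 2), p_value < 2.2e-16)  # 386.91 True
```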
llh llhNull G2 McFadden r2ML r2CU
-183.6151184 -377.0720662 386.9138957 0.5130503 0.5089645 0.6786193
The McFadden R^2 is 0.51, above 0.4, indicating an excellent fit compared with the null model. The Cox & Snell R^2 (r2ML) is 0.51. The Nagelkerke R^2 (r2CU) rescales Cox & Snell so that its maximum is 1, and at 0.68 it means the model achieves about 68% of the maximum possible improvement in fit over the null. All three values indicate good fit.
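The three pseudo-R^2 values can be recomputed from the two log-likelihoods in the output above. This Python sketch assumes n = 544, the upsampled training rows (null deviance df 543 plus 1). McFadden is 1 - llh/llhNull. Cox & Snell is 1 - exp(-G2/n). Nagelkerke divides Cox & Snell by its maximum, 1 - exp(2 llhNull/n).

```python
import math


def pseudo_r2(llh, llh_null, n):
    """Likelihood-ratio statistic and three pseudo-R^2 measures for a logistic model."""
    g2 = -2.0 * (llh_null - llh)
    mcfadden = 1.0 - llh / llh_null
    cox_snell = 1.0 - math.exp(-g2 / n)
    nagelkerke = cox_snell / (1.0 - math.exp(2.0 * llh_null / n))
    return {"G2": g2, "McFadden": mcfadden,
            "CoxSnell": cox_snell, "Nagelkerke": nagelkerke}


fit = pseudo_r2(llh=-183.6151184, llh_null=-377.0720662, n=544)
print({k: round(v, 4) for k, v in fit.items()})
```

With balanced (upsampled) classes the null log-likelihood equals n ln(0.5), so the Cox & Snell maximum is exactly 0.75.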
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 60 4
1 7 30
Accuracy : 0.8911
95% CI : (0.8135, 0.9444)
No Information Rate : 0.6634
P-Value [Acc > NIR] : 1.18e-07
Kappa : 0.7613
Mcnemar's Test P-Value : 0.5465
Sensitivity : 0.8824
Specificity : 0.8955
Pos Pred Value : 0.8108
Neg Pred Value : 0.9375
Prevalence : 0.3366
Detection Rate : 0.2970
Detection Prevalence : 0.3663
Balanced Accuracy : 0.8889
'Positive' Class : 1
Using a 0.5 probability cutoff on the test data (101 students), the model achieved 89.1% accuracy, with 88.2% sensitivity and 89.6% specificity. Accuracy is well above the no-information rate of 66.3% (p = 1.18e-07), and Cohen's kappa is 0.76, so the model predicts depression diagnoses well on held-out data.
As with the depression models, three initial models were fit to assess almost all of the same variables as predictors of anxiety.
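The statistics in the confusion-matrix output follow directly from its four cells, with class 1 as the positive class. This short Python sketch recomputes them from the reported counts: 30 true positives, 4 false negatives, 7 false positives, 60 true negatives.

```python
def classification_metrics(tp, fn, fp, tn):
    """Common binary-classification summaries from confusion-matrix counts."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # Cohen's kappa: agreement beyond what the marginal totals imply by chance.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - p_expected) / (1 - p_expected),
        "no_information_rate": max(tp + fn, tn + fp) / n,
    }


m = classification_metrics(tp=30, fn=4, fp=7, tn=60)
print({k: round(v, 4) for k, v in m.items()})
```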
Again, this first model uses variables related to physical health issues:
The second uses behavioral health issues, grouped in the same way:
All models also include sex, race, and age regardless of their significance, because we are interested in demographic effects. The main difference from the depression models is that each outcome's models use the other condition's screening scale as a predictor: the depression models include the GAD-7 score, and the anxiety models include the PHQ-9 score.
Note: HPV was not included in the models because there were too few cases for the model to fit properly.
The third model combines all predictors from the first two models to explore additional possible significant variables.
The anxiety refined model is composed of every predictor that was significant at the 0.1 level in the first three models. More information on the refined model can be found in the tabs “Refined Model Interpretation,” “Goodness of Fit & Adequacy,” “Predictive Performance,” and “ROC.”
We fit a binary logistic regression model predicting the presence of a GAD-7 score greater than or equal to 10 (the standard cutoff for clinical intervention) from the following variables: sex, race, age (birth year, year_1), general health, PHQ-9 score, diagnosed depression (acha_depression), and any experience in the last 12 months of endometriosis, chronic fatigue syndrome, anxiety, chlamydia, high cholesterol, diabetes, ear infection, or depression.
The data were split into a training set (about 80%, 408 students) and a test set (about 20%, 101 students), stratified on the outcome. Because the classes were imbalanced, the training set was upsampled to better train the model. The original training counts for GAD-7 scores >= 10 were 255 “No” and 153 “Yes”, and after upsampling each class had 255 cases (510 rows). The seed was set to 2626 so the split is reproducible. Details on data splitting and upsampling can be found in the source code.
The following predictors were significant in the refined model:
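race (Asian): OR = 0.13 (0.13 - 1 = -0.87, about 87% lower odds than White students)
race (Black): OR = 0.23 (0.23 - 1 = -0.77, about 77% lower odds than White students)
race (Hispanic): OR = 0.13 (0.13 - 1 = -0.87, about 87% lower odds than White students)
PHQ-9 score: OR = 1.66 (1.66 - 1 = 0.66, about 66% higher odds per one-point increase)
general health (Poor): OR = 1.65 (1.65 - 1 = 0.65, about 65% higher odds than Excellent)
endometriosis (12 months): OR = 0.03 (0.03 - 1 = -0.97, about 97% lower odds)
anxiety (12 months): OR = 5.33 (5.33 - 1 = 4.33, about 433% higher odds)
high cholesterol (12 months): OR = 4.36 (4.36 - 1 = 3.36, about 336% higher odds)
ear infection (12 months): OR = 5.01 (5.01 - 1 = 4.01, about 401% higher odds)
diagnosed depression: OR = 0.30 (0.30 - 1 = -0.70, about 70% lower odds)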
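Per-point odds ratios such as the PHQ-9 OR of 1.66 compound multiplicatively, because the coefficient is linear on the log-odds scale: a k-point difference corresponds to OR^k. A minimal Python sketch:

```python
def or_for_change(or_per_point, points):
    """Odds ratio for a multi-point change: exp(k * b) = OR ** k."""
    return or_per_point ** points


print(round(or_for_change(1.66, 5), 1))  # 12.6
```

Under this model, a student scoring 5 points higher on the PHQ-9 has roughly 12.6 times the odds of a GAD-7 score >= 10, holding the other predictors fixed.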
Using a 0.5 probability cutoff on the test data, the model achieved accuracy of 80.2%, with 76.3% sensitivity and 82.5% specificity. The ROC curve yielded an AUC of 0.906, showing excellent discrimination between students with high and low levels of anxiety.
Below is the first model including physical health measures and demographics.
Call:
glm(formula = gad7_anxiety01 ~ sex + race + year_1 + acha_12months_any_allergy +
acha_12months_any_asthma + acha_12months_any_back + acha_12months_any_blood +
acha_12months_any_bronchitis + acha_12months_any_cholesterol +
acha_12months_any_diabetes + acha_12months_any_ear + acha_12months_any_endometriosi +
acha_12months_any_fatigure + acha_12months_any_hepatitis +
acha_12months_any_pelvic + acha_12months_any_sinus + acha_12months_any_strep +
acha_12months_any_stressInjury + general_health, family = binomial,
data = anxietytrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -62.214062 53.125954 -1.171 0.24157
sexMale -0.595674 0.239712 -2.485 0.01296 *
raceasian -0.867714 0.290138 -2.991 0.00278 **
raceblack -1.889694 0.536677 -3.521 0.00043 ***
racehispanic -0.692421 0.349475 -1.981 0.04756 *
raceother -0.533554 0.425833 -1.253 0.21022
year_1 0.030642 0.026550 1.154 0.24845
acha_12months_any_allergyYes -0.002919 0.218950 -0.013 0.98936
acha_12months_any_asthmaYes 0.341598 0.319203 1.070 0.28455
acha_12months_any_backYes 0.295653 0.220250 1.342 0.17948
acha_12months_any_bloodYes 0.123547 0.569326 0.217 0.82820
acha_12months_any_bronchitisYes 0.764228 0.648626 1.178 0.23871
acha_12months_any_cholesterolYes 0.928115 0.522497 1.776 0.07568 .
acha_12months_any_diabetesYes -1.737579 0.900783 -1.929 0.05374 .
acha_12months_any_earYes 0.392541 0.490677 0.800 0.42371
acha_12months_any_endometriosiYes -0.852465 0.705852 -1.208 0.22716
acha_12months_any_fatigureYes 1.635044 0.553253 2.955 0.00312 **
acha_12months_any_hepatitisYes -12.524749 495.828360 -0.025 0.97985
acha_12months_any_pelvicYes 10.863235 495.829904 0.022 0.98252
acha_12months_any_sinusYes 0.360740 0.274268 1.315 0.18842
acha_12months_any_strepYes -0.134419 0.394340 -0.341 0.73320
acha_12months_any_stressInjuryYes -0.987693 0.634060 -1.558 0.11930
general_healthFair 1.890054 0.521526 3.624 0.00029 ***
general_healthGood 1.030590 0.469695 2.194 0.02822 *
general_healthPoor 4.721591 1.007361 4.687 2.77e-06 ***
general_healthVery Good 0.639205 0.475036 1.346 0.17843
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 707.01 on 509 degrees of freedom
Residual deviance: 586.13 on 484 degrees of freedom
AIC: 638.13
Number of Fisher Scoring iterations: 13
Sex, race, chronic fatigue syndrome (12 months), and general health were significant at the 0.05 level. High cholesterol (12 months) and diabetes (12 months) were marginally significant (p < 0.10). All six will be included in the refined model.
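Two terms in the model above stand out: hepatitis and pelvic inflammatory disease. Their estimates are about -12.5 and 10.9, their standard errors are about 496, their p-values are near 1, and the fit needed 13 Fisher scoring iterations. That pattern is the usual symptom of quasi-complete separation: every respondent in a rare "Yes" group has the same outcome, so the maximum-likelihood estimate drifts toward infinity. The simulation below uses hypothetical data to reproduce the symptom in base R. The `detectseparation` package loaded in the setup chunk documents a formal check via `glm(..., method = "detect_separation")`, which this sketch does not run.

```r
set.seed(1)
n <- 200
rare <- c(rep(1, 5), rep(0, n - 5))   # hypothetical rare "Yes" condition (5 respondents)
y <- rbinom(n, 1, 0.4)                # hypothetical binary outcome
y[rare == 1] <- 0                     # every "Yes" respondent has outcome 0: quasi-complete separation
# glm() usually warns that fitted probabilities of 0 or 1 occurred; silenced here
fit <- suppressWarnings(glm(y ~ rare, family = binomial))
summary(fit)$coefficients["rare", ]   # large negative estimate, huge SE, p-value near 1
fit$iter                              # more Fisher scoring iterations than a well-behaved fit
```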
Below is the behavioral health model, also including demographics.
Call:
glm(formula = gad7_anxiety01 ~ sex + race + year_1 + general_health +
phq9_score + acha_depression + acha_12months_any_anorexia +
acha_12months_any_anxiety + acha_12months_any_bulimia + acha_12months_any_depression +
acha_12months_any_hpv + acha_12months_any_seasonal + acha_12months_any_substance +
acha_12months_any_chlamydia + acha_12months_any_mononucleosis +
acha_12months_any_gonorrhea + acha_12months_any_HIV, family = binomial,
data = anxietytrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -39.89726 62.18778 -0.642 0.521158
sexMale -0.09231 0.36047 -0.256 0.797892
raceasian -1.76308 0.48700 -3.620 0.000294 ***
raceblack -1.66146 0.66682 -2.492 0.012717 *
racehispanic -1.77985 0.58892 -3.022 0.002509 **
raceother -0.19498 0.54567 -0.357 0.720849
year_1 0.01775 0.03105 0.571 0.567676
general_healthFair 0.18433 0.82259 0.224 0.822693
general_healthGood -0.68067 0.76314 -0.892 0.372431
general_healthPoor 1.73502 1.18218 1.468 0.142201
general_healthVery Good -0.16702 0.75947 -0.220 0.825941
phq9_score 0.46340 0.04633 10.002 < 2e-16 ***
acha_depressionYes -0.83558 0.46710 -1.789 0.073639 .
acha_12months_any_anorexiaYes 0.48878 0.57263 0.854 0.393340
acha_12months_any_anxietyYes 1.64917 0.36711 4.492 7.05e-06 ***
acha_12months_any_bulimiaYes 0.25590 0.84086 0.304 0.760880
acha_12months_any_depressionYes -0.14809 0.44751 -0.331 0.740709
acha_12months_any_hpvYes 0.18825 2.26574 0.083 0.933785
acha_12months_any_seasonalYes -0.39450 0.41098 -0.960 0.337097
acha_12months_any_substanceYes -0.19917 0.60421 -0.330 0.741674
acha_12months_any_chlamydiaYes -3.48734 1.42120 -2.454 0.014135 *
acha_12months_any_mononucleosisYes 0.29286 2.31533 0.126 0.899345
acha_12months_any_gonorrheaYes -12.49422 903.85907 -0.014 0.988971
acha_12months_any_HIVYes 2.67045 1713.22709 0.002 0.998756
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 707.01 on 509 degrees of freedom
Residual deviance: 309.26 on 486 degrees of freedom
AIC: 357.26
Number of Fisher Scoring iterations: 14
Race, PHQ-9 score, anxiety (12 months), and chlamydia (12 months) were significant at the 0.05 level. Diagnosed depression (acha_depression) was marginally significant (p = 0.074). All of these will be included in the refined model.
Below is the third model that includes all predictors from the first and second models.
Call:
glm(formula = gad7_anxiety01 ~ sex + race + year_1 + general_health +
phq9_score + +acha_depression + acha_12months_any_anorexia +
acha_12months_any_anxiety + acha_12months_any_bulimia + acha_12months_any_depression +
acha_12months_any_hpv + acha_12months_any_seasonal + acha_12months_any_substance +
acha_12months_any_chlamydia + acha_12months_any_mononucleosis +
acha_12months_any_gonorrhea + acha_12months_any_HIV + acha_12months_any_allergy +
acha_12months_any_asthma + acha_12months_any_back + acha_12months_any_blood +
acha_12months_any_bronchitis + acha_12months_any_cholesterol +
acha_12months_any_diabetes + acha_12months_any_ear + acha_12months_any_endometriosi +
acha_12months_any_fatigure + acha_12months_any_hepatitis +
acha_12months_any_pelvic + acha_12months_any_sinus + acha_12months_any_strep +
acha_12months_any_stressInjury, family = binomial, data = anxietytrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.542e+01 6.830e+01 -0.226 0.821390
sexMale -6.232e-02 4.047e-01 -0.154 0.877599
raceasian -1.915e+00 5.428e-01 -3.527 0.000420 ***
raceblack -1.487e+00 7.283e-01 -2.041 0.041227 *
racehispanic -2.023e+00 6.522e-01 -3.102 0.001925 **
raceother -6.315e-01 6.272e-01 -1.007 0.313956
year_1 5.020e-03 3.413e-02 0.147 0.883048
general_healthFair 5.550e-01 9.104e-01 0.610 0.542137
general_healthGood -3.513e-01 8.546e-01 -0.411 0.681017
general_healthPoor 3.197e+00 1.543e+00 2.071 0.038337 *
general_healthVery Good 1.582e-01 8.391e-01 0.189 0.850457
phq9_score 5.122e-01 5.199e-02 9.851 < 2e-16 ***
acha_depressionYes -9.157e-01 5.214e-01 -1.756 0.079027 .
acha_12months_any_anorexiaYes 2.465e-01 6.587e-01 0.374 0.708290
acha_12months_any_anxietyYes 1.758e+00 4.184e-01 4.201 2.66e-05 ***
acha_12months_any_bulimiaYes 3.369e-01 9.466e-01 0.356 0.721909
acha_12months_any_depressionYes -4.262e-01 4.979e-01 -0.856 0.391911
acha_12months_any_hpvYes 1.182e+00 1.963e+00 0.602 0.547132
acha_12months_any_seasonalYes -2.636e-01 4.698e-01 -0.561 0.574769
acha_12months_any_substanceYes -1.685e-01 6.960e-01 -0.242 0.808724
acha_12months_any_chlamydiaYes -2.585e+00 1.833e+00 -1.410 0.158577
acha_12months_any_mononucleosisYes -7.644e-03 5.107e+00 -0.001 0.998806
acha_12months_any_gonorrheaYes -1.255e+01 1.530e+03 -0.008 0.993458
acha_12months_any_HIVYes -8.600e+00 3.106e+03 -0.003 0.997791
acha_12months_any_allergyYes 2.248e-02 3.554e-01 0.063 0.949568
acha_12months_any_asthmaYes -2.936e-02 4.820e-01 -0.061 0.951429
acha_12months_any_backYes 6.312e-02 3.626e-01 0.174 0.861816
acha_12months_any_bloodYes -1.545e-01 8.364e-01 -0.185 0.853460
acha_12months_any_bronchitisYes 4.291e-01 1.000e+00 0.429 0.667895
acha_12months_any_cholesterolYes 1.632e+00 7.346e-01 2.221 0.026331 *
acha_12months_any_diabetesYes -2.571e+00 1.450e+00 -1.774 0.076131 .
acha_12months_any_earYes 1.365e+00 7.142e-01 1.912 0.055934 .
acha_12months_any_endometriosiYes -3.757e+00 1.058e+00 -3.549 0.000386 ***
acha_12months_any_fatigureYes 8.497e-01 8.274e-01 1.027 0.304496
acha_12months_any_hepatitisYes -7.539e+00 1.244e+03 -0.006 0.995165
acha_12months_any_pelvicYes 1.964e+01 1.759e+03 0.011 0.991094
acha_12months_any_sinusYes 4.061e-01 4.251e-01 0.956 0.339324
acha_12months_any_strepYes -1.097e-01 6.110e-01 -0.180 0.857446
acha_12months_any_stressInjuryYes -1.026e+00 1.044e+00 -0.982 0.326031
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 707.01 on 509 degrees of freedom
Residual deviance: 280.31 on 471 degrees of freedom
AIC: 358.31
Number of Fisher Scoring iterations: 15
Race, general health (Poor), PHQ-9 score, anxiety (12 months), high cholesterol (12 months), and endometriosis (12 months) were significant at the 0.05 level. Diagnosed depression (acha_depression), diabetes (12 months), and ear infection (12 months) were marginally significant (p < 0.10). All of these will be included in the refined model.
Call:
glm(formula = gad7_anxiety01 ~ sex + race + year_1 + phq9_score +
general_health + acha_12months_any_endometriosi + acha_12months_any_fatigure +
acha_12months_any_anxiety + acha_12months_any_chlamydia +
acha_12months_any_cholesterol + acha_12months_any_diabetes +
acha_12months_any_ear + acha_depression, family = binomial,
data = anxietytrain_up)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -29.15440 64.57294 -0.451 0.651632
sexMale -0.19742 0.38218 -0.517 0.605460
raceasian -1.99705 0.53123 -3.759 0.000170 ***
raceblack -1.46891 0.68980 -2.129 0.033216 *
racehispanic -2.03990 0.62463 -3.266 0.001092 **
raceother -0.63875 0.59002 -1.083 0.278984
year_1 0.01201 0.03225 0.373 0.709480
phq9_score 0.50625 0.05082 9.961 < 2e-16 ***
general_healthFair 0.45704 0.88610 0.516 0.606001
general_healthGood -0.44908 0.83070 -0.541 0.588779
general_healthPoor 2.80512 1.32934 2.110 0.034845 *
general_healthVery Good 0.07764 0.82354 0.094 0.924894
acha_12months_any_endometriosiYes -3.54242 0.95958 -3.692 0.000223 ***
acha_12months_any_fatigureYes 0.66537 0.72508 0.918 0.358804
acha_12months_any_anxietyYes 1.67397 0.38677 4.328 1.5e-05 ***
acha_12months_any_chlamydiaYes -2.66535 1.63096 -1.634 0.102213
acha_12months_any_cholesterolYes 1.47331 0.66752 2.207 0.027303 *
acha_12months_any_diabetesYes -2.15427 1.35566 -1.589 0.112039
acha_12months_any_earYes 1.61243 0.63597 2.535 0.011232 *
acha_depressionYes -1.20081 0.39562 -3.035 0.002403 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 707.01 on 509 degrees of freedom
Residual deviance: 284.18 on 490 degrees of freedom
AIC: 324.18
Number of Fisher Scoring iterations: 6
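All four models were fit to the same data (`anxietytrain_up`, with the same null deviance), so their AICs are directly comparable. R's AIC for a binomial glm is the residual deviance plus 2k, where k is the number of estimated coefficients. The sketch below recomputes the printed AICs from each summary's residual deviance and residual degrees of freedom. It assumes n = 510 training rows, inferred from the null deviance's 509 degrees of freedom.

```r
# Residual deviances and residual df copied from the four glm summaries above
fits <- data.frame(
  model     = c("physical", "behavioral", "combined", "refined"),
  resid_dev = c(586.13, 309.26, 280.31, 284.18),
  resid_df  = c(484, 486, 471, 490)
)
n <- 510                               # inferred: null deviance has 509 df
fits$k   <- n - fits$resid_df          # number of estimated coefficients
fits$AIC <- fits$resid_dev + 2 * fits$k
fits[order(fits$AIC), ]
```

The refined model has the lowest AIC (324.18). It uses 20 coefficients, compared with 39 in the combined model, and its residual deviance is only about 4 points higher.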
Below are ORs and confidence intervals for all predictors. Specific interpretations can be found in the “Refined Model Interpretation” tab.
OR 2.5 % 97.5 %
(Intercept) 2.179732e-13 9.966249e-71 1.327003e+42
sexMale 8.208467e-01 3.866025e-01 1.740074e+00
raceasian 1.357348e-01 4.614130e-02 3.730922e-01
raceblack 2.301758e-01 5.301792e-02 8.273553e-01
racehispanic 1.300419e-01 3.701817e-02 4.322528e-01
raceother 5.279495e-01 1.591848e-01 1.643365e+00
year_1 1.012086e+00 9.502753e-01 1.081041e+00
phq9_score 1.659055e+00 1.511300e+00 1.845752e+00
general_healthFair 1.579397e+00 2.803610e-01 9.132929e+00
general_healthGood 6.382130e-01 1.256300e-01 3.281206e+00
general_healthPoor 1.652899e+01 1.380411e+00 2.701381e+02
general_healthVery Good 1.080729e+00 2.177320e-01 5.530556e+00
acha_12months_any_endometriosiYes 2.894314e-02 4.655735e-03 2.056917e-01
acha_12months_any_fatigureYes 1.945209e+00 5.072593e-01 9.108358e+00
acha_12months_any_anxietyYes 5.333286e+00 2.536835e+00 1.162889e+01
acha_12months_any_chlamydiaYes 6.957523e-02 3.150366e-03 1.123522e+00
acha_12months_any_cholesterolYes 4.363667e+00 1.210445e+00 1.678270e+01
acha_12months_any_diabetesYes 1.159881e-01 7.506348e-03 1.661622e+00
acha_12months_any_earYes 5.014968e+00 1.498793e+00 1.848290e+01
acha_depressionYes 3.009489e-01 1.355457e-01 6.425867e-01
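Each OR above is an exponentiated coefficient. The intervals appear to be profile-likelihood intervals: for endometriosis, a Wald interval exp(beta ± 1.96·SE) computed from the coefficient table gives roughly (0.0044, 0.190), not the (0.0047, 0.206) shown. The sketch below builds the same kind of table on simulated data. For the dashboard's model, the equivalent call would be `exp(cbind(OR = coef(refined_model), confint(refined_model)))`, where `refined_model` is a hypothetical object name.

```r
set.seed(42)
n <- 300
x <- rbinom(n, 1, 0.3)            # a yes/no condition, like the acha_12months_* items
z <- rnorm(n)                     # a continuous predictor, like phq9_score
y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x + 0.8 * z))
fit <- glm(y ~ x + z, family = binomial)
# Profile-likelihood 95% intervals on the log-odds scale, then exponentiate everything
or_table <- exp(cbind(OR = coef(fit), suppressMessages(confint(fit))))
round(or_table, 3)
# Wald intervals for comparison: exp(beta +/- 1.96 * SE)
round(exp(confint.default(fit)), 3)
```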
Analysis of Deviance Table
Model 1: gad7_anxiety01 ~ 1
Model 2: gad7_anxiety01 ~ sex + race + year_1 + phq9_score + general_health +
acha_12months_any_endometriosi + acha_12months_any_fatigure +
acha_12months_any_anxiety + acha_12months_any_chlamydia +
acha_12months_any_cholesterol + acha_12months_any_diabetes +
acha_12months_any_ear + acha_depression
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 509 707.01
2 490 284.18 19 422.83 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The likelihood-ratio test shows that the refined model predicts the probability of anxiety better than the intercept-only (null) model. With a p-value of < 2.2 x 10^-16, we can conclude that the included variables provide substantial explanatory power for predicting clinical levels of anxiety.
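The deviance table above is a likelihood-ratio test. Its statistic is the drop in deviance from the null model to the refined model, compared against a chi-squared distribution whose degrees of freedom equal the number of added coefficients. In R it would come from something like `anova(null_model, refined_model, test = "Chisq")`, with hypothetical object names. The sketch below redoes the arithmetic from the printed deviances.

```r
# Deviances and residual df copied from the Analysis of Deviance table
null_dev  <- 707.01; null_df  <- 509
resid_dev <- 284.18; resid_df <- 490
G2    <- null_dev - resid_dev         # likelihood-ratio statistic (422.83)
lr_df <- null_df - resid_df           # 19 added coefficients
p_value <- pchisq(G2, df = lr_df, lower.tail = FALSE)
c(G2 = G2, df = lr_df, p = p_value)
```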
Pseudo-R^2
fitting null model for pseudo-r2
llh llhNull G2 McFadden r2ML r2CU
-142.0909433 -353.5050621 422.8282376 0.5980512 0.5635472 0.7513962
The McFadden value is 0.60, well above 0.4, indicating an excellent fit compared to the null model. The Cox & Snell value is 0.56 and the Nagelkerke value is 0.75. Because Nagelkerke's R² rescales Cox & Snell by its maximum attainable value, the model achieves about 75% of the maximum possible improvement in fit over the null. All three values indicate good fit.
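The three pseudo-R² values follow directly from the two log-likelihoods printed above. In `pscl::pR2` output, `r2ML` is Cox & Snell and `r2CU` is Nagelkerke (Cragg-Uhler). The sketch below recomputes them. It assumes n = 510 training rows, inferred from the null deviance's 509 degrees of freedom.

```r
llh      <- -142.0909433   # log-likelihood of the refined model
llh_null <- -353.5050621   # log-likelihood of the intercept-only model
n        <- 510            # inferred: null deviance has 509 df
G2         <- 2 * (llh - llh_null)                      # likelihood-ratio statistic
mcfadden   <- 1 - llh / llh_null                        # McFadden
cox_snell  <- 1 - exp(-G2 / n)                          # r2ML
nagelkerke <- cox_snell / (1 - exp(2 * llh_null / n))   # r2CU: Cox & Snell / its maximum
round(c(G2 = G2, McFadden = mcfadden, r2ML = cox_snell, r2CU = nagelkerke), 4)
```

Here the Cox & Snell maximum, 1 - exp(2 · llhNull / n), works out to exactly 0.75. That is because the up-sampled training set is balanced, so the null log-likelihood is n · ln(0.5).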
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 52 9
1 11 29
Accuracy : 0.802
95% CI : (0.7109, 0.8746)
No Information Rate : 0.6238
P-Value [Acc > NIR] : 8.813e-05
Kappa : 0.5825
Mcnemar's Test P-Value : 0.8231
Sensitivity : 0.7632
Specificity : 0.8254
Pos Pred Value : 0.7250
Neg Pred Value : 0.8525
Prevalence : 0.3762
Detection Rate : 0.2871
Detection Prevalence : 0.3960
Balanced Accuracy : 0.7943
'Positive' Class : 1
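Using a 0.5 probability cutoff on the test data, the model achieved an accuracy of 80.2%, with 76.3% sensitivity and 82.5% specificity. Accuracy is significantly higher than the no-information rate of 62.4% (p < 0.001), so the model predicts clinically significant (moderate-to-severe) anxiety well.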
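Every statistic in the confusion-matrix printout follows from its four cell counts, with class 1 as the positive class. The sketch below recomputes the main ones from the counts copied from that table. The binomial test mirrors how the "Acc > NIR" p-value is usually obtained, and we only check that it falls below 0.001.

```r
# Cell counts from the confusion matrix above (rows = prediction, columns = reference)
tn <- 52; fn <- 9; fp <- 11; tp <- 29
n <- tn + fn + fp + tp
accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
balanced    <- (sensitivity + specificity) / 2
# Cohen's kappa: observed agreement corrected for agreement expected by chance
p_chance <- ((tn + fn) * (tn + fp) + (fp + tp) * (fn + tp)) / n^2
kappa    <- (accuracy - p_chance) / (1 - p_chance)
# One-sided test of accuracy against the no-information rate (majority class share)
nir   <- (tn + fp) / n
p_nir <- binom.test(tp + tn, n, p = nir, alternative = "greater")$p.value
round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity,
        ppv = ppv, npv = npv, balanced = balanced, kappa = kappa), 4)
```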
Note: All results should be interpreted as conditional on the other predictors in each model. Different models may suggest different results.
Q1: Which health issues are related to depression and anxiety diagnoses?
Using a binary logistic regression composed of predictors found to be significant in preliminary models, we did not find any physical health issues that significantly contributed to depression in college students. Previous studies contradict this finding, and preliminary models suggested some associations between these factors. Future models may want to exclude highly correlated variables, such as alternative anxiety and depression measures, from the equation.
For the anxiety model, students reporting poor general health had significantly higher odds of clinically significant anxiety (GAD-7 >= 10) than those reporting excellent health, holding the other predictors constant. Students who reported high cholesterol, an ear infection, or an anxiety disorder in the past 12 months were also significantly more likely to have clinically significant anxiety. Surprisingly, while students with higher PHQ-9 scores had a higher probability of anxiety, those formally diagnosed with depression were less likely to have it (OR = 0.30). These two variables measure similar constructs, and it would be interesting to explore why this discrepancy exists. Finally, students who reported endometriosis were also less likely to report high anxiety.
These results warrant further study. Future research should explore what factors contribute to high cholesterol and what environmental factors may cause ear infections. It would be interesting to see whether there are underlying connections between these health issues and generalized anxiety disorder.
Q2: Are mental and physical health problems consistent across sex, race, and age?
According to the depression model, students who identify only as Asian, Black, or Hispanic, and students who identify as another race, are less likely to be diagnosed with depression than students who identify only as White. This is an interesting finding, but a majority of the sample was White, which may affect the outcome. Additionally, as the age variable (year_1) increases by one year, students are more likely to be diagnosed with depression. Because this variable records birth year rather than age, a one-unit increase corresponds to a student who is one year younger, which makes the coefficient harder to interpret at a glance. There was no significant difference between sexes.
Again, Asian, Black, and Hispanic students were less likely than White students to report high levels of anxiety. Age and sex were not significant in the refined model. However, males appeared less likely to report clinically significant anxiety in general, and this difference was significant in the first model.
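Because `year_1` is a birth year, its odds ratio describes a student born one year later, that is, one year younger. Age is a linear function of -year_1 (roughly 2022 minus birth year for this January 2022 survey). The per-year-of-age odds ratio is therefore the reciprocal, obtained by flipping the sign of the coefficient. The sketch below illustrates this with the non-significant `year_1` estimate from the refined anxiety model, since the depression model's coefficients are not reproduced in this section.

```r
# year_1 coefficient from the refined anxiety model (not significant, p = 0.71)
beta_year_1 <- 0.01201
or_per_later_birth_year <- exp(beta_year_1)    # student born one year later (younger)
or_per_year_of_age      <- exp(-beta_year_1)   # student one year older: sign flips
round(c(per_birth_year = or_per_later_birth_year, per_year_of_age = or_per_year_of_age), 3)
```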
Future Directions
Because all health data were binary, future studies may want to explore more continuous measures of health. These could offer more insight into the correlation between mental and physical health problems, because they carry more information than a simple yes or no.
Braghieri, L., Levy, R., & Makarin, A. (2021). Social media and mental health. SSRN Electronic Journal, 112(11). https://doi.org/10.2139/ssrn.3919760
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of Internal Medicine, 166(10), 1092–1097. https://doi.org/10.1001/archinte.166.10.1092
American College Health Association. (n.d.). Spring 2008 reference group executive summary. Retrieved September 20, 2025, from https://www.acha.org/wp-content/uploads/2024/07/ACHA-NCHA_Reference_Group_ExecutiveSummary_Spring2008.pdf
Note: ChatGPT was utilized to help create up-sampling code and identify output errors.
---
title: "Collegiate Mental Health "
author: "Audrey DeGregorio"
output:
flexdashboard::flex_dashboard:
theme:
version: 4
bootswatch: bootstrap
navbar-bg: "#B3CAD8"
orientation: columns
source_code: embed
---
```{r setup, include=FALSE}
pacman::p_load(detectseparation,flexdashboard, car, caret, Benchmarking, tidyverse, ggplot2, pscl, tibble,pROC)
data <- read_csv("G:/My Drive/Fall 2025/MTH 369/Regression RStudio/Final Project/MentalHealthSurvey.csv")
data <- data |>
dplyr::select(c(RecordedDate, year_1, state_1, surveys, general_health, starts_with("phq9"), starts_with("gad7"), starts_with("acha_12months"), starts_with("acha_services"), acha_depression, sex, fulltime, international, starts_with("race")))
attach(data)
#creating race variable
data$race_white1 <- ifelse(data$race_white == "White - not Hispanic (includes Middle Eastern)", 1, 0)
data$race_asian1 <- ifelse(data$race_asian == "Asian or Pacific Islander", 1, 0)
data$race_black1 <- ifelse(data$race_black == "Black - not Hispanic", 1, 0)
data$race_hispanic1 <- ifelse(data$race_hispanic == "Hispanic or Latino", 1, 0)
data$race_native1 <- ifelse(data$race_native == "American Indian or Alaskan Native", 1, 0)
data$race_other1 <- ifelse(data$race_other == "Other", 1, 0)
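# number of race categories each respondent selected (columns chosen by name, not position)
race_count <- rowSums(data[, c("race_white1", "race_asian1", "race_black1",
                               "race_hispanic1", "race_native1", "race_other1")],
                      na.rm = TRUE)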
data <- data %>%
mutate(race_count_other = case_when(race_count > 1 ~ TRUE))
# multiracial respondents become "other"; otherwise use the single selected race
data <- data %>%
  mutate(race = case_when(race_count > 1 ~ "other",
                          race_white1 == 1 ~ "white",
                          race_black1 == 1 ~ "black",
                          race_hispanic1 == 1 ~ "hispanic",
                          race_native1 == 1 ~ "native",
                          race_asian1 == 1 ~ "asian",
                          race_other1 == 1 ~ "other"))
data$race <- factor(data$race)
data$race <- relevel(data$race, ref = "white")
glimpse(data)
data <- data |>
dplyr::select(c(RecordedDate, year_1, state_1, surveys, general_health, starts_with("phq9"), starts_with("gad7"), starts_with("acha_12months"), starts_with("acha_services"), acha_depression, sex, fulltime, international, race))
# Convert a PHQ-9/GAD-7 frequency response to its 0-3 item score
score_item <- function(x) {
  case_when(
    x == "Not at all" ~ 0,
    x == "Several days" ~ 1,
    x == "More than half of the days" ~ 2,
    x == "Nearly every day" ~ 3)
}
# phq9: score the nine items (new *1 columns), total them, and assign severity bands
phq9_items <- c("phq9_interest", "phq9_depressed", "phq9_sleep", "phq9_tired",
                "phq9_appetite", "phq9_failure", "phq9_concentrating",
                "phq9_speed", "phq9_selfharm")
data <- data %>%
  mutate(across(all_of(phq9_items), score_item, .names = "{.col}1"),
         phq9_score = rowSums(across(all_of(paste0(phq9_items, "1")))),
         phq9_severity = case_when(
           phq9_score <= 4 ~ "None-minimal",
           phq9_score <= 9 ~ "Mild",
           phq9_score <= 14 ~ "Moderate",
           phq9_score <= 19 ~ "Moderately Severe",
           phq9_score >= 20 ~ "Severe"))
# gad7: same scoring for the seven items, plus the binary outcome
gad7_items <- c("gad7_anxious", "gad7_control", "gad7_worrying", "gad7_relaxing",
                "gad7_restless", "gad7_annoyed", "gad7_afraid")
data <- data %>%
  mutate(across(all_of(gad7_items), score_item, .names = "{.col}1"),
         gad7_score = rowSums(across(all_of(paste0(gad7_items, "1")))),
         gad7_severity = case_when(
           gad7_score <= 4 ~ "Minimal Anxiety",
           gad7_score <= 9 ~ "Mild Anxiety",
           gad7_score <= 14 ~ "Moderate Anxiety",
           gad7_score >= 15 ~ "Severe Anxiety"),
         # GAD-7 >= 10 (moderate or severe) is coded "Yes"
         gad7_anxiety = case_when(
           gad7_severity %in% c("Minimal Anxiety", "Mild Anxiety") ~ "No",
           gad7_severity %in% c("Moderate Anxiety", "Severe Anxiety") ~ "Yes"))
# ACHA depression diagnosis cleaning
data <- data %>%
mutate(acha_services_diagnosed1 = case_when(
acha_depression == "No" ~ NA,
TRUE ~ acha_services_dianosed),
acha_services_therapy1 = case_when(
acha_depression == "No" ~ NA,
TRUE ~ acha_services_therapy),
acha_services_medication1 = case_when(
acha_depression == "No" ~ NA,
TRUE ~ acha_services_medication))
attach(data)
```
Introduction
===
Column { data-width=500}
-----------------------------------------------------------------------
### <font size=4><span Style = "color:#2C7BB6"> Background</span></font>
The purpose of this project is to explore relationships among mental and physical health issues, and predictors of anxiety and depression among college students. While mental health is sometimes hard to measure, there are standardized methods developed to help psychologists evaluate the presence and severity of certain mental health indicators. In this project, we will focus mainly on depression and anxiety disorders in relation to overall mental health.
Logistic regression models will be used to analyze the probability of a depression or anxiety diagnosis. A variable from the NCHA survey asks participants whether they have been diagnosed with depression (acha_depression). The survey has no standard variable for an anxiety diagnosis, so GAD-7 scores of >= 10 are coded as having anxiety, and lower scores are coded as not. A score of 10 is the standard GAD-7 cutoff for clinically significant anxiety, at which further clinical evaluation is recommended.
Our physical and mental health are not distinct from each other. It is important to view health from an overall perspective, because all areas of our lives cross over into each other. College students are a vulnerable population for mental health issues. When we understand which factors are related to their prevalence, we can work to create better prevention strategies and know who to focus our attention on. Depression and anxiety are not purely mental, but present themselves through physical problems like fatigue, high heart rate, inflammation, and other biological symptoms.
### <font size=4><span Style = "color:#2C7BB6"> Research Questions</span></font>
- Which health issues are related to depression and anxiety diagnoses?
- Are mental and physical health problems consistent across sex, race, and age?
Column {.tabset data-width=500}
-----------------------------------------------------------------------
<font size=4><span Style = "color:#2C7BB6"> Data Description</span></font>
### Source
The data set comes from a study of social media's effects on college students' mental health (Braghieri et al., 2021). It includes variables from the PHQ-9 Depression Screening Survey and the GAD-7 Anxiety Screening Survey, both of which are reliable and valid screening instruments. Additionally, there are survey questions derived from the American College Health Association (ACHA)'s National College Health Assessment (NCHA). This data set was originally used to evaluate the validity of the NCHA by comparing its outcomes with the PHQ-9 and the GAD-7, both of which were highly correlated with the NCHA's poor mental health predictors. There are 509 observations.
```{r}
glimpse(data)
```
### PHQ-9
The [PHQ-9](https://doi.org/10.1046/j.1525-1497.2001.016009606.x) is a standardized survey used to screen for and diagnose depression. Participants are asked how often they have been bothered by nine specific problems over the past two weeks and respond with one of four answers:
1. Not at all (+0)
2. Several days (+1)
3. More than half the days (+2)
4. Nearly every day (+3)
The nine prompts consist of the following:
1. Little interest or pleasure in doing things
2. Feeling down, depressed or hopeless
3. Trouble falling asleep, staying asleep, or sleeping too much
4. Feeling tired or having little energy
5. Poor appetite or overeating
6. Feeling bad about yourself - or that you’re a failure or have let yourself or your family down
7. Trouble concentrating on things, such as reading the newspaper or watching television
8. Moving or speaking so slowly that other people could have noticed. Or, the opposite - being so fidgety or restless that you have been moving around a lot more than usual
9. Thoughts that you would be better off dead or of hurting yourself in some way
Surveys are then scored and indicate levels of depression based on this scale:
- **0-4** None-minimal
- **5-9** Mild
- **10-14** Moderate
- **15-19** Moderately Severe
- **20-27** Severe
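Scoring works by mapping each answer to 0-3, summing the nine items to get a total of 0-27, and binning the total with the cutoffs above. Below is a minimal base-R sketch for one hypothetical respondent. The response labels follow the wording used in the dataset ("More than half of the days").

```r
# Hypothetical answers from one respondent to the nine PHQ-9 items, in order
answers <- c("Several days", "Not at all", "Nearly every day",
             "More than half of the days", "Several days", "Not at all",
             "Several days", "Not at all", "Not at all")
choices <- c("Not at all", "Several days", "More than half of the days", "Nearly every day")
item_scores <- match(answers, choices) - 1      # position 1-4 becomes score 0-3
phq9_total  <- sum(item_scores)                 # 0-27
phq9_band   <- cut(phq9_total, breaks = c(-Inf, 4, 9, 14, 19, Inf),
                   labels = c("None-minimal", "Mild", "Moderate",
                              "Moderately Severe", "Severe"))
phq9_total
as.character(phq9_band)
```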
### GAD-7
Similar to the PHQ-9, the [GAD-7](https://doi.org/10.1001/archinte.166.10.1092) is another standardized survey, but it is used to screen for generalized anxiety disorder. Participants are asked how often they have been bothered by seven specific problems over the past two weeks and respond with one of four answers:
1. Not at all (+0)
2. Several days (+1)
3. More than half the days (+2)
4. Nearly every day (+3)
The seven prompts consist of the following:
1. Feeling nervous, anxious or on edge
2. Not being able to stop or control worrying
3. Worrying too much about different things
4. Trouble relaxing
5. Being so restless that it is hard to sit still
6. Becoming easily annoyed or irritable
7. Feeling afraid as if something awful might happen
Surveys are then scored and indicate levels of anxiety based on this scale:
- **0-4** Minimal anxiety
- **5-9** Mild anxiety
- **10-14** Moderate anxiety
- **15+** Severe anxiety
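A GAD-7 total (0-21) maps to the bands above, and the regression outcome is the binary split at 10. Below is a minimal base-R sketch on hypothetical totals. We assume `gad7_anxiety01` is the 0/1 version of the "Yes"/"No" `gad7_anxiety` variable, since that recoding step is not shown here.

```r
# Hypothetical GAD-7 totals (sum of seven 0-3 item scores, so 0-21)
gad7_scores <- c(2, 7, 10, 14, 18)
gad7_band <- cut(gad7_scores, breaks = c(-Inf, 4, 9, 14, Inf),
                 labels = c("Minimal Anxiety", "Mild Anxiety",
                            "Moderate Anxiety", "Severe Anxiety"))
# Binary outcome used in the models: 1 when the score is at least 10
gad7_anxiety01 <- as.integer(gad7_scores >= 10)
data.frame(gad7_scores, gad7_band, gad7_anxiety01)
```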
### NCHA
The [National College Health Assessment](https://www.acha.org/wp-content/uploads/2024/07/ACHA-NCHA_Reference_Group_ExecutiveSummary_Spring2008.pdf) is a semi-annual survey administered to college students by the ACHA. The current data utilizes the survey questions, but not the data collected from the ACHA.
The data includes the following prompts from the survey:
**Depression Symptoms:**
- Feeling things were hopeless
- Feeling overwhelmed by all they had to do
- Feeling very sad
- Feeling so depressed it was difficult to function
- Seriously considering attempting suicide
- Attempting suicide
*Possible Responses:*
- Never
- 1-2 times
- 3-4 times
- 5-6 times
- 7-8 times
- 9-10 times
- 11 or more times
**General Health Indicators:**
Reported any of the following in the past 12 months:
- Allergy problems
- Anorexia
- Anxiety disorder
- Asthma
- Bulimia
- Chronic fatigue syndrome
- Depression
- Diabetes
- Endometriosis
- Genital herpes
- Genital warts/HPV
- Hepatitis B or C
- High blood pressure
- High cholesterol
- HIV infection
- Repetitive stress injury
- Seasonal affective disorder
- Substance abuse problem
- Back pain
- Broken bone/fracture
- Bronchitis
- Chlamydia
- Ear infection
- Gonorrhea
- Mononucleosis
- Pelvic inflammatory disease
- Sinus infection
- Strep throat
- Tuberculosis
*Possible Responses:*
- Yes/No
- NA
Diagnosed with depression? (Yes/No)
If yes,
- Diagnosed with depression in the last school year
- Currently in therapy for depression
- Currently taking medication for depression
*Possible Responses:*
- Yes/No
- NA
### Data Cleaning
Many variables that were not useful for the research questions were removed from the data set. Most of these were timestamps from clicks in the survey and other online browser information.
Other variables were created and transformed for ease of use. For example, race was condensed into a single variable covering all races, instead of six separate variables with two levels each.
Additionally, variables were created to represent the overall depression and anxiety scores from the PHQ-9 and GAD-7 screening results.
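The setup chunk condenses race with a `case_when()` over six 0/1 indicator columns. The base-R sketch below illustrates the same rule on a small hypothetical data frame that uses only three indicators. A respondent who selected more than one race becomes "other", a single selection keeps its category, no selection becomes `NA`, and "white" is the reference level.

```r
# Hypothetical 0/1 indicators for five respondents (the real data has six race columns)
flags <- data.frame(
  white = c(1, 0, 1, 0, 0),
  asian = c(0, 1, 1, 0, 0),
  black = c(0, 0, 0, 1, 0)
)
n_selected <- rowSums(flags)
first_pick <- names(flags)[max.col(flags, ties.method = "first")]
race <- ifelse(n_selected > 1, "other",
               ifelse(n_selected == 0, NA, first_pick))
race <- relevel(factor(race), ref = "white")   # white becomes the baseline level
race
```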
EDA
=================
Column { .tabset data-width=400}
---------------------------------------
<font size=4><span Style = "color:#2C7BB6">Discussion</span></font>
### PHQ-9 Results
**Figure 1**
Figure 1 shows the distribution of depression severity based on the PHQ-9. Most students fall under the Mild category, with fewer students having more severe scores.
### ACHA Depression Diagnoses
**Figure 2**
Figure 2 reports the number of students professionally diagnosed with depression. These values will be used for the logistic regression analysis. Most students are not diagnosed with depression.
### GAD-7 Results
**Figure 3**
Figure 3 shows the distribution of anxiety severity based on the GAD-7. The results are similar to the distribution of depression, with most students exhibiting symptoms equivalent to mild anxiety and fewer facing severe symptoms.
### GAD-7 Binary
**Figure 4**
To perform a binary logistic regression, we divided the anxiety severity measures into "Yes" and "No", similar to the depression diagnoses. The standard cutoff for clinical attention is a score of >= 10 (Moderate and Severe Anxiety). A majority of students do not meet this threshold.
### Sex
**Figure 5**
Figure 5 shows how many male and female students are included in the dataset. There are approximately 350 females and 150 males.
### Race
**Figure 6**
Most students identify as white, with almost 300 students reporting so. All students who identified as more than one race are classified as "other".
### Age
**Figure 7**
Most students were born in 2000, and a long tail represents a number of older students in the sample. **Note:** This data was collected in January 2022, so most students born in 2000 were around 21 years old.
### Health
**Figure 8**
This graph shows the number of people who responded "yes" to experiencing each of the symptoms/disorders listed in the past 12 months. Anxiety, depression, allergies, and back pain are among the most common.
It is worth noting these measures of depression and anxiety are different from the PHQ-9 and GAD-7 scales, and were an independent, self-reported question in the NCHA questionnaire.
Column {.tabset data-width=600}
-----------------------------------
<font size=4><span Style = "color:#2C7BB6">Corresponding Graphs</span></font>
### Fig. 1
```{r }
data$phq9_severity <- factor(data$phq9_severity,
levels = c("None-minimal", "Mild", "Moderate", "Moderately Severe", "Severe"))
ggplot(data,aes(phq9_severity)) +
geom_bar(fill = "#C7A9A1") +
labs(title = "PHQ-9 Results", x = "Severity")
```
### Fig. 2
```{r fig2-diag}
ggplot(data,aes(acha_depression)) +
geom_bar(fill = "#C7A9A1") +
labs(title = "Depression Diagnosis", x = "Diagnosed with Depression?")
```
### Fig. 3
```{r fig3-gad7-bar}
data$gad7_severity <- factor(data$gad7_severity,
levels = c("Minimal Anxiety", "Mild Anxiety", "Moderate Anxiety", "Severe Anxiety"))
ggplot(data,aes(gad7_severity)) +
geom_bar(fill = "#C7A9A1") +
labs(title = "GAD-7 Results", x = "Severity")
```
### Fig. 4
```{r}
ggplot(data,aes(gad7_anxiety)) +
geom_bar(fill = "#C7A9A1") +
labs(title = "GAD-7 Scores >= 10", x = "Score Classification")
```
### Fig. 5
```{r sex}
ggplot(data, aes(sex)) +
geom_bar(fill="#C7A9A1") +
labs(title = "Sex Distribution", x = "Sex")
```
### Fig. 6
```{r race}
ggplot(data, aes(race)) +
geom_bar(fill="#C7A9A1") +
labs(title = "Race Distribution", x = "Race")
```
### Fig. 7
```{r age}
ggplot(data, aes(year_1)) +
geom_bar(fill="#C7A9A1") +
labs(title = "Age Distribution", x = "Birth Year")
```
### Fig. 8
```{r}
acha <- data %>%
pivot_longer(
cols = c(acha_12months_any_allergy:acha_12months_any_tuberculosis),
names_to = "variable",
values_to = "response") %>%
filter(response == "Yes")
ggplot(acha, aes(x = variable, fill = response)) +
geom_bar(position = "dodge") +
scale_fill_manual(values = c("Yes" = "#C7A9A1")) +
scale_x_discrete(labels = c(
acha_12months_any_allergy = "Allergy",
acha_12months_any_anorexia = "Anorexia",
acha_12months_any_anxiety = "Anxiety",
acha_12months_any_asthma = "Asthma",
acha_12months_any_back = "Back Pain",
acha_12months_any_blood = "High BP",
acha_12months_any_bronchitis = "Bronchitis",
acha_12months_any_bulimia = "Bulimia",
acha_12months_any_cholesterol = "High Cholesterol",
acha_12months_any_depression = "Depression",
acha_12months_any_diabetes = "Diabetes",
acha_12months_any_ear = "Ear Infection",
acha_12months_any_endometriosi = "Endometriosis",
acha_12months_any_fatigure = "Chronic Fatigue",
acha_12months_any_fracture = "Broken Bone",
acha_12months_any_gonorrhea = "Gonorrhea",
acha_12months_any_hepatitis = "Hepatitis",
acha_12months_any_herpes = "Genital Herpes",
acha_12months_any_HIV = "HIV",
acha_12months_any_hpv = "HPV",
acha_12months_any_mononucleosis = "Mononucleosis",
acha_12months_any_pelvic = "Pelvic Inflammatory Disease",
acha_12months_any_seasonal = "Seasonal Affective Disorder",
acha_12months_any_sinus = "Sinus Infection",
acha_12months_any_strep = "Strep Throat",
acha_12months_any_substance = "Substance Abuse Disorder",
acha_12months_any_tuberculosis = "Tuberculosis",
acha_12months_any_chlamydia = "Chlamydia",
acha_12months_any_stressInjury = "Repetitive Stress Injury")) +
labs(title = "NCHA Questionnaire Results",x = "Report Experiencing in the Past 12 Months", y = "Count") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none")
```
Depression Models
===
Column {.tabset data-width=500}
---
### Depression Overview
There are many health-related variables within the data set. To identify the best model, we created three separate preliminary models to better understand each predictor's significance. The first model uses variables related to *physical health* issues:
- Allergy problems
- Asthma
- Chronic fatigue syndrome
- Diabetes
- Endometriosis
- Hepatitis B or C
- High blood pressure
- High cholesterol
- Repetitive stress injury
- Back pain
- Broken bone/fracture
- Bronchitis
- Ear infection
- Pelvic inflammatory disease
- Sinus infection
- Strep throat
The second model uses variables related to *behavioral health*: mental health conditions (e.g., anxiety, bulimia, seasonal affective disorder) and sexually transmitted diseases (e.g., genital herpes, chlamydia, gonorrhea). We grouped the conditions this way so that the preliminary models had similar numbers of predictors and would give a general picture of each predictor's significance.
- Anorexia
- Anxiety disorder (yes/no in the past 12 months)
- Bulimia
- Depression (yes/no in the past 12 months)
- Genital herpes
- Genital warts/HPV
- Seasonal affective disorder
- Substance abuse problem
- Chlamydia
- Gonorrhea
- Mononucleosis
- GAD-7 Score (continuous measure of anxiety severity)
All models also include sex, race, age, and general health; the demographic variables are kept regardless of their significance because we are interested in their effects.
**Note:** Tuberculosis was not included in the models because there were too few cases for the models to be estimated properly.
The third model combines all predictors from the first two models to explore additional possible significant variables.
Finally, our refined model is composed of every predictor that was significant at the 0.1 level in the first three models. More information on the refined model can be found in the tabs "Refined Model Interpretation," "Goodness of Fit & Adequacy," "Predictive Performance," and "ROC."
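A minimal sketch of a 0.1-level screening rule on simulated data (the variables `x1`, `x2`, and `noise` are hypothetical stand-ins, not survey items): the Wald p-values are read from the `Pr(>|z|)` column of `summary()` for a binomial `glm`, and terms below 0.1 are kept. A factor such as `race` produces one p-value per non-reference level, so a rule like this has to decide how to treat a factor with mixed results.

```r
set.seed(1)
n <- 400
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
# Simulated truth: x1 and x2 affect the outcome, noise does not
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.2 * sim$x1 - 0.8 * sim$x2))

fit <- glm(y ~ x1 + x2 + noise, data = sim, family = binomial)

# Wald p-values from the coefficient table, excluding the intercept
p_values <- coef(summary(fit))[, "Pr(>|z|)"]
p_values <- p_values[names(p_values) != "(Intercept)"]

# Terms significant at the 0.1 level would be carried into a refined model
keep <- names(p_values)[p_values < 0.1]
keep
```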
### Refined Model Interpretation
We fit a binary logistic regression model predicting the presence of a depression diagnosis (acha_depression_01) from the following variables: sex, race, age (year_1), general health, GAD-7 score, and any experience in the last 12 months of asthma, diabetes, repetitive stress injury, anxiety, depression, ear infection, or broken bone/fracture.
The data were split, stratified on the outcome, into an 80% training set and a 20% test set. Because the outcome classes were imbalanced, the training set was up-sampled so that both classes had the same number of observations. The original training-set counts were 272 "No" and 136 "Yes". The random seed was set to 2626 so the split is reproducible. Details on data splitting and up-sampling can be found in the source code.
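caret's `upSample()` balances the classes by randomly re-sampling the minority class with replacement. A minimal base-R sketch of that idea follows; `upsample_sketch()` is a hypothetical helper, not part of caret, and caret's exact sampling scheme may differ in detail. It is applied to a toy data frame with the training counts stated in this tab (272 "No", 136 "Yes").

```r
# Hypothetical helper: random over-sampling so every class matches the largest one
upsample_sketch <- function(df, outcome) {
  counts <- table(df[[outcome]])
  target <- max(counts)
  pieces <- lapply(names(counts), function(level) {
    rows <- which(df[[outcome]] == level)
    # Keep all original rows, then add rows drawn with replacement up to `target`
    extra <- rows[sample.int(length(rows), target - length(rows), replace = TRUE)]
    df[c(rows, extra), , drop = FALSE]
  })
  do.call(rbind, pieces)
}

# Toy training set with the reported class counts (predictor values are random)
set.seed(2626)
toy <- data.frame(x = rnorm(408),
                  acha_depression_01 = factor(rep(c("0", "1"), times = c(272, 136))))
balanced <- upsample_sketch(toy, "acha_depression_01")
table(toy$acha_depression_01)
table(balanced$acha_depression_01)
```

Up-sampling is applied only to the training set; the test set keeps its original class balance, so the test-set metrics reflect the real class proportions.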
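The following predictors were significant in the refined model. Each percentage is the change in the odds, (OR - 1) × 100, relative to the reference group (White for race) or per one-unit increase for numeric predictors:

- **race (Asian):** OR = 0.21 (0.21 - 1 = -0.79 ~ -79%)
- **race (Black):** OR = 0.16 (0.16 - 1 = -0.84 ~ -84%)
- **race (Hispanic):** OR = 0.39 (0.39 - 1 = -0.61 ~ -61%)
- **race (other):** OR = 0.32 (0.32 - 1 = -0.68 ~ -68%)
- **age (year_1, birth year):** OR = 0.90 (0.90 - 1 = -0.10 ~ -10% per one-year-later birth year, i.e., per year younger)
- **GAD-7 score:** OR = 0.92 (0.92 - 1 = -0.08 ~ -8% per one-point increase)
- **anxiety (12 months):** OR = 5.01 (5.01 - 1 = 4.01 ~ +401%)
- **depression (12 months):** OR = 30.64 (30.64 - 1 = 29.64 ~ +2964%)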
Using a 0.5 probability cutoff on the test data, the model achieved accuracy of 89.1%, with 88.2% sensitivity and 89.6% specificity. The ROC curve yielded an AUC of 0.931, showing excellent discrimination between students diagnosed and not diagnosed with depression.
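A small sketch of the arithmetic behind these percentages, using the odds ratios reported in this tab (the vector names are illustrative labels only). (OR - 1) × 100 is the percent change in the odds, not in the probability. Because `year_1` is birth year, a one-unit increase means being one year younger; taking 1 / OR restates that effect per year older.

```r
# Odds ratios reported for the refined depression model (names are labels only)
or <- c(asian = 0.21, year_1 = 0.90, gad7_score = 0.92,
        anxiety_12m = 5.01, depression_12m = 30.64)

# Percent change in the odds relative to the reference group or per one-unit increase
pct_change <- round((or - 1) * 100)
pct_change

# year_1 is birth year (+1 = one year younger); invert to express per year older
or_per_year_older <- round(1 / or[["year_1"]], 2)
or_per_year_older
```

For example, 1 / 0.90 ≈ 1.11, so each additional year of age is associated with roughly 11% higher odds of a depression diagnosis, holding the other predictors constant.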
Column {.tabset data-width=500}
---
### Physical Health
```{r, include=FALSE}
# Recode the ACHA depression diagnosis as a 0/1 factor for glm()
data <- data %>%
  mutate(acha_depression_01 =
           case_when(acha_depression == "Yes" ~ 1,
                     acha_depression == "No" ~ 0))
data$acha_depression_01 <- as.factor(data$acha_depression_01)
# Stratified 80/20 train/test split; the seed makes the split reproducible
set.seed(2626)
train_index <- createDataPartition(data$acha_depression_01, p = 0.8, list = FALSE)
train <- data[train_index, ]
test <- data[-train_index, ]
table(train$acha_depression_01)
table(test$acha_depression_01)
# Up-sample the minority class in the training set only; yname keeps the
# outcome's original column name
depressiontrain_up <- upSample(x = train[, -which(names(train) == "acha_depression_01")],
                               y = train$acha_depression_01,
                               yname = "acha_depression_01")
table(depressiontrain_up$acha_depression_01)
```
Below is the first model including physical health measures and demographics.
```{r }
depression_model1 <- glm(acha_depression_01 ~ sex + race + year_1 + general_health + acha_12months_any_allergy + acha_12months_any_asthma + acha_12months_any_fatigure +acha_12months_any_diabetes+ acha_12months_any_endometriosi +acha_12months_any_hepatitis+ acha_12months_any_blood+ acha_12months_any_cholesterol+acha_12months_any_stressInjury + acha_12months_any_back + acha_12months_any_fracture +acha_12months_any_bronchitis + acha_12months_any_ear +acha_12months_any_pelvic +acha_12months_any_sinus +acha_12months_any_strep, data = depressiontrain_up, family = binomial)
summary(depression_model1)
```
Race, age, general health, asthma(12 months), diabetes(12 months), stress injury(12 months), and fracture(12 months) were significant and will be included in the refined model.
### Behavioral Health
Below is the second model including measures of behavioral health.
```{r}
depression_model2 <- glm(acha_depression_01 ~ sex + race + year_1 + general_health + gad7_score + acha_12months_any_anorexia + acha_12months_any_anxiety + acha_12months_any_bulimia +acha_12months_any_depression + acha_12months_any_herpes +acha_12months_any_hpv +acha_12months_any_seasonal +acha_12months_any_substance +acha_12months_any_chlamydia+ acha_12months_any_gonorrhea +acha_12months_any_mononucleosis, data = depressiontrain_up, family = binomial)
summary(depression_model2)
```
Race, age, GAD-7 score, anxiety (12 months), and depression (12 months) were significant and will be included in the refined model.
### Combined
Below is the third model that includes all predictors from the first and second models.
```{r}
depression_model3 <- glm(acha_depression_01 ~ sex + race + year_1 + general_health + gad7_score + acha_12months_any_allergy + acha_12months_any_asthma + acha_12months_any_fatigure + acha_12months_any_diabetes + acha_12months_any_endometriosi + acha_12months_any_hepatitis + acha_12months_any_blood + acha_12months_any_cholesterol + acha_12months_any_stressInjury + acha_12months_any_back + acha_12months_any_fracture + acha_12months_any_bronchitis + acha_12months_any_ear + acha_12months_any_pelvic + acha_12months_any_sinus + acha_12months_any_strep + acha_12months_any_anorexia + acha_12months_any_anxiety + acha_12months_any_bulimia + acha_12months_any_depression + acha_12months_any_herpes + acha_12months_any_hpv + acha_12months_any_seasonal + acha_12months_any_substance + acha_12months_any_chlamydia + acha_12months_any_gonorrhea + acha_12months_any_mononucleosis, data = depressiontrain_up, family = binomial)
summary(depression_model3)
```
Race, age, GAD-7 score, ear infection(12 months), anxiety(12 months), and depression(12 months) were significant and will be included in the refined model.
### Refined
```{r}
depression_refined <- glm(acha_depression_01 ~ sex + race + year_1 + general_health + gad7_score+ acha_12months_any_asthma +acha_12months_any_diabetes + acha_12months_any_stressInjury +acha_12months_any_anxiety +acha_12months_any_depression +acha_12months_any_ear + acha_12months_any_fracture, data = depressiontrain_up, family = binomial)
summary(depression_refined)
```
Below are ORs and confidence intervals for all predictors. Specific interpretations can be found in the "Refined Model Interpretation" tab.
```{r}
exp(cbind(OR = coef(depression_refined),
confint(depression_refined)))
```
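`confint()` on a fitted `glm` computes profile-likelihood intervals, which can differ from the familiar Wald interval (estimate ± 1.96 standard errors). A minimal sketch on simulated data (the variable `x` is hypothetical) that computes both on the odds-ratio scale:

```r
set.seed(3)
n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, 1, plogis(-0.2 + 0.7 * sim$x))
fit <- glm(y ~ x, data = sim, family = binomial)

est <- coef(summary(fit))
z <- qnorm(0.975)

# Wald interval on the log-odds scale, exponentiated to the odds-ratio scale
wald_or <- exp(cbind(OR = est[, "Estimate"],
                     lower = est[, "Estimate"] - z * est[, "Std. Error"],
                     upper = est[, "Estimate"] + z * est[, "Std. Error"]))

# Profile-likelihood interval, the same construction as the chunk above
profile_or <- exp(cbind(OR = coef(fit), confint(fit)))

round(wald_or, 3)
round(profile_or, 3)
```

The two usually agree closely; they can diverge for large effects or sparse categories.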
### Goodness of Fit & Adequacy
**Likelihood Ratio Test**
```{r}
null_model <- glm(acha_depression_01 ~ 1, data = depressiontrain_up, family = binomial)
anova(null_model, depression_refined, test = "Chisq")
```
Our refined model predicts the probability of a depression diagnosis significantly better than the intercept-only (null) model. With a p-value < 2.2 x 10^-16, we conclude that the included predictors provide substantial explanatory power for predicting depression diagnoses.
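The likelihood ratio statistic is the drop in deviance from the null model to the fuller model, compared with a chi-square distribution whose degrees of freedom equal the number of added parameters. A minimal sketch on simulated data (hypothetical variables `x` and `z`) that computes it by hand and checks it against `anova(..., test = "Chisq")`:

```r
set.seed(42)
n <- 300
sim <- data.frame(x = rnorm(n), z = rbinom(n, 1, 0.4))
sim$y <- rbinom(n, 1, plogis(-0.3 + 1.0 * sim$x + 0.8 * sim$z))

null_fit <- glm(y ~ 1, data = sim, family = binomial)
full_fit <- glm(y ~ x + z, data = sim, family = binomial)

# LR statistic: reduction in deviance when the predictors are added
lr_stat <- deviance(null_fit) - deviance(full_fit)
df_diff <- df.residual(null_fit) - df.residual(full_fit)
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# The same test via anova(), as used in the dashboard
lrt <- anova(null_fit, full_fit, test = "Chisq")
c(lr_stat = lr_stat, df = df_diff, p_value = p_value)
```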
**Pseudo-R^2**
```{r}
pR2(depression_refined)
```
The McFadden value of 0.51 exceeds 0.4, indicating an excellent fit relative to the null model. The Cox & Snell value is 0.51, and the Nagelkerke value is 0.68; because Nagelkerke rescales Cox & Snell by its maximum attainable value, the model reaches 68% of the maximum possible Cox & Snell value. All three values indicate good model fit.
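The three pseudo-R² values can be computed directly from the log-likelihoods of the fitted and null models. A minimal sketch on simulated data (hypothetical variable `x`); to our understanding these correspond to what `pscl::pR2()` reports as `McFadden`, `r2ML` (Cox & Snell), and `r2CU` (Nagelkerke), with n equal to the number of observations:

```r
set.seed(7)
n <- 250
sim <- data.frame(x = rnorm(n))
sim$y <- rbinom(n, 1, plogis(0.2 + 1.5 * sim$x))

fit <- glm(y ~ x, data = sim, family = binomial)
null_fit <- glm(y ~ 1, data = sim, family = binomial)

ll_full <- as.numeric(logLik(fit))
ll_null <- as.numeric(logLik(null_fit))

# McFadden: proportional improvement in log-likelihood over the null model
mcfadden <- 1 - ll_full / ll_null
# Cox & Snell: based on the likelihood ratio, scaled by sample size
cox_snell <- 1 - exp(2 * (ll_null - ll_full) / n)
# Nagelkerke: Cox & Snell divided by its maximum attainable value
max_cox_snell <- 1 - exp(2 * ll_null / n)
nagelkerke <- cox_snell / max_cox_snell

round(c(McFadden = mcfadden, CoxSnell = cox_snell, Nagelkerke = nagelkerke), 3)
```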
### Predictive Performance
```{r}
depression_test_prob <- predict(depression_refined, newdata = test, type ="response")
depression_test_pred <- ifelse(depression_test_prob >= 0.5, "1", "0") %>%
factor(levels = levels(test$acha_depression_01))
cm_dep <- confusionMatrix(depression_test_pred, test$acha_depression_01, positive = "1")
cm_dep
```
Using a 0.5 probability cutoff on the test data, the model achieved accuracy of 89.1%, with 88.2% sensitivity and 89.6% specificity, so it classifies depression diagnoses well on held-out students.
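`confusionMatrix()` derives these rates from the 2 × 2 table of predicted versus actual classes, with `positive = "1"` marking a diagnosis as the positive class. A minimal base-R sketch with a handful of hypothetical labels and probabilities (not study data):

```r
# Hypothetical labels and predicted probabilities (not study data)
truth <- factor(c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0), levels = c(0, 1))
prob  <- c(0.91, 0.72, 0.55, 0.30, 0.10, 0.45, 0.62, 0.05, 0.20, 0.38)

# Classify with the same 0.5 cutoff used for the test set
pred <- factor(ifelse(prob >= 0.5, 1, 0), levels = c(0, 1))

tab <- table(Predicted = pred, Actual = truth)
tp <- tab["1", "1"]; fn <- tab["0", "1"]
tn <- tab["0", "0"]; fp <- tab["1", "0"]

accuracy    <- (tp + tn) / sum(tab)
sensitivity <- tp / (tp + fn)  # true positive rate
specificity <- tn / (tn + fp)  # true negative rate
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)
```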
### ROC
```{r}
droc_obj <- roc(response = test$acha_depression_01, predictor = depression_test_prob, levels = c("0","1"), direction = ("<"))
plot(droc_obj, print.auc = TRUE, legacy.axes = TRUE, main = "ROC Curve for Refined Depression Model")
```
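The AUC printed on the ROC plot equals the probability that a randomly chosen student with the outcome receives a higher predicted probability than a randomly chosen student without it, with ties counted as one half. A minimal base-R sketch of that pairwise definition with hypothetical values; `auc_pairs()` is an illustrative helper, not a pROC function. In the `roc()` call above, `direction = "<"` tells pROC that controls (level "0") are expected to have lower predictor values than cases (level "1").

```r
# Hypothetical labels and scores (not study data)
truth <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
score <- c(0.91, 0.72, 0.55, 0.30, 0.10, 0.45, 0.62, 0.05, 0.20, 0.38)

# Share of (positive, negative) pairs where the positive case scores higher
auc_pairs <- function(truth, score) {
  pos <- score[truth == 1]
  neg <- score[truth == 0]
  comparisons <- outer(pos, neg, function(p, n) (p > n) + 0.5 * (p == n))
  mean(comparisons)
}

auc_pairs(truth, score)
```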
Anxiety Models
===
Column {.tabset data-width=500}
---
### Anxiety Overview
As with the depression models, we fit three preliminary models to assess the predictive value of nearly the same set of variables used in the previous section.
Again, the first model uses variables related to *physical health* issues:
- Allergy problems
- Asthma
- Chronic fatigue syndrome
- Diabetes
- Endometriosis
- Hepatitis B or C
- High blood pressure
- High cholesterol
- Repetitive stress injury
- Back pain
- Broken bone/fracture
- Bronchitis
- Ear infection
- Pelvic inflammatory disease
- Sinus infection
- Strep throat
The second model uses variables related to *behavioral health*, grouped in the same way:
- Anorexia
- Anxiety disorder (yes/no in the past 12 months)
- Bulimia
- Depression (yes/no in the past 12 months)
- Diagnosed depression (acha_depression)
- Genital warts/HPV
- HIV infection
- Seasonal affective disorder
- Substance abuse problem
- Chlamydia
- Gonorrhea
- Mononucleosis
- PHQ-9 Score (continuous measure of depression severity)

As before, all models also include sex, race, age, and general health, keeping the demographic variables regardless of their significance. The main difference from the depression models is that each set uses the other condition's clinical scale as a predictor: the anxiety models include the PHQ-9 score, while the depression models include the GAD-7 score.
**Note:** Genital herpes was not included in the models because there were too few cases for the models to be estimated properly.
The third model combines all predictors from the first two models to explore additional possible significant variables.
The anxiety refined model is composed of every predictor that was significant at the 0.1 level in the first three models. More information on the refined model can be found in the tabs "Refined Model Interpretation," "Goodness of Fit & Adequacy," "Predictive Performance," and "ROC."
### Refined Model Interpretation
We fit a binary logistic regression model predicting the presence of a GAD-7 score greater than or equal to 10 (the standard cutoff for clinical intervention) from the following variables: sex, race, age (year_1), general health, PHQ-9 score, diagnosed depression (acha_depression), and any experience in the last 12 months of endometriosis, chronic fatigue syndrome, anxiety, chlamydia, high cholesterol, diabetes, or ear infection.
The data were split, stratified on the outcome, into an 80% training set and a 20% test set. Because the outcome classes were imbalanced, the training set was up-sampled so that both classes had the same number of observations. The original training-set counts for GAD-7 scores >= 10 were 255 "No" and 153 "Yes". The random seed was set to 2626 so the split is reproducible. Details on data splitting and up-sampling can be found in the source code.
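The source code splits the data with `createDataPartition(..., p = 0.8)`, which samples within each outcome class so the training and test sets keep roughly the same share of "Yes" and "No" as the full data. A minimal base-R sketch of stratified sampling; `stratified_split()` is a hypothetical helper, the class counts are made up, and caret's rounding and sampling details may differ.

```r
# Hypothetical helper: sample the same fraction of rows within each outcome class
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

# Toy outcome with 300 "No" and 200 "Yes" (made-up counts, not the survey data)
set.seed(2626)
y <- factor(rep(c("No", "Yes"), times = c(300, 200)))
train_idx <- stratified_split(y, p = 0.8)

table(y[train_idx])   # training set keeps the 60/40 class mix
table(y[-train_idx])  # so does the held-out test set
```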
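The following predictors were significant in the refined model. Each percentage is the change in the odds, (OR - 1) × 100, relative to the reference group (White for race, Excellent for general health) or per one-unit increase for numeric predictors:

- **race (Asian):** OR = 0.13 (0.13 - 1 = -0.87 ~ -87%)
- **race (Black):** OR = 0.23 (0.23 - 1 = -0.77 ~ -77%)
- **race (Hispanic):** OR = 0.13 (0.13 - 1 = -0.87 ~ -87%)
- **PHQ-9 score:** OR = 1.66 (1.66 - 1 = 0.66 ~ +66% per one-point increase)
- **general health (Poor):** OR = 1.65 (1.65 - 1 = 0.65 ~ +65%)
- **endometriosis (12 months):** OR = 0.03 (0.03 - 1 = -0.97 ~ -97%)
- **anxiety (12 months):** OR = 5.33 (5.33 - 1 = 4.33 ~ +433%)
- **high cholesterol (12 months):** OR = 4.36 (4.36 - 1 = 3.36 ~ +336%)
- **ear infection (12 months):** OR = 5.01 (5.01 - 1 = 4.01 ~ +401%)
- **diagnosed depression:** OR = 0.30 (0.30 - 1 = -0.70 ~ -70%)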
Using a 0.5 probability cutoff on the test data, the model achieved accuracy of 80.2%, with 76.3% sensitivity and 82.5% specificity. The ROC curve yielded an AUC of 0.906, showing excellent discrimination between students with high and low levels of anxiety.
Column {.tabset data-width=500}
---
```{r, include = FALSE}
# Dichotomize the GAD-7: scores >= 10 (Moderate or Severe Anxiety) are the
# general cutoff for clinical intervention and are coded "Yes"
data <- data %>%
  mutate(gad7_anxiety =
           case_when(gad7_severity == "Mild Anxiety" | gad7_severity == "Minimal Anxiety" ~ "No",
                     gad7_severity == "Moderate Anxiety" | gad7_severity == "Severe Anxiety" ~ "Yes"))
# 0/1 version of the same outcome for glm()
data <- data %>%
  mutate(gad7_anxiety01 =
           case_when(gad7_anxiety == "Yes" ~ 1,
                     gad7_anxiety == "No" ~ 0))
data$gad7_anxiety <- as.factor(data$gad7_anxiety)
data$gad7_anxiety01 <- as.factor(data$gad7_anxiety01)
# Stratified 80/20 train/test split with the same seed as the depression models
set.seed(2626)
train_index <- createDataPartition(data$gad7_anxiety, p = 0.8, list = FALSE)
train <- data[train_index, ]
test <- data[-train_index, ]
table(train$gad7_anxiety)
table(test$gad7_anxiety)
# Up-sample the minority class in the training data only
anxietytrain_up <- upSample(x = train[, -which(names(train) == "gad7_anxiety01")],
                            y = train$gad7_anxiety01,
                            yname = "gad7_anxiety01")
table(anxietytrain_up$gad7_anxiety01)
table(test$gad7_anxiety01)
```
### Physical Health
Below is the first model including physical health measures and demographics.
```{r anxiety1}
anxiety_model1 <- glm(gad7_anxiety01 ~ sex + race + year_1 + acha_12months_any_allergy + acha_12months_any_asthma + acha_12months_any_back + acha_12months_any_blood +acha_12months_any_bronchitis + acha_12months_any_cholesterol +acha_12months_any_diabetes + acha_12months_any_ear +acha_12months_any_endometriosi +acha_12months_any_fatigure +acha_12months_any_hepatitis +acha_12months_any_pelvic +acha_12months_any_sinus+ acha_12months_any_strep +acha_12months_any_stressInjury + general_health, data = anxietytrain_up, family = binomial)
summary(anxiety_model1)
```
Sex, race, high cholesterol(12 months), diabetes(12 months), chronic fatigue syndrome(12 months), and general health were all significant and will be included in the refined model.
### Behavioral Health
Below is the behavioral health model, also including demographics.
```{r anxiety2}
anxiety_model2 <- glm(gad7_anxiety01 ~ sex + race + year_1 + general_health+ phq9_score + acha_depression + acha_12months_any_anorexia + acha_12months_any_anxiety + acha_12months_any_bulimia +acha_12months_any_depression +acha_12months_any_hpv +acha_12months_any_seasonal +acha_12months_any_substance +acha_12months_any_chlamydia +acha_12months_any_mononucleosis + acha_12months_any_gonorrhea + acha_12months_any_HIV, data = anxietytrain_up, family = binomial)
summary(anxiety_model2)
```
Race, PHQ-9 score, depression(12 months), anxiety(12 months), and chlamydia(12 months) were all significant and will be included in the refined model.
### Combined
Below is the third model that includes all predictors from the first and second models.
```{r anxiety3}
anxiety_model3 <- glm(gad7_anxiety01 ~ sex + race + year_1 + general_health + phq9_score + acha_depression + acha_12months_any_anorexia + acha_12months_any_anxiety + acha_12months_any_bulimia + acha_12months_any_depression + acha_12months_any_hpv + acha_12months_any_seasonal + acha_12months_any_substance + acha_12months_any_chlamydia + acha_12months_any_mononucleosis + acha_12months_any_gonorrhea + acha_12months_any_HIV + acha_12months_any_allergy + acha_12months_any_asthma + acha_12months_any_back + acha_12months_any_blood + acha_12months_any_bronchitis + acha_12months_any_cholesterol + acha_12months_any_diabetes + acha_12months_any_ear + acha_12months_any_endometriosi + acha_12months_any_fatigure + acha_12months_any_hepatitis + acha_12months_any_pelvic + acha_12months_any_sinus + acha_12months_any_strep + acha_12months_any_stressInjury, data = anxietytrain_up, family = binomial)
summary(anxiety_model3)
```
Race, general health, PHQ-9 score, depression (12 months), anxiety (12 months), high cholesterol (12 months), diabetes (12 months), ear infection (12 months), and endometriosis (12 months) were all significant and will be included in the refined model.
### Refined Model
```{r anxietyr}
anxiety_refined <- glm(gad7_anxiety01 ~ sex + race + year_1 + phq9_score + general_health + acha_12months_any_endometriosi + acha_12months_any_fatigure + acha_12months_any_anxiety + acha_12months_any_chlamydia + acha_12months_any_cholesterol + acha_12months_any_diabetes + acha_12months_any_ear + acha_depression , data = anxietytrain_up, family = binomial)
summary(anxiety_refined)
```
Below are ORs and confidence intervals for all predictors. Specific interpretations can be found in the “Refined Model Interpretation” tab.
```{r anxietyor}
exp(cbind(OR = coef(anxiety_refined),
confint(anxiety_refined)))
```
### Goodness of Fit & Adequacy
**Likelihood Ratio Test**
```{r}
null_model <- glm(gad7_anxiety01 ~ 1, data = anxietytrain_up, family = binomial)
anova(null_model, anxiety_refined, test = "Chisq")
```
Our refined model predicts the probability of clinically significant anxiety significantly better than the intercept-only (null) model. With a p-value < 2.2 x 10^-16, we conclude that the included predictors provide substantial explanatory power for predicting clinical levels of anxiety.
**Pseudo-R^2**
```{r}
pR2(anxiety_refined)
```
The McFadden value of 0.59 exceeds 0.4, indicating an excellent fit relative to the null model. The Cox & Snell value is 0.56, and the Nagelkerke value is 0.75; because Nagelkerke rescales Cox & Snell by its maximum attainable value, the model reaches 75% of the maximum possible Cox & Snell value. All three values indicate good model fit.
### Predictive Performance
```{r}
anxiety_test_prob <- predict(anxiety_refined, newdata = test, type ="response")
anxiety_test_pred <- ifelse(anxiety_test_prob >= 0.5, "1", "0") %>% factor(levels = levels(test$gad7_anxiety01))
cm_anx <- confusionMatrix(anxiety_test_pred, test$gad7_anxiety01, positive = "1")
cm_anx
```
Using a 0.5 probability cutoff on the test data, the model achieved accuracy of 80.2%, with 76.3% sensitivity and 82.5% specificity, so it classifies clinically significant anxiety (GAD-7 >= 10) well on held-out students.
### ROC
```{r}
aroc_obj <- roc(response = test$gad7_anxiety01,
predictor = anxiety_test_prob,
levels = c("0","1"),
direction = ("<"))
plot(aroc_obj, print.auc = TRUE, legacy.axes = TRUE, main = "ROC Curve for Refined Anxiety Model")
```
Discussion
===
### Results
*Note:* All results are conditional on the other predictors in each model being held constant; models with a different set of predictors may yield different results.
**Q1: Which health issues are related to depression and anxiety diagnoses?**
Using a binary logistic regression composed of predictors found to be significant in the preliminary models, we did not find any physical health issues that significantly contributed to depression diagnoses in college students. Some previous studies contradict this finding, and the preliminary models suggested some associations between these factors. Future models may want to exclude highly correlated variables, such as alternative anxiety and depression measures, from the equation.
For the anxiety model, students reporting poor general health had significantly higher odds of clinically significant anxiety (GAD-7 >= 10) than those reporting excellent health, holding the other predictors constant. Students who reported high cholesterol, an ear infection, or any anxiety in the past 12 months were also significantly more likely to score at or above the cutoff. Higher PHQ-9 scores were associated with higher odds, whereas an official depression diagnosis was associated with substantially lower odds (OR = 0.30). Because these two variables measure similar constructs, this discrepancy is surprising and worth exploring further. Finally, students who reported endometriosis were also less likely to report high anxiety.
These associations merit further study. Future research should explore what factors contribute to high cholesterol and which environmental factors cause ear infections in this population, and whether there are underlying connections between these health issues and generalized anxiety disorder.
**Q2: Are mental and physical health problems consistent across sex, race, and age?**
According to the model, students who identify only as Asian, Black, or Hispanic, and those classified as other races, are less likely to be diagnosed with depression than students who identify only as White. This finding should be read with caution: a majority of the sample is White, so the other racial groups are comparatively small, which may affect these estimates. Older students are also more likely to be diagnosed with depression. Because age is coded as birth year, the OR of 0.90 means each one-year-later birth year (one year younger) is associated with about 10% lower odds of a diagnosis. There was no significant difference between sexes.
Again, Asian, Black, and Hispanic students were less likely than White students to report high levels of anxiety. Age and sex were not significant in the refined model, although males appeared less likely to report clinically significant anxiety overall, and sex was significant in the first (physical health) model.
**Future Directions**
Because all of the NCHA health items were binary (yes/no), future studies may want to explore more continuous measures of health. These could offer more insight into the relationship between mental and physical health problems than a simple yes or no.
### References
Braghieri, L., Levy, R., & Makarin, A. (2021). Social media and mental health. *SSRN Electronic Journal, 112*(11). https://doi.org/10.2139/ssrn.3919760
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). The PHQ-9: Validity of a brief depression severity measure. *Journal of General Internal Medicine, 16*(9), 606–613. https://doi.org/10.1046/j.1525-1497.2001.016009606.x
Spitzer, R. L., Kroenke, K., Williams, J. B. W., & Löwe, B. (2006). A brief measure for assessing generalized anxiety disorder: the GAD-7. *Archives of Internal Medicine, 166*(10), 1092–1097. https://doi.org/10.1001/archinte.166.10.1092
American College Health Association. (n.d.). *Spring 2008 reference group executive summary.* Retrieved September 20, 2025, from https://www.acha.org/wp-content/uploads/2024/07/ACHA-NCHA_Reference_Group_ExecutiveSummary_Spring2008.pdf
**Note:** ChatGPT was utilized to help create up-sampling code and identify output errors.