PHQ-9 vs GAD-7: Depression & Anxiety Screening Guide 2026

In the landscape of primary care and mental health services, rapid detection of psychological distress is critical. Two instruments have risen to become the “gold standards” for this purpose: the Patient Health Questionnaire-9 (PHQ-9) and the Generalized Anxiety Disorder-7 (GAD-7) scale. These tools are ubiquitous in clinical settings due to their alignment with diagnostic criteria, brevity, and public domain availability.

However, while they are often administered together, they measure distinct constructs with unique psychometric properties and scoring nuances. For clinicians, researchers, and healthcare administrators, understanding the mechanical and theoretical differences between PHQ-9 vs. GAD-7 is essential for accurate diagnosis and treatment planning. This guide provides a comprehensive analysis of their origins, structural differences, scoring guidelines, and clinical utility.


Historical Development and Origins

To understand why these tools are structured the way they are, one must look at their developmental lineage. Both scales trace their roots to the PRIME-MD (Primary Care Evaluation of Mental Disorders), a diagnostic tool developed in the mid-1990s by a research team including Drs. Robert Spitzer, Janet Williams, and Kurt Kroenke.

From PRIME-MD to Self-Report

The original PRIME-MD was a two-stage tool involving a patient questionnaire followed by a clinician-led interview. While effective, it was deemed too time-consuming for the rapid pace of general practice. This necessity drove the evolution of the Patient Health Questionnaire (PHQ), a fully self-administered version. The PHQ originally contained modules for multiple disorders, but the mood module—which became the PHQ-9—emerged as the standout tool for Major Depressive Disorder (MDD).

Subsequently, the GAD-7 was developed to address a gap in efficiently screening for Generalized Anxiety Disorder (GAD). Unlike the PHQ-9, which was extracted from the larger PHQ, the GAD-7 was constructed by evaluating a pool of symptom items to find the most predictive markers for generalized anxiety.

The Commercial Context

It is also worth noting the socio-commercial history of these tools. The development of the PRIME-MD, and of the PHQ instruments derived from it, was funded by Pfizer beginning in the 1990s. This coincided with the marketing push for Zoloft (sertraline), a Selective Serotonin Reuptake Inhibitor (SSRI). The goal was to give primary care physicians, who were often hesitant to diagnose mental illness, a simple checklist for identifying depressed patients, thereby reducing diagnostic uncertainty. Despite these commercial origins, the scientific creators have maintained that the validation research was independent, and the tools have since been validated in thousands of independent studies globally.

Understanding the PHQ-9 Depression Screening Tool

While both tools utilize a Likert-scale format assessing symptoms over the “last 2 weeks,” their structural architecture targets different diagnostic criteria.

The PHQ-9 is unique because its nine items map exactly to the nine diagnostic criteria for Major Depressive Disorder outlined in the DSM-IV (and retained in DSM-5). These items assess three specific domains of depression:

Affective/Cardinal Symptoms

Anhedonia (little interest or pleasure in doing things) and depressed mood (feeling down, depressed, or hopeless). These two symptoms are considered the cardinal signs of depression, and at least one must be present for a diagnosis of major depressive disorder.

Somatic/Physical Symptoms

Sleep disturbances (trouble falling asleep, staying asleep, or sleeping too much), fatigue (feeling tired or having little energy), appetite changes (poor appetite or overeating), and psychomotor agitation/retardation (moving or speaking slowly, or being restless).

Cognitive Symptoms

Feelings of guilt or failure (feeling bad about yourself or that you’re a failure), trouble concentrating (on things such as reading the newspaper or watching television), and suicidal ideation (thoughts that you would be better off dead or of hurting yourself).

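Because of this direct mapping, the PHQ-9 can function as both a severity measure and a diagnostic algorithm. If a patient endorses five or more items at a frequency of at least “more than half the days,” and at least one of them is a cardinal symptom, it suggests a probable diagnosis of MDD. Under the published scoring instructions, the suicidal ideation item counts toward the five if it is present at all.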
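The algorithm described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validated implementation: the function name phq9_probable_mdd is hypothetical, responses are assumed to be coded 0-3 in questionnaire order, and the rule that item 9 counts at any frequency is taken from the published scoring instructions and should be checked against the version in use locally.

```python
from typing import Sequence


def phq9_probable_mdd(items: Sequence[int]) -> bool:
    """Return True if nine PHQ-9 responses meet the probable-MDD algorithm.

    `items` holds the responses in questionnaire order, each coded 0-3.
    """
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine responses, each scored 0-3")

    # Items 1-8 count when endorsed at least "more than half the days" (>= 2).
    endorsed = sum(1 for score in items[:8] if score >= 2)
    # Assumption: item 9 (suicidal ideation) counts if present at all (>= 1).
    if items[8] >= 1:
        endorsed += 1

    # At least one cardinal symptom (item 1 or item 2) must itself be endorsed.
    cardinal = items[0] >= 2 or items[1] >= 2
    return endorsed >= 5 and cardinal


# Five somatic/cognitive items endorsed, but no cardinal symptom.
print(phq9_probable_mdd([0, 0, 3, 3, 3, 3, 3, 0, 0]))  # prints: False
```

A positive result from a sketch like this still only suggests a probable diagnosis; it does not replace the clinical interview.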

Understanding the GAD-7 Anxiety Screening Tool

The GAD-7 focuses on the construct of “worry” and physiological tension. While it was designed specifically for Generalized Anxiety Disorder, it has also performed well as a screen for related conditions such as panic disorder, social phobia, and PTSD.

Factor analysis of the GAD-7 often reveals a unidimensional structure (measuring general anxiety), but some psychometric studies identify two distinct factors:

Cognitive-Emotional Factors

Items related to nervousness, inability to stop worrying, worrying too much about different things, and fear that “something awful might happen”. These cognitive-emotional components represent the core psychological experience of anxiety, particularly the uncontrollable nature of worry.

Somatic Tension Factors

Items related to trouble relaxing, restlessness (being so restless that it is hard to sit still), and irritability (becoming easily annoyed or irritable). These somatic components reflect the physical manifestations of anxiety and physiological tension.

Unlike the PHQ-9, which covers specific vegetative symptoms like appetite and sleep duration, the GAD-7 focuses more heavily on the inability to relax and the uncontrollable nature of worry.

Psychometric Properties and Diagnostic Accuracy

The clinical utility of any screening tool relies on its diagnostic accuracy—specifically its sensitivity (ability to detect true cases) and specificity (ability to rule out non-cases).
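As a quick illustration of these two definitions, the sketch below computes both from the cells of a two-by-two table. The counts are invented for the example and do not come from any study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of people with the disorder whom the screen flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of people without the disorder whom the screen clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical sample: 50 patients with the disorder, 450 without.
print(sensitivity(true_pos=44, false_neg=6))    # 44 of 50 cases detected
print(specificity(true_neg=396, false_pos=54))  # 396 of 450 non-cases cleared
```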

PHQ-9 Diagnostic Accuracy

The PHQ-9 is widely considered to have robust psychometric properties. In the original validation work, the standard cut-off score of ≥10 yielded a sensitivity of 88% and a specificity of 88% for detecting major depression. This balance makes it a useful “yellow flag” for clinicians.

However, context matters. In specific populations, such as hospitalized patients, the optimal cut-off may shift. For example, a cross-sectional study in a Peruvian hospital found that a lower cut-off of ≥7 optimized the balance of sensitivity (76%) and specificity (72%). This suggests that in populations with high physical comorbidities, lower scores may still indicate clinically significant depression due to somatic symptom overlap.

GAD-7 Diagnostic Accuracy

The GAD-7 also demonstrates strong validity. The original validation study suggested a cut-off of ≥10 offered 89% sensitivity and 82% specificity for GAD.

However, subsequent meta-analyses have suggested that a cut-off of ≥8 might be superior for optimizing sensitivity without significantly compromising specificity. For detecting any anxiety disorder (including panic and social anxiety), the GAD-7 has an Area Under the Curve (AUC) of roughly 0.90, indicating high diagnostic accuracy.

PHQ-9 Scoring System and Interpretation

Both tools use a frequency scale ranging from 0 (“Not at all”) to 3 (“Nearly every day”). The total score is a summation of these responses. Understanding the severity thresholds is crucial for clinical decision-making.

The PHQ-9 total score ranges from 0 to 27. The interpretation zones are generally defined as follows:

0–4: Minimal depression – Monitoring; no immediate treatment
5–9: Mild depression – Watchful waiting; clinical judgment required
10–14: Moderate depression – Treatment plan considerations; counseling or pharmacotherapy
15–19: Moderately severe depression – Active treatment recommended
20–27: Severe depression – Immediate initiation of pharmacotherapy/psychotherapy usually indicated

PHQ-9 scores of 5, 10, 15, and 20 represent valid and easy-to-remember thresholds demarcating the lower limits of mild, moderate, moderately severe, and severe depression.
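The summation and the severity bands listed above translate directly into code. The sketch below is illustrative only; the names phq9_total and phq9_severity are hypothetical, and responses are assumed to be coded 0-3.

```python
from typing import Sequence

# Lower bound of each severity band, checked from the highest threshold down.
PHQ9_BANDS = [
    (20, "severe"),
    (15, "moderately severe"),
    (10, "moderate"),
    (5, "mild"),
    (0, "minimal"),
]


def phq9_total(items: Sequence[int]) -> int:
    """Sum nine responses, each coded 0-3."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine responses, each scored 0-3")
    return sum(items)


def phq9_severity(total: int) -> str:
    """Map a PHQ-9 total (0-27) to its severity label."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 totals range from 0 to 27")
    for lower_bound, label in PHQ9_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: the 0 band matches every valid total")


responses = [2, 2, 1, 2, 1, 1, 1, 0, 0]
total = phq9_total(responses)
print(total, phq9_severity(total))  # prints: 10 moderate
```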

Critical Clinical Note on Question 9

Question 9 assesses “Thoughts that you would be better off dead or of hurting yourself in some way”. Regardless of the total score, a positive response to this item requires immediate suicide risk assessment.

In one body of research, patients reporting any level of suicidal ideation (PHQ-9 item 9 >0) were 3 to 7 times more likely to die by suicide in the next 30 days, and 2 to 4 times as likely to die by suicide in the following year. Those reporting nearly daily thoughts of self-harm were 3.3 to 10.8 times more likely to die by suicide within 30 days.
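In software, the practical consequence is that the item 9 check should be evaluated separately from the total, so a low overall score cannot hide it. A minimal sketch, assuming responses coded 0-3 in questionnaire order; the function name phq9_review and the returned keys are hypothetical.

```python
from typing import Dict, Sequence, Union


def phq9_review(items: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Return the PHQ-9 total alongside an independent item 9 flag."""
    if len(items) != 9 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected nine responses, each scored 0-3")
    return {
        "total": sum(items),
        # Any non-zero response to item 9 is flagged, whatever the total is.
        "item9_positive": items[8] > 0,
    }


# A total in the "minimal" range can still carry a positive item 9.
print(phq9_review([1, 0, 1, 0, 0, 0, 0, 0, 1]))
```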

GAD-7 Scoring System and Interpretation

The GAD-7 total score ranges from 0 to 21. The severity strata are:

0–4: Minimal anxiety
5–9: Mild anxiety
10–14: Moderate anxiety – Clinically significant; further evaluation recommended
15–21: Severe anxiety

A score of 10 or greater is the standard “yellow flag” for a probable anxiety disorder. As noted above, a lower cut-point of 8 is also considered reasonable for identifying probable cases of generalized anxiety disorder when sensitivity is the priority. In either case, further diagnostic assessment is warranted to determine the presence and type of anxiety disorder.

Cut points of 5, 10, and 15 represent mild, moderate, and severe levels of anxiety on the GAD-7, similar to levels of depression on the PHQ-9. Research shows there is a strong association between increasing GAD-7 severity scores and worsening functional status, with substantial stepwise decline in functioning as scores move from mild to moderate to severe.
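The GAD-7 bands and the choice between the two screening cut-points can be sketched as follows. The function names are hypothetical, and the default cut-off of 10 is a parameter precisely because the text above describes 8 as a reasonable alternative.

```python
from typing import Sequence

# Lower bound of each band, checked from the highest threshold down.
GAD7_BANDS = [(15, "severe"), (10, "moderate"), (5, "mild"), (0, "minimal")]


def gad7_total(items: Sequence[int]) -> int:
    """Sum seven responses, each coded 0-3."""
    if len(items) != 7 or any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("expected seven responses, each scored 0-3")
    return sum(items)


def gad7_severity(total: int) -> str:
    """Map a GAD-7 total (0-21) to its severity label."""
    if not 0 <= total <= 21:
        raise ValueError("GAD-7 totals range from 0 to 21")
    for lower_bound, label in GAD7_BANDS:
        if total >= lower_bound:
            return label
    raise AssertionError("unreachable: the 0 band matches every valid total")


def gad7_screen_positive(total: int, cutoff: int = 10) -> bool:
    """Apply a screening cut-off; pass cutoff=8 to favour sensitivity."""
    return total >= cutoff


total = gad7_total([2, 1, 1, 1, 1, 1, 1])
print(total, gad7_severity(total))            # prints: 8 mild
print(gad7_screen_positive(total))            # prints: False
print(gad7_screen_positive(total, cutoff=8))  # prints: True
```

The same patient is screen-negative at 10 and screen-positive at 8, which is the trade-off the two cut-points represent.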

The Combined Approach: PHQ-ADS

Depression and anxiety are highly comorbid conditions. Studies indicate that nearly 50% of patients with depression also meet criteria for an anxiety disorder, and vice versa. The prevalence of comorbid anxiety disorder and major depressive disorder (MDD) may be as high as 60%, far greater than the 2% or less co-occurrence that would be expected by chance. Because of this overlap, treating them as entirely separate entities can be clinically inefficient.

To address this, researchers validated the PHQ-ADS (Patient Health Questionnaire Anxiety-Depression Scale). This is a composite measure that simply sums the PHQ-9 and GAD-7 scores, resulting in a range of 0 to 48.

Why Use the PHQ-ADS?

The PHQ-ADS acknowledges the “synergy” of distress. Many treatments, such as SSRIs and Cognitive Behavioral Therapy (CBT), are transdiagnostic—they treat both conditions simultaneously. Monitoring a single composite score can simplify outcome tracking.

PHQ-ADS Cut-offs

PHQ-ADS cutpoints of 10, 20, and 30 indicate mild, moderate, and severe levels of depression/anxiety, respectively. Together with scores below 10, this yields four ordinal PHQ-ADS categories:

0–9: Minimal distress
10–19: Mild distress
20–29: Moderate distress
30–48: Severe distress

Psychometric validation shows the PHQ-ADS has high internal reliability (Cronbach’s alpha of 0.8 to 0.9) and strong convergent validity, making it a robust tool for measuring the total burden of emotional illness. The PHQ-ADS composite score does not override the value of the individual PHQ-9 depression and GAD-7 anxiety scores but instead complements them as a measure of overall psychological symptomatology.
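Because the composite is a plain sum, it is easy to compute next to the two subscale totals. The sketch below is illustrative; phq_ads_total and phq_ads_category are hypothetical names, and the inputs are the already-computed PHQ-9 and GAD-7 totals.

```python
def phq_ads_total(phq9_total: int, gad7_total: int) -> int:
    """Add a PHQ-9 total (0-27) and a GAD-7 total (0-21)."""
    if not 0 <= phq9_total <= 27 or not 0 <= gad7_total <= 21:
        raise ValueError("PHQ-9 must be 0-27 and GAD-7 must be 0-21")
    return phq9_total + gad7_total


def phq_ads_category(total: int) -> str:
    """Map a PHQ-ADS total (0-48) to one of the four categories."""
    if not 0 <= total <= 48:
        raise ValueError("PHQ-ADS totals range from 0 to 48")
    if total >= 30:
        return "severe"
    if total >= 20:
        return "moderate"
    if total >= 10:
        return "mild"
    return "minimal"


composite = phq_ads_total(phq9_total=12, gad7_total=9)
print(composite, phq_ads_category(composite))  # prints: 21 moderate
```

In this example the PHQ-9 is in its moderate band and the GAD-7 in its mild band, which is why the subscale scores are worth keeping alongside the composite.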

Ultra-Brief Screening: PHQ-2 and GAD-2

In high-volume settings like emergency departments or busy primary care clinics, administering full scales to every patient is not always feasible. This led to the development of the “ultra-brief” screeners: the PHQ-2 and GAD-2.

PHQ-2: The Depression Filter

The PHQ-2 consists of the first two items of the PHQ-9:

Little interest or pleasure in doing things (Anhedonia)
Feeling down, depressed, or hopeless (Depressed Mood)

Scoring: A score of ≥3 on the PHQ-2 has a sensitivity of 83% and specificity of 92% for detecting major depression. However, some studies suggest that using a lower threshold of ≥2 may improve sensitivity (86-96%) while maintaining reasonable specificity (78-82%), particularly in primary care populations. A positive score should trigger the administration of the full PHQ-9 to determine severity.

GAD-2: The Anxiety Filter

Similarly, the GAD-2 uses the first two items of the GAD-7:

Feeling nervous, anxious, or on edge
Not being able to stop or control worrying

Scoring: A cut-off of ≥3 is generally used to identify probable anxiety disorders, with a sensitivity of 86% and specificity of 83%. Research in cardiovascular inpatients found that the GAD-2 threshold of ≥3 provided the best balance between sensitivity and specificity, with acceptable rates of false positives and false negatives. Like the PHQ-2, a positive result necessitates the full GAD-7 assessment.

These brief tools are efficient filters, significantly reducing patient burden while maintaining high diagnostic accuracy for “ruling out” healthy individuals. The GAD-2 is particularly good at ruling out anxiety—over 95% of people who score 0-2 don’t have generalized anxiety disorder.
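The "ruling out" property corresponds to negative predictive value (NPV), which depends on prevalence as well as on sensitivity and specificity. The sketch below applies the standard formula to the GAD-2 figures quoted above; the prevalence values are assumptions chosen for illustration, not figures from the source.

```python
def negative_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Probability that a person who screens negative is truly disorder-free."""
    true_neg = spec * (1 - prevalence)      # non-cases correctly cleared
    false_neg = (1 - sens) * prevalence     # cases the screen missed
    return true_neg / (true_neg + false_neg)


# GAD-2 at a cut-off of >=3: sensitivity 86%, specificity 83%.
for assumed_prevalence in (0.05, 0.10, 0.25):
    npv = negative_predictive_value(0.86, 0.83, assumed_prevalence)
    print(f"prevalence {assumed_prevalence:.0%}: NPV {npv:.1%}")
```

Under these assumptions the NPV stays above 95% when the condition is uncommon and falls as prevalence rises, so a negative ultra-brief screen is most reassuring in low-prevalence settings.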

Clinical Limitations and Considerations

While the PHQ-9 and GAD-7 are invaluable, they are not without limitations. Blind reliance on scores without clinical context can lead to overdiagnosis or mismanagement.

Somatic Overlap and Physical Illness

A major challenge in medical settings is the overlap between somatic symptoms of depression/anxiety and symptoms of physical illness.

Oncology and Palliative Care: Symptoms like fatigue, sleep disturbance, and appetite changes (PHQ-9 items 3, 4, and 5) are common side effects of cancer and chemotherapy. However, research in cancer populations shows that false positives due to overlapping somatic symptoms are likely rare—analysis of persons who screened positive on the PHQ-9 showed few if any people screened positive on somatic symptoms alone. More commonly, people screening positive had a combination of somatic and psychological symptoms.

Respiratory Conditions: In patients with COPD, somatic anxiety symptoms (like trouble relaxing or restlessness) measured by the GAD-7 can be confounded by respiratory distress. However, studies show GAD-7 remains a valid tool in these populations if cut-offs are adjusted—research in Chinese COPD patients found that using a cut-off of ≥4 optimized the balance with a sensitivity of 66.0% and a specificity of 89.2%.

Cultural Variance and Cut-off Calibration

The standard cut-offs (10 for moderate) are not universally applicable across all cultures.

Latin American Populations: Research in Peru suggests that lower cut-offs (≥7 or ≥8) may be more accurate for hospitalized populations compared to the standard 10. The PHQ-9’s ≥7 cut-off point showed the highest simultaneous sensitivity (76%) and specificity (72%) when contrasted against a psychiatric diagnosis of depression in Peruvian hospital settings.

Cross-Cultural Considerations: Several studies in populations from low-income and middle-income countries have reported optimal cut-offs between 5 and 7. One possible reason for the difference in cut-off points between high-income and low-income countries is cultural: the PHQ-9 and the GAD-7 do not always show measurement invariance across culturally diverse groups, so the same score may not reflect the same level of symptoms in every population.
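Local calibration of a cut-off usually means comparing screening scores with a reference diagnosis and choosing the threshold that best balances sensitivity and specificity. The sketch below uses Youden's J (sensitivity + specificity - 1) as the balancing criterion. That choice is an assumption for illustration, since the studies mentioned above may have used other criteria, and the scores and diagnoses are invented toy data.

```python
from typing import Sequence, Tuple


def best_cutoff(
    scores: Sequence[int],
    has_disorder: Sequence[bool],
    candidates: Sequence[int],
) -> Tuple[int, float, float]:
    """Return (cutoff, sensitivity, specificity) maximising Youden's J."""
    cases = [s for s, d in zip(scores, has_disorder) if d]
    non_cases = [s for s, d in zip(scores, has_disorder) if not d]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    best = None
    for cutoff in candidates:
        sens = sum(1 for s in cases if s >= cutoff) / len(cases)
        spec = sum(1 for s in non_cases if s < cutoff) / len(non_cases)
        j = sens + spec - 1
        # Keep the first cut-off with the strictly highest J.
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Toy data: total scores paired with a reference-interview diagnosis.
scores = [1, 2, 3, 4, 5, 6, 8, 3, 7, 9, 10, 12, 6, 15]
diagnosed = [False] * 8 + [True] * 6
print(best_cutoff(scores, diagnosed, candidates=range(5, 11)))  # prints: (6, 1.0, 0.75)
```

With real data the sample would need to be large enough for the estimates to be stable, and the chosen threshold should be validated on a separate sample.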

The “Referral Gap”

A validated score does not guarantee treatment. A study in ophthalmic care found that while 72% of patients scored above critical thresholds on these scales, only 14% were referred for psychological evaluation. This highlights that screening tools are only as effective as the clinical pathways established to act on them.

Not Diagnostic

It is imperative to remember that PHQ-9 and GAD-7 are screening tools, not diagnostic instruments. A high score indicates a high probability of a disorder, but a definitive diagnosis requires a clinical interview to rule out differentials like bipolar disorder, substance use, or bereavement.

Digital Health Implementation and Future Directions

The future of these tools lies in digital integration. Modern healthcare systems are moving beyond static paper forms toward adaptive testing.

CART Models and Algorithms

Recent research using Classification and Regression Tree (CART) models has shown that we can predict severity with even fewer items than the full scales. For example, a study involving over 20,000 participants found that just two or three specific items from the PHQ-9 or GAD-7 could classify “severe” vs. “minimal” cases with over 85% accuracy.

The CART models produced concise, high-performing decision rules—using only 2 items for the GAD-7 and 3 for the PHQ-9. For GAD-7, the models achieved an accuracy of 86.1% for minimal or mild severity and 85.1% for severe cases, with both categories showing AUC values above 0.900. These findings confirmed that a small number of items could produce high-performing rules, particularly for identifying minimal/mild and severe cases.

Adaptive Testing Methodologies

These algorithmic approaches allow digital health platforms to use “decision rules”. For instance, if a patient reports low scores on the core GAD-2 items, the algorithm might skip the remaining questions, whereas high scores trigger the full assessment. This “adaptive” approach reduces respondent fatigue while maintaining diagnostic precision.
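A gate of this kind can be sketched as follows. The function name administer_gad7_adaptively, the ask callback, and the returned keys are all hypothetical, and the gate threshold of 3 simply reuses the GAD-2 cut-off discussed earlier; a real platform would choose and validate its own rule.

```python
from typing import Callable, Dict, Optional


def administer_gad7_adaptively(
    ask: Callable[[int], int],
    gate_cutoff: int = 3,
) -> Dict[str, Optional[int]]:
    """Ask the two GAD-2 items first; continue only if the gate is met.

    `ask(i)` is assumed to present item i (0-based) and return a 0-3 response.
    """
    responses = [ask(0), ask(1)]
    gad2 = sum(responses)
    if gad2 < gate_cutoff:
        # Low core-item score: skip the remaining five questions.
        return {"gad2": gad2, "gad7": None, "items_asked": 2}

    # Gate met: administer items 3-7 and report the full total.
    responses += [ask(i) for i in range(2, 7)]
    return {"gad2": gad2, "gad7": sum(responses), "items_asked": 7}


# Simulated respondent whose answers are read from a fixed list.
answers = [2, 2, 1, 1, 0, 1, 2]
print(administer_gad7_adaptively(lambda i: answers[i]))
# prints: {'gad2': 4, 'gad7': 9, 'items_asked': 7}
```

Passing the questions through a callback keeps the skipping logic separate from how items are actually displayed, so the same rule could sit behind a web form or a chat interface.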

Machine learning approaches using multi-layer perceptron (MLP) models have achieved mean absolute errors as low as 5.06 for depression and 5.39 for anxiety prediction, while random forest and LightGBM models achieved F1-scores above 0.89 for anxiety classification. Additionally, recurrent neural network (RNN) models can predict reliable improvement in PHQ-9 and GAD-7 scores with above 87% accuracy and 0.89 AUROC after three or more review periods.

Electronic Health Records Integration

EHRs have the potential to improve adherence to clinical guidelines and increase care quality by integrating mental health screening tools directly into clinical workflows. Examples include improved prescribing practices through integrated electronic ordering systems and reductions in inappropriate interventions because of integrated decision-support tools.

However, challenges remain—mental health–related information is regularly missing from EHRs, especially sensitive information. Studies show that events for mental health patients were less likely to be recorded in EMRs compared with other types of patients. Despite these challenges, alert systems have been shown to increase the number of completed mental health safety plans and reduce the amount of missing data.

Feature	PHQ-9	GAD-7
Primary Focus	Depression screening	Anxiety screening
Number of Items	9 questions	7 questions
Score Range	0-27	0-21
DSM Alignment	Maps exactly to DSM-5 MDD criteria	Designed for Generalized Anxiety Disorder
Time Frame	Last 2 weeks	Last 2 weeks
Core Symptoms Measured	Anhedonia, depressed mood, somatic symptoms (sleep, appetite, fatigue), cognitive symptoms (guilt, concentration, suicidal ideation)	Nervousness, uncontrollable worry, restlessness, trouble relaxing, irritability, fear
Minimal Severity	0-4	0-4
Mild Severity	5-9	5-9
Moderate Severity	10-14	10-14
Moderately Severe/Severe	15-19 (moderately severe), 20-27 (severe)	15-21 (severe)
Standard Cut-off	≥10 (88% sensitivity, 88% specificity)	≥10 (89% sensitivity, 82% specificity)
Ultra-Brief Version	PHQ-2 (first 2 items, cut-off ≥3)	GAD-2 (first 2 items, cut-off ≥3)
Critical Item	Question 9 (suicidal ideation requires immediate assessment regardless of total score)	None specifically flagged
Combined Score	PHQ-ADS (PHQ-9 + GAD-7): 0-48 total range

The PHQ-9 and GAD-7 have transformed mental health screening from a subjective art into a quantifiable science. The PHQ-9 rigorously maps to DSM depression criteria, while the GAD-7 captures the core essence of generalized worry. Their combined use, potentially via the PHQ-ADS, addresses the reality that depression and anxiety rarely travel alone.

However, their ubiquity should not lead to complacency. Clinicians must be aware of the “grey zones” in scoring, the potential for somatic inflation in the medically ill, and the need for cultural calibration. Whether using the full scales or the ultra-brief PHQ-2/GAD-2 versions, these tools are most powerful when used as the starting point for a clinical conversation, rather than the final word on a patient’s mental health.

The integration of these screening tools into digital health platforms through CART models and adaptive testing represents the future of efficient, personalized mental health assessment. As healthcare systems continue to evolve, the PHQ-9 and GAD-7 will remain foundational tools—not as replacements for clinical judgment, but as standardized entry points that facilitate earlier detection, more consistent monitoring, and ultimately better outcomes for patients experiencing depression and anxiety.

Frequently Asked Questions

What is the main difference between PHQ-9 and GAD-7?

The PHQ-9 screens for depression and measures nine symptoms including low mood, sleep problems, and suicidal thoughts. The GAD-7 screens for anxiety and focuses on worry, nervousness, and inability to relax. PHQ-9 maps to DSM depression criteria, while GAD-7 assesses generalized anxiety and related disorders.

What is a normal PHQ-9 score?

A PHQ-9 score of 0-4 indicates minimal or no depression. Scores of 5-9 suggest mild depression, while 10 or higher indicates moderate to severe depression requiring clinical attention. Any positive response to Question 9 about suicidal thoughts requires immediate evaluation regardless of total score.

What is a good GAD-7 score?

A GAD-7 score of 0-4 indicates minimal anxiety and is considered normal. Scores of 5-9 suggest mild anxiety, while scores of 10 or higher indicate moderate to severe anxiety warranting further evaluation.

Can PHQ-9 and GAD-7 be used together?

Yes, PHQ-9 and GAD-7 are commonly used together since depression and anxiety are highly comorbid, with nearly 50% of patients having both conditions. The combined PHQ-ADS score (0-48) can track overall emotional distress and is useful for monitoring transdiagnostic treatments.

What is PHQ-2 and GAD-2?

PHQ-2 and GAD-2 are ultra-brief versions using only the first two questions from each scale. They’re efficient screening tools for busy clinical settings with 83-86% sensitivity. A score of ≥3 on either indicates the need for the full assessment.

Are PHQ-9 and GAD-7 free to use?

Yes, both PHQ-9 and GAD-7 are public domain tools that can be used freely without licensing fees. This widespread availability has made them the gold standard screening tools in primary care and mental health settings globally.

How accurate are PHQ-9 and GAD-7?

PHQ-9 has 88% sensitivity and 88% specificity for detecting major depression at a cut-off of ≥10. GAD-7 shows 89% sensitivity and 82% specificity for generalized anxiety disorder. Both demonstrate high diagnostic accuracy with AUC values around 0.90.

Can these tools diagnose depression or anxiety?

No, PHQ-9 and GAD-7 are screening tools, not diagnostic instruments. High scores indicate probable depression or anxiety, but diagnosis requires a comprehensive clinical interview to rule out other conditions like bipolar disorder or substance use.

Do cut-off scores differ by population?

Yes, optimal cut-offs vary by population and cultural context. While ≥10 is standard in Western settings, studies in Latin American hospital settings and in low- and middle-income countries suggest lower cut-offs (roughly 5 to 8) may be more accurate. Medical populations may also benefit from adjusted thresholds.

How is PHQ-9 Question 9 scored?

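PHQ-9 Question 9 addresses suicidal ideation and is scored 0-3 like other items. However, any score >0 requires immediate suicide risk assessment regardless of total PHQ-9 score. In the research cited above, patients endorsing this item were roughly 3 to 7 times more likely to die by suicide within 30 days, with the relative risk rising to as much as 10.8 times among those reporting nearly daily thoughts.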
