Field Sobriety Tests: Legal Standards and Admissibility
Field sobriety tests (FSTs) occupy a contested position in American DUI enforcement: they form the evidentiary bridge between a traffic stop and a chemical test, yet their scientific validity remains disputed in courts and academic literature. This page covers the three standardized tests endorsed by the National Highway Traffic Safety Administration (NHTSA), the legal standards governing their administration and admissibility, the factors that affect reliability, and how courts have treated challenges to FST evidence. The discussion follows the national framework; DUI statutes and admissibility rules vary by state.
- Definition and scope
- Core mechanics or structure
- Causal relationships or drivers
- Classification boundaries
- Tradeoffs and tensions
- Common misconceptions
- Checklist or steps (non-advisory)
- Reference table or matrix
- References
Definition and scope
Field sobriety tests are structured physical and cognitive assessments administered roadside by law enforcement officers to detect signs of impairment. In the United States, their legal significance derives from a 1970s–1980s research program funded by NHTSA, which produced the Standardized Field Sobriety Test (SFST) battery — the only FST battery with validated, published accuracy data traceable to a named federal agency.
NHTSA's SFST program standardized three tests: the Horizontal Gaze Nystagmus (HGN), the Walk-and-Turn (WAT), and the One-Leg Stand (OLS). The agency publishes training manuals — the most recent editions of the SFST Participant Manual are publicly available through NHTSA's traffic safety programs — establishing exact administration protocols that officers must follow for results to carry evidentiary weight (NHTSA SFST resources).
FSTs do not measure blood alcohol concentration (BAC) directly. They generate behavioral indicators that an officer uses to establish probable cause for a chemical test or a custodial arrest. Under Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), federal courts evaluate expert testimony on FST validity under a reliability standard; state courts apply either Daubert or the older Frye general-acceptance standard, depending on jurisdiction.
Core mechanics or structure
Horizontal Gaze Nystagmus (HGN)
HGN measures involuntary eye jerking (nystagmus) as the eye tracks a moving stimulus held by the officer. Alcohol and certain drugs impair the smooth-pursuit pathway, causing nystagmus onset at angles below 45 degrees. Officers score up to 6 clues across both eyes (3 per eye). NHTSA's 1998 San Diego field validation study found that HGN correctly classified subjects at or above 0.08% BAC approximately 88 percent of the time when administered correctly (Stuster and Burns, Validation of the Standardized Field Sobriety Test Battery at BACs Below 0.10 Percent, DOT HS 808 839, 1998). The original 1981 laboratory study (Development and Field Test of Psychophysical Tests for DWI Arrest, DOT HS 805 864) used a 0.10% BAC threshold.
Walk-and-Turn (WAT)
The WAT is a divided-attention test. A subject walks 9 heel-to-toe steps along a real or imaginary line, turns in a specified manner, and returns. Officers score 8 possible clues, including stepping off the line, using arms for balance, and an improper turn. NHTSA's 1998 San Diego validation study associated WAT with approximately 79 percent accuracy at the 0.08% BAC threshold (DOT HS 808 839).
One-Leg Stand (OLS)
OLS requires balancing on one foot with the other raised approximately 6 inches off the ground while counting aloud for 30 seconds. Officers score 4 clues. NHTSA's 1998 San Diego validation study reported approximately 83 percent accuracy for OLS at 0.08% BAC (DOT HS 808 839).
Non-standardized tests — finger-to-nose, Romberg balance, reciting the alphabet backward — have no NHTSA-validated accuracy data and are treated differently by courts evaluating admissibility.
Causal relationships or drivers
The physiological basis of SFST validity rests on alcohol's dose-dependent disruption of the cerebellum (balance, coordination) and the smooth-pursuit eye movement system (HGN). Central nervous system depressants beyond ethanol — including benzodiazepines and certain antihistamines — produce similar gaze and balance disruption, which is why FST clues are not exclusive indicators of alcohol. This intersection is directly relevant to drug DUI prosecutions, where officers may administer a Drug Evaluation and Classification (DEC) protocol after SFSTs.
Officer training level is a documented driver of result quality. NHTSA requires standardized 24-hour practitioner training for officers who will administer and testify about SFSTs. Deviations from NHTSA-specified procedures, such as omitting the standardized verbal instructions or conducting HGN while the subject faces flashing emergency lights, introduce error that courts may treat as grounds for suppression. Such foundational administration errors are a common basis for evidence suppression motions.
Environmental factors such as uneven pavement, poor lighting, wind, and temperature affect WAT and OLS performance independently of impairment. NHTSA's own manuals identify subject factors that may produce false-positive clues on balance tests, including age over 65, being more than 50 pounds overweight, leg injuries, and inner-ear disorders.
Classification boundaries
FSTs divide into two legally distinct categories:
Standardized (NHTSA-validated): HGN, WAT, OLS. These carry the strongest presumption of admissibility because they are tied to published validation research and a federal training standard. Courts in the majority of states have ruled them admissible as an officer's lay observations (not requiring expert-level scientific foundation) when properly administered.
Non-standardized: All other roadside tests including the Romberg test, finger dexterity tests, and reverse-alphabet recitation. These lack NHTSA validation data. Courts treat them as observations of behavior, useful to establish probable cause but generally insufficient to generate expert-level accuracy testimony.
A further classification boundary applies within HGN: courts in some jurisdictions limit HGN testimony to probable cause for arrest and prohibit officers from stating a numerical BAC estimate based on HGN clues alone, because that claim exceeds the validated scope of the test. States such as Minnesota and Washington have appellate decisions specifically addressing this limitation.
The implied consent framework governs chemical tests but not FSTs. Refusal to perform FSTs carries no automatic statutory penalty in most states, unlike refusal of a breath or blood test, though refusal can be introduced as evidence in the criminal proceeding.
Tradeoffs and tensions
Sensitivity vs. specificity. NHTSA's validation data report accuracy in terms of correct classification above a BAC threshold — not false-positive rates for sober subjects. A test optimized to detect impaired drivers at 0.08% BAC may produce false positives for sober individuals with neurological conditions or physical limitations.
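The gap between a headline accuracy figure and the false-positive rate can be made concrete with a small calculation. The sketch below uses hypothetical counts chosen only for illustration (they are not NHTSA data), and the function name `classification_rates` is an assumption of this example. It shows that a sample made up mostly of subjects at or above the threshold can yield high overall accuracy while a large share of below-threshold subjects are still misclassified.

```python
def classification_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard rates from a 2x2 classification table.

    tp: at/above threshold, test indicated impairment
    fp: below threshold, test indicated impairment
    tn: below threshold, test did not indicate impairment
    fn: at/above threshold, test did not indicate impairment
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical sample: 100 subjects, 80 at or above 0.08% BAC, 20 below.
rates = classification_rates(tp=76, fp=8, tn=12, fn=4)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, overall accuracy is 0.88 even though 8 of the 20 below-threshold subjects (0.40) are classified as impaired. A single accuracy figure does not reveal that split.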
Lay testimony vs. scientific evidence. Courts have allowed officers to testify about FST clues as lay observations under Federal Rule of Evidence 701, bypassing Daubert scrutiny. Defense challenges typically argue that HGN in particular requires scientific foundation because the nystagmus mechanism is physiological, not behavioral. Jurisdictions are split on whether HGN requires expert testimony or qualifies as lay observation.
Original validation methodology. The 1981 and 1983 NHTSA studies are the foundation of SFST accuracy claims; the 1981 study was conducted under controlled conditions with known BAC levels. Critics have noted that the original sample sizes were small (fewer than 300 subjects in the combined original validation) and that real-world conditions differ from controlled laboratory settings. The NHTSA materials do not publish confidence intervals alongside the headline accuracy figures.
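To illustrate why sample size and confidence intervals matter for a headline accuracy figure, the sketch below computes a Wilson score interval for a proportion. The counts are hypothetical and are not drawn from any NHTSA study; `wilson_interval` is a name introduced here for illustration, and the default `z=1.96` assumes a 95 percent confidence level.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple:
    """Return the (lower, upper) Wilson score interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical: the same 88 percent observed accuracy at two sample sizes.
for correct, n in [(88, 100), (880, 1000)]:
    low, high = wilson_interval(correct, n)
    print(f"n={n}: {low:.3f} to {high:.3f}")
```

Under these assumptions, 88 correct classifications out of 100 give an interval of roughly 0.80 to 0.93, while the same proportion observed in 1,000 subjects gives a noticeably narrower one. A point estimate reported without an interval hides that difference.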
Jurisdictional variation. Because FST admissibility is a state-law issue in criminal proceedings, the standard of admissibility — Daubert, Frye, or a state variant — varies. This means identical evidence may be handled differently depending on jurisdiction, creating tension with the national NHTSA training standard. The federal vs. state DUI jurisdiction framework governs which standard applies in a given case.
Common misconceptions
Misconception: Passing FSTs guarantees no arrest.
FST performance is one factor in an officer's probable-cause determination. An officer can proceed to a chemical test request based on other observations (odor, statements, driving behavior) regardless of FST performance.
Misconception: FSTs measure BAC.
FSTs generate behavioral clues correlated with BAC at a population level. They do not produce a numerical BAC reading. Only chemical tests (breath, blood, urine) yield a quantitative result.
Misconception: Refusing FSTs avoids all consequences.
Unlike chemical test refusal under implied consent statutes, FST refusal does not trigger automatic license suspension in most states. However, the refusal is typically admissible as consciousness-of-guilt evidence at trial.
Misconception: HGN results are infallible.
At least 47 conditions other than alcohol are documented to cause nystagmus, including caffeine in large doses, certain prescription medications, and inner-ear disorders. NHTSA's own manuals list pathological nystagmus as a variable that officers must rule out before scoring HGN clues.
Misconception: Non-standardized tests are equivalent to NHTSA tests.
No published federal agency validation data supports accuracy claims for non-standardized tests. Courts consistently distinguish them from the SFST battery when evaluating admissibility of expert quantitative testimony.
Checklist or steps (non-advisory)
The following outlines the sequential administrative steps that NHTSA protocols require for a valid SFST administration. A deviation at any step can form the basis of a legal challenge to admissibility.
HGN Administration Sequence
1. Check subject for hard contact lenses; document finding.
2. Position subject facing away from flashing emergency lights.
3. Hold stimulus 12–15 inches from subject's face, slightly above eye level.
4. Check for equal pupil size and resting nystagmus before scoring.
5. Move stimulus at a rate requiring approximately 2 seconds per pass across the full field of vision.
6. Score lack of smooth pursuit in each eye (2 clues).
7. Score distinct and sustained nystagmus at maximum deviation in each eye (2 clues).
8. Score onset of nystagmus prior to 45 degrees in each eye (2 clues).
9. Document total clue count (maximum 6); 4 or more clues meets the NHTSA threshold for inferring impairment.
WAT Administration Sequence
1. Identify or designate a line (real or imaginary); ensure surface is reasonably flat.
2. Deliver standardized verbal instructions while demonstrating.
3. Confirm subject understands before beginning.
4. Score 8 designated clues during the instructional phase and walking phase.
5. Document clue count (maximum 8); 2 or more clues meets the NHTSA threshold.
OLS Administration Sequence
1. Confirm subject has no injury, medical condition, or footwear that NHTSA identifies as exclusionary.
2. Deliver standardized instructions with demonstration.
3. Time 30 seconds using a watch or internal count; do not prompt subject on time elapsed.
4. Score 4 designated clues (swaying, using arms, hopping, putting foot down).
5. Document clue count (maximum 4); 2 or more clues meets the NHTSA threshold.
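The clue thresholds in the three sequences above can be expressed as a simple tally. The following is a minimal sketch, not an official NHTSA scoring tool: the maximum clue counts and thresholds are taken from the checklists above, while the names `SFST_TESTS`, `TestResult`, `score_test`, and `score_battery` are assumptions of this example. It only compares a documented clue count to a threshold and says nothing about whether the test was administered correctly.

```python
from dataclasses import dataclass

# Maximum clues and thresholds as listed in the checklists above.
SFST_TESTS = {
    "HGN": {"max_clues": 6, "threshold": 4},
    "WAT": {"max_clues": 8, "threshold": 2},
    "OLS": {"max_clues": 4, "threshold": 2},
}


@dataclass
class TestResult:
    test: str
    clues: int
    meets_threshold: bool


def score_test(test: str, clues: int) -> TestResult:
    """Compare a documented clue count with the threshold for one test."""
    spec = SFST_TESTS[test]
    if not 0 <= clues <= spec["max_clues"]:
        raise ValueError(f"{test} clue count must be 0..{spec['max_clues']}")
    return TestResult(test, clues, clues >= spec["threshold"])


def score_battery(clue_counts: dict) -> list:
    """Score each administered test; tests not administered are omitted."""
    return [score_test(test, clues) for test, clues in clue_counts.items()]


# Hypothetical documented counts from one roadside encounter.
for result in score_battery({"HGN": 4, "WAT": 1, "OLS": 2}):
    print(result)
```

Keeping each test's result separate, rather than summing clues across tests, mirrors the way the thresholds are stated per test in the sequences above.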
Reference table or matrix
| Test | NHTSA-Reported Accuracy (0.08% BAC threshold) | Clues Scored | Documented Confounders | Admissibility Treatment in Most Jurisdictions |
|---|---|---|---|---|
| Horizontal Gaze Nystagmus (HGN) | ~88% (DOT HS 808 839) | 6 (3 per eye) | 47+ nystagmus-causing conditions, medications, inner-ear disorders | Lay observation in majority; some states require expert foundation |
| Walk-and-Turn (WAT) | ~79% (DOT HS 808 839) | 8 | Age >65, weight >50 lbs over ideal, leg injuries, uneven surface | Lay observation; admissible with proper administration |
| One-Leg Stand (OLS) | ~83% (DOT HS 808 839) | 4 | Same physical factors as WAT, inner-ear disorders | Lay observation; admissible with proper administration |
| Non-standardized (e.g., Romberg, reverse alphabet) | No NHTSA validation data | Varies | Not systematically studied | Behavioral observation only; no expert accuracy testimony supported |
| Combined SFST Battery (all three) | ~91% reported in NHTSA field studies | Up to 18 | Cumulative confounders from each individual test | Strongest admissibility posture when all three properly administered |
Accuracy figures are from NHTSA's foundational validation research documents and should be evaluated in light of methodological critiques noted in the Tradeoffs section.
References
- NHTSA — Drunk Driving / Standardized Field Sobriety Testing
- NHTSA — SFST Practitioner and Instructor Manuals (Traffic Safety Programs)
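- NHTSA DOT HS 805 864 — Development and Field Test of Psychophysical Tests for DWI Arrest (1981)
- NHTSA DOT HS 808 839 — Validation of the Standardized Field Sobriety Test Battery at BACs Below 0.10 Percent (1998)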
- Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993)
- Federal Rules of Evidence, Rule 701 (Lay Opinion Testimony)
- Federal Rules of Evidence, Rule 702 (Expert Testimony)
- NHTSA Drug Evaluation and Classification (DEC) Program
- Cornell Law School Legal Information Institute — Probable Cause