
Survey Data Quality: A Practical Checklist Before You Analyze

data quality · survey analysis · data cleaning · research methodology · best practices

Pre-analysis survey data quality screening: straightlining, speeding, inconsistency, and other satisficing indicators. A systematic checklist for cleaner datasets.


The most dangerous data isn't missing data—it's bad data that looks complete.

Survey data quality problems don't announce themselves. Straightliners, speeders, and inattentive respondents submit responses that look like everyone else's. Their data passes basic checks, gets included in your analysis, and quietly distorts your conclusions.

This checklist helps you identify quality issues before analysis. Think of it as quality control for survey data: a systematic review that catches problems human reviewers miss.

TL;DR:

  • Check for speeders: Responses completed impossibly fast indicate no engagement.
  • Check for straightliners: Identical responses across grid questions suggest satisficing.
  • Check for inconsistent responses: Contradictory answers indicate inattention or confusion.
  • Check attention check failures: Failed trap questions confirm disengagement.
  • Check open-text quality: Gibberish, copy-paste, or off-topic responses signal problems.
  • Document your decisions: Whatever you exclude, document why and how it affects results.

→ Build Quality-First Surveys with Lensym

Why Data Quality Checks Matter

Every dataset contains some low-quality responses. The question is whether they're random noise (which averages out) or systematic contamination (which biases results).

Low-quality responses typically come from:

  • Satisficers: Respondents giving minimal effort to finish quickly
  • Bots: Automated responses in online panels
  • Fraudulent respondents: People gaming incentive systems
  • Confused respondents: People who misunderstood questions or instructions
  • Fatigued respondents: People who started engaged but gave up mentally

Without quality checks, these responses are weighted equally with thoughtful ones. A dataset with a substantial proportion of low-quality responses will produce conclusions contaminated by that noise—the more careless responses, the more distorted your findings.

The Data Quality Checklist

1. Speeder Detection

What it is: Respondents who complete the survey faster than reasonably possible.

Why it matters: Reading and considering questions takes time. Someone who finishes a 10-minute survey in 2 minutes didn't read the questions.

How to check:

  1. Calculate median completion time for your survey
  2. Flag responses below a threshold (common: 1/3 of median time, or 2 standard deviations below mean)
  3. Review flagged responses for other quality indicators
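The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming completion times in seconds keyed by a hypothetical respondent ID; the function name and the 1/3-of-median cutoff mirror the threshold suggested above.

```python
from statistics import median

def flag_speeders(times_sec, fraction=1/3):
    """Flag respondents who finished in under `fraction` of the median time.

    `times_sec` maps a respondent ID to completion time in seconds
    (hypothetical structure for illustration).
    Returns the set of flagged respondent IDs.
    """
    cutoff = median(times_sec.values()) * fraction
    return {rid for rid, t in times_sec.items() if t < cutoff}

times = {"r1": 600, "r2": 580, "r3": 110, "r4": 640}  # r3 is suspiciously fast
print(flag_speeders(times))  # → {'r3'}
```

Flagged IDs then feed the manual review in step 3 rather than being excluded outright.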

Thresholds:

Survey Length | Minimum Reasonable Time
5 minutes     | 1.5-2 minutes
10 minutes    | 3-4 minutes
15 minutes    | 5-6 minutes
20 minutes    | 7-8 minutes

What to do if you see this:

  • Exclude extreme speeders (below minimum reasonable time)
  • Flag moderate speeders for additional review
  • Consider whether speeding correlates with other quality issues

Caution: Some legitimate respondents are fast readers or have simple situations (e.g., no branching paths). Don't exclude based on speed alone—look for corroborating evidence.

2. Straightlining Detection

What it is: Selecting the same response option across all items in a grid or scale battery.

Why it matters: Thoughtful responses vary based on question content. Identical responses across diverse items indicate satisficing—giving the same answer repeatedly without reading.

How to check:

  1. Identify grid questions or item batteries (5+ items on same scale)
  2. Calculate response variance per respondent within each grid
  3. Flag respondents with zero or near-zero variance

Example:

Respondent | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Variance
A          | 4      | 3      | 5      | 4      | 3      | Normal
B          | 4      | 4      | 4      | 4      | 4      | Zero (flag)
C          | 1      | 1      | 1      | 1      | 1      | Zero (flag)
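The variance check from the table can be computed directly. A minimal sketch, assuming each respondent's grid answers arrive as a list of numeric codes keyed by a hypothetical respondent ID:

```python
from statistics import pvariance

def flag_straightliners(grid_responses, tol=0.0):
    """Flag respondents whose within-grid response variance is at or below `tol`.

    `grid_responses` maps a respondent ID to that respondent's numeric
    answers for one grid (hypothetical structure for illustration).
    """
    return {rid for rid, answers in grid_responses.items()
            if pvariance(answers) <= tol}

grid = {
    "A": [4, 3, 5, 4, 3],   # normal variation
    "B": [4, 4, 4, 4, 4],   # straightliner
    "C": [1, 1, 1, 1, 1],   # straightliner
}
print(sorted(flag_straightliners(grid)))  # → ['B', 'C']
```

Raising `tol` slightly catches near-zero-variance patterns (e.g., 4-4-4-5-4) as well.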

What to do if you see this:

  • Review the items: are they genuinely diverse? Straightlining on similar items might be valid.
  • Check if straightliners also fail other quality checks
  • Consider excluding straightliners from analysis or running sensitivity analysis with/without them

Caution: Some respondents legitimately feel the same way about all items. Look for patterns across multiple grids, not just one.

3. Inconsistency Detection

What it is: Contradictory responses that can't both be true.

Why it matters: Inconsistent responses indicate the respondent wasn't paying attention, didn't understand the questions, or is responding randomly.

Types of inconsistency:

Logical inconsistency:

  • Reports no children but answers questions about children's ages
  • Reports never using a product but rates product features
  • Reports annual income of $30,000 but monthly expenses of $10,000

Scale inconsistency:

  • Rates overall satisfaction as "Very satisfied" but rates all components as "Dissatisfied"
  • Strongly agrees with "I love my job" and "I hate coming to work"

Temporal inconsistency:

  • Reports joining company in 2020 but having 10 years of tenure

How to check:

  1. Identify question pairs that should be logically related
  2. Define rules for inconsistency (e.g., if Q5 = "No" then Q6 should be blank)
  3. Flag responses that violate rules
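Consistency rules are easiest to manage as small predicate functions. A sketch, assuming hypothetical question IDs (`Q5`, `Q6`) and answer dicts keyed by respondent; each rule returns True when a response is consistent:

```python
def flag_inconsistent(responses, rules):
    """Flag respondents who violate any logical-consistency rule.

    Each rule is a predicate over one respondent's answer dict
    (hypothetical structure for illustration) that returns True
    when the response is consistent.
    """
    return {rid for rid, ans in responses.items()
            if not all(rule(ans) for rule in rules)}

# Hypothetical rule: if Q5 = "No" (no children), Q6 (children's ages) must be blank.
rules = [lambda a: a.get("Q5") != "No" or not a.get("Q6")]

responses = {
    "A": {"Q5": "No", "Q6": ""},      # consistent
    "B": {"Q5": "No", "Q6": "7, 9"},  # contradiction
    "C": {"Q5": "Yes", "Q6": "12"},   # consistent
}
print(flag_inconsistent(responses, rules))  # → {'B'}
```

Keeping rules as data makes it easy to add or drop checks without touching the flagging logic.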

What to do if you see this:

  • Minor inconsistencies: may be honest mistakes, consider keeping
  • Major inconsistencies: strong signal of low quality, consider excluding
  • Systematic patterns: may indicate question design problems, not respondent problems

4. Attention Check Failures

What it is: Failing questions specifically designed to verify attention.

Why it matters: Attention checks (also called trap questions or instructional manipulation checks) directly test whether respondents are reading. Failing them is strong evidence of disengagement.

Types of attention checks:

Instructional checks:

"To show you're paying attention, please select 'Strongly disagree' for this item."

Content checks:

In a list of activities: "Traveling to the moon for vacation" (should not be selected)

Recall checks:

"Earlier you indicated your department. What department was that?" (open-ended, compared to earlier closed response)

How to check:

  1. Include 1-2 attention checks in surveys over 5 minutes
  2. Flag all failures
  3. Review whether failures correlate with other quality issues
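Tallying failures per respondent makes the exclusion policy mechanical. A minimal sketch, assuming each respondent's check results arrive as a list of booleans (one per attention check, hypothetical structure):

```python
def attention_summary(check_results):
    """Count failed attention checks per respondent.

    `check_results` maps a respondent ID to a list of booleans,
    one per attention check (True = passed); hypothetical structure.
    """
    return {rid: sum(1 for passed in checks if not passed)
            for rid, checks in check_results.items()}

results = {"A": [True, True], "B": [False, True], "C": [False, False]}
failures = attention_summary(results)
print(failures)                                    # → {'A': 0, 'B': 1, 'C': 2}
print({r for r, n in failures.items() if n >= 2})  # exclusion candidates → {'C'}
```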

What to do if you see this:

  • Single failure: review other indicators before excluding
  • Multiple failures: strong case for exclusion
  • High failure rate overall: may indicate confusing instructions or fatiguing survey

Caution: Some attention checks are poorly designed and confuse legitimate respondents. Test your checks in pilots.

5. Open-Text Quality

What it is: Evaluating the quality of responses to open-ended questions.

Why it matters: Open-ended responses require more effort than clicking options. Low-quality open-text responses often indicate low-quality closed-ended responses too.

Red flags:

Red Flag             | Example                                      | Indicates
Gibberish            | "asdfasdf" or "jjjjjjj"                      | No engagement
Single character     | "." or "n"                                   | Minimal effort
Copy-paste           | Identical text in multiple responses         | Bot or fraud
Off-topic            | Question about product, answer about weather | Not reading
Profanity/abuse      | Hostile content                              | Disengagement or fraud
Suspiciously perfect | Reads like marketing copy                    | Bot or fraud

How to check:

  1. Review open-text responses for red flags
  2. Calculate response length distribution; flag outliers
  3. Check for duplicate responses across respondents
  4. Use automated tools for gibberish detection if volume is high
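Simple heuristics catch the worst offenders before any manual review. A sketch, assuming open-text answers keyed by a hypothetical respondent ID; the patterns are rough illustrations, not a complete gibberish detector:

```python
import re
from collections import Counter

def flag_open_text(texts, min_len=2):
    """Flag gibberish-looking or duplicated open-text answers.

    `texts` maps a respondent ID to their open-ended answer
    (hypothetical structure). Heuristics only: very short answers,
    single repeated characters, keyboard-mash runs, and verbatim
    duplicates across respondents.
    """
    counts = Counter(t.strip().lower() for t in texts.values())
    flagged = set()
    for rid, t in texts.items():
        s = t.strip().lower()
        if (len(s) < min_len
                or len(set(s.replace(" ", ""))) == 1       # "jjjjjjj"
                or re.fullmatch(r"(asdf|qwer|zxcv)+", s)   # keyboard mash
                or counts[s] > 1):                         # copy-paste duplicate
            flagged.add(rid)
    return flagged

texts = {
    "A": "The checkout flow was confusing on mobile.",
    "B": "asdfasdf",
    "C": ".",
    "D": "Great product, would recommend!",
    "E": "Great product, would recommend!",  # verbatim duplicate of D
}
print(sorted(flag_open_text(texts)))  # → ['B', 'C', 'D', 'E']
```

Flags from this pass are review candidates, not automatic exclusions; off-topic answers in particular need a human eye.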

What to do if you see this:

  • Gibberish/single character: strong signal for exclusion
  • Off-topic: may indicate question confusion; review before excluding
  • Duplicates: investigate for bot activity

6. Duplicate Detection

What it is: Multiple responses from the same person.

Why it matters: Duplicates inflate sample size and can skew results if the duplicate respondent is unusual.

How to check:

  1. Check for duplicate identifiers (email, user ID, etc.)
  2. Check for duplicate IP addresses (with caution—shared IPs are common)
  3. Check for near-identical response patterns
  4. Check for identical open-text responses
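The keep-the-first rule described below is straightforward to encode. A sketch, assuming responses arrive in submission order as dicts carrying a hypothetical `email` identifier field:

```python
def dedupe_keep_first(responses, key="email"):
    """Keep the first response per identifier; drop later duplicates.

    `responses` is a list of answer dicts in submission order, each
    carrying an identifier field (hypothetical `email` field here).
    Identifiers are normalized to lowercase before comparison.
    """
    seen, kept, dropped = set(), [], []
    for r in responses:
        ident = r.get(key, "").strip().lower()
        (dropped if ident in seen else kept).append(r)
        seen.add(ident)
    return kept, dropped

responses = [
    {"email": "a@example.com", "q1": 4},
    {"email": "b@example.com", "q1": 2},
    {"email": "A@example.com", "q1": 5},  # same person, second submission
]
kept, dropped = dedupe_keep_first(responses)
print(len(kept), len(dropped))  # → 2 1
```

Swapping the keep-first rule for keep-most-complete only changes which element survives per identifier; the grouping logic stays the same.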

What to do if you see this:

  • Keep the first response, exclude subsequent ones
  • Or keep the most complete response if completion varies
  • Document your rule and apply consistently

Caution: Shared devices (households, offices) can create false duplicate signals. Look for multiple indicators, not just IP address.

7. Missing Data Patterns

What it is: Analyzing patterns in missing responses.

Why it matters: Random missing data is manageable. Systematic missing data (e.g., everyone skips the same question) indicates a problem with the question or survey design.

How to check:

  1. Calculate missing rate per question
  2. Calculate missing rate per respondent
  3. Look for patterns: do certain questions have high skip rates? Do certain respondents skip many questions?
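Both missing rates can be computed in one pass. A minimal sketch, assuming answer dicts keyed by a hypothetical respondent ID, where a question counts as missing when absent or blank:

```python
def missing_rates(responses, questions):
    """Missing rate per question and per respondent.

    `responses` maps a respondent ID to an answer dict (hypothetical
    structure); a question counts as missing when absent or falsy.
    """
    n = len(responses)
    by_question = {
        q: sum(1 for a in responses.values() if not a.get(q)) / n
        for q in questions
    }
    by_respondent = {
        rid: sum(1 for q in questions if not a.get(q)) / len(questions)
        for rid, a in responses.items()
    }
    return by_question, by_respondent

responses = {
    "A": {"q1": 3, "q2": 4, "q3": 2},
    "B": {"q1": 5, "q2": None, "q3": 1},
    "C": {"q1": 2, "q2": None, "q3": None},
}
by_q, by_r = missing_rates(responses, ["q1", "q2", "q3"])
print(round(by_q["q2"], 2))  # → 0.67
```

Note the falsy check would also count a legitimate answer of 0 as missing; with zero-coded scales, test for key absence or None instead.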

What to do if you see this:

Pattern                           | Likely Cause                    | Action
High missing on one question      | Confusing or sensitive question | Review question design
High missing late in survey       | Fatigue                         | Consider shortening survey
High missing for some respondents | Disengagement                   | Review other quality indicators
Random missing                    | Normal variation                | Standard missing data handling

8. Response Distribution Anomalies

What it is: Unusual patterns in how responses are distributed.

Why it matters: Extreme distributions can indicate question problems or data quality issues.

What to look for:

Ceiling/floor effects: Everyone selecting the highest or lowest option suggests the scale doesn't capture variation (or the question is leading).

Bimodal distributions: Two peaks might indicate two distinct populations—or a confusing question interpreted two ways.

Uniform distributions: Every option selected equally might indicate random responding.

How to check:

  1. Plot response distributions for key questions
  2. Compare to expected distributions based on theory or prior research
  3. Investigate anomalies
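A quick tabulation surfaces extreme distributions before you plot anything. This is a crude screen only, assuming numeric answers on a hypothetical 1-5 scale; the 80% cutoff for flagging an extreme is an arbitrary illustration, not a standard:

```python
from collections import Counter

def response_distribution(answers, scale=(1, 2, 3, 4, 5)):
    """Tabulate how often each scale point was chosen, plus simple flags.

    Flags a ceiling/floor effect when one extreme holds most responses
    (cutoff is an illustrative assumption). Not a substitute for
    actually plotting the data.
    """
    counts = Counter(answers)
    n = len(answers)
    dist = {point: counts.get(point, 0) / n for point in scale}
    flags = []
    if dist[scale[-1]] > 0.8:
        flags.append("ceiling effect")
    if dist[scale[0]] > 0.8:
        flags.append("floor effect")
    return dist, flags

answers = [5, 5, 5, 4, 5, 5, 5, 5, 5, 5]
dist, flags = response_distribution(answers)
print(flags)  # → ['ceiling effect']
```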

What to do if you see this:

  • Ceiling/floor: may be valid (everyone really is satisfied) or may indicate scale/wording problems
  • Bimodal: investigate whether subgroups differ or question is ambiguous
  • Uniform: check for random responding patterns

Building Your Quality Control Process

Step 1: Define Criteria Before Collecting Data

Decide your quality thresholds before you see the data. This prevents post-hoc rationalization.

Document:

  • Speeder threshold (e.g., <1/3 median time)
  • Straightlining threshold (e.g., zero variance across any grid of 5+ items)
  • Attention check policy (e.g., exclude if failed 2+ checks)
  • Open-text policy (e.g., exclude if gibberish in any open-text field)

Step 2: Apply Criteria Systematically

Run all checks on all responses. Don't cherry-pick which responses to scrutinize.

Create a quality flag for each criterion:

  • Speeder flag (0/1)
  • Straightliner flag (0/1)
  • Inconsistency flag (0/1)
  • Attention check flag (0/1)
  • Open-text flag (0/1)
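The per-criterion flags above combine naturally into one record per respondent. A sketch, assuming each check produced a set of flagged respondent IDs (hypothetical names):

```python
def quality_flags(rid, speeders, straightliners, inconsistent,
                  attention_failures, bad_open_text):
    """Assemble 0/1 quality flags for one respondent.

    Each argument after `rid` is the set of IDs flagged by one check
    (hypothetical structure for illustration).
    """
    return {
        "speeder": int(rid in speeders),
        "straightliner": int(rid in straightliners),
        "inconsistent": int(rid in inconsistent),
        "attention": int(rid in attention_failures),
        "open_text": int(rid in bad_open_text),
    }

flags = quality_flags("B", {"B"}, {"B", "C"}, set(), {"B"}, set())
print(flags)                # per-criterion 0/1 flags for respondent B
print(sum(flags.values()))  # total flags → 3
```

Summing the flags gives a simple severity score: respondents with multiple co-occurring flags are the strongest exclusion candidates.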

Step 3: Review Flagged Responses

Automated flags catch candidates; human review makes final decisions.

For each flagged response:

  • Is this a clear quality failure or borderline?
  • Do multiple flags co-occur? (Stronger case for exclusion)
  • Would excluding this response change conclusions?

Step 4: Document Everything

Record:

  • How many responses were flagged on each criterion
  • How many were excluded and why
  • How exclusions affected sample size and composition
  • Whether conclusions differ with/without excluded responses

Step 5: Run Sensitivity Analysis

Analyze your data twice:

  1. With all responses
  2. With low-quality responses excluded

If conclusions differ substantially, report both. If they're similar, you can be more confident in your findings.
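The two-pass comparison reduces to computing the same statistic twice. A minimal sketch, assuming the metric of interest keyed by a hypothetical respondent ID and a set of excluded IDs from the quality checks:

```python
from statistics import mean

def sensitivity_check(scores, excluded):
    """Compare a key statistic with and without flagged respondents.

    `scores` maps a respondent ID to the metric of interest;
    `excluded` is the set of IDs failing quality checks
    (hypothetical structures for illustration).
    """
    full = mean(scores.values())
    clean = mean(v for rid, v in scores.items() if rid not in excluded)
    return full, clean, clean - full

scores = {"A": 4.0, "B": 1.0, "C": 3.5, "D": 4.5}
full, clean, shift = sensitivity_check(scores, excluded={"B"})
print(full, clean, shift)  # → 3.25 4.0 0.75
```

A large shift means the excluded responses were driving part of the result, which is exactly what warrants reporting both numbers.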

Quick Reference Checklist

Before analyzing survey data, verify:

Speeders

  • Calculated completion time distribution
  • Flagged responses below threshold
  • Reviewed flagged responses for corroborating issues

Straightliners

  • Calculated within-grid variance for each respondent
  • Flagged zero/near-zero variance responses
  • Checked whether straightlining occurs across multiple grids

Inconsistencies

  • Defined logical consistency rules
  • Flagged responses violating rules
  • Distinguished minor from major inconsistencies

Attention Checks

  • Reviewed attention check failure rate
  • Flagged all failures
  • Assessed whether failures correlate with other issues

Open-Text Quality

  • Reviewed open-text responses for red flags
  • Flagged gibberish, duplicates, off-topic responses
  • Checked for bot-like patterns

Duplicates

  • Checked for duplicate identifiers
  • Checked for duplicate response patterns
  • Applied consistent deduplication rule

Missing Data

  • Calculated missing rates by question and respondent
  • Investigated high-missing questions
  • Determined missing data handling approach

Documentation

  • Recorded all quality criteria and thresholds
  • Documented exclusion decisions
  • Ran sensitivity analysis

Building surveys with built-in quality controls?

Lensym includes automatic speeder detection, attention check support, and response validation to help you collect cleaner data from the start.

→ Get Early Access


Related Reading: