
Survey Data Quality: A Practical Checklist Before You Analyze

data quality · survey analysis · data cleaning · research methodology · best practices

Pre-analysis survey data quality screening: straightlining, speeding, inconsistency, and other satisficing indicators. A systematic checklist for cleaner datasets.


The most dangerous data isn't missing data—it's bad data that looks complete.

Survey data quality problems don't announce themselves. Straightliners, speeders, and inattentive respondents submit responses that look like everyone else's. Their data passes basic checks, gets included in your analysis, and quietly distorts your conclusions.

This checklist helps you identify quality issues before analysis. Think of it as quality control for survey data: a systematic review that catches problems human reviewers miss.

TL;DR:

  • Check for speeders: Responses completed impossibly fast indicate no engagement.
  • Check for straightliners: Identical responses across grid questions suggest satisficing.
  • Check for inconsistent responses: Contradictory answers indicate inattention or confusion.
  • Check attention check failures: Failed trap questions confirm disengagement.
  • Check open-text quality: Gibberish, copy-paste, or off-topic responses signal problems.
  • Document your decisions: Whatever you exclude, document why and how it affects results.

→ Build Quality-First Surveys with Lensym

Why Data Quality Checks Matter

Every dataset contains some low-quality responses. The question is whether they're random noise (which averages out) or systematic contamination (which biases results).

Low-quality responses typically come from:

  • Satisficers: Respondents giving minimal effort to finish quickly
  • Bots: Automated responses in online panels
  • Fraudulent respondents: People gaming incentive systems
  • Confused respondents: People who misunderstood questions or instructions
  • Fatigued respondents: People who started engaged but gave up mentally

Without quality checks, these responses are weighted equally with thoughtful ones. A dataset with a substantial proportion of low-quality responses will produce conclusions contaminated by that noise—the more careless responses, the more distorted your findings.

The Data Quality Checklist

1. Speeder Detection

What it is: Respondents who complete the survey faster than reasonably possible.

Why it matters: Reading and considering questions takes time. Someone who finishes a 10-minute survey in 2 minutes didn't read the questions.

How to check:

  1. Calculate median completion time for your survey
  2. Flag responses below a threshold (common: 1/3 of median time, or 2 standard deviations below mean)
  3. Review flagged responses for other quality indicators
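The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming completion times in seconds keyed by a hypothetical respondent ID; the function name and the 1/3-of-median cutoff mirror the threshold suggested above.

```python
from statistics import median

def flag_speeders(times_sec, fraction=1/3):
    """Flag respondents who finished in under `fraction` of the median time.

    `times_sec` maps a respondent ID to completion time in seconds
    (hypothetical structure for illustration).
    Returns the set of flagged respondent IDs.
    """
    cutoff = median(times_sec.values()) * fraction
    return {rid for rid, t in times_sec.items() if t < cutoff}

times = {"r1": 600, "r2": 580, "r3": 110, "r4": 640}  # r3 is suspiciously fast
print(flag_speeders(times))  # → {'r3'}
```

Flagged IDs then feed the manual review in step 3 rather than being excluded outright.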

Thresholds:

Survey Length | Minimum Reasonable Time
5 minutes     | 1.5-2 minutes
10 minutes    | 3-4 minutes
15 minutes    | 5-6 minutes
20 minutes    | 7-8 minutes

What to do if you see this:

  • Exclude extreme speeders (below minimum reasonable time)
  • Flag moderate speeders for additional review
  • Consider whether speeding correlates with other quality issues

Caution: Some legitimate respondents are fast readers or have simple situations (e.g., no branching paths). Don't exclude based on speed alone—look for corroborating evidence.

2. Straightlining Detection

What it is: Selecting the same response option across all items in a grid or scale battery.

Why it matters: Thoughtful responses vary based on question content. Identical responses across diverse items indicate satisficing—giving the same answer repeatedly without reading.

How to check:

  1. Identify grid questions or item batteries (5+ items on same scale)
  2. Calculate response variance per respondent within each grid
  3. Flag respondents with zero or near-zero variance

Example:

Respondent | Item 1 | Item 2 | Item 3 | Item 4 | Item 5 | Variance
A          | 4      | 3      | 5      | 4      | 3      | Normal
B          | 4      | 4      | 4      | 4      | 4      | Zero (flag)
C          | 1      | 1      | 1      | 1      | 1      | Zero (flag)
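The variance check from the table can be computed directly. A minimal sketch, assuming each respondent's grid answers arrive as a list of numeric codes keyed by a hypothetical respondent ID:

```python
from statistics import pvariance

def flag_straightliners(grid_responses, tol=0.0):
    """Flag respondents whose within-grid response variance is at or below `tol`.

    `grid_responses` maps a respondent ID to that respondent's numeric
    answers for one grid (hypothetical structure for illustration).
    """
    return {rid for rid, answers in grid_responses.items()
            if pvariance(answers) <= tol}

grid = {
    "A": [4, 3, 5, 4, 3],   # normal variation
    "B": [4, 4, 4, 4, 4],   # straightliner
    "C": [1, 1, 1, 1, 1],   # straightliner
}
print(sorted(flag_straightliners(grid)))  # → ['B', 'C']
```

Raising `tol` slightly catches near-zero-variance patterns (e.g., 4-4-4-5-4) as well.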

What to do if you see this:

  • Review the items: are they genuinely diverse? Straightlining on similar items might be valid.
  • Check if straightliners also fail other quality checks
  • Consider excluding straightliners from analysis or running sensitivity analysis with/without them

Caution: Some respondents legitimately feel the same way about all items. Look for patterns across multiple grids, not just one.

3. Inconsistency Detection

What it is: Contradictory responses that can't both be true.

Why it matters: Inconsistent responses indicate the respondent wasn't paying attention, didn't understand the questions, or is responding randomly.

Types of inconsistency:

Logical inconsistency:

  • Reports no children but answers questions about children's ages
  • Reports never using a product but rates product features
  • Reports annual income of $30,000 but monthly expenses of $10,000

Scale inconsistency:

  • Rates overall satisfaction as "Very satisfied" but rates all components as "Dissatisfied"
  • Strongly agrees with "I love my job" and "I hate coming to work"

Temporal inconsistency:

  • Reports joining company in 2020 but having 10 years of tenure

How to check:

  1. Identify question pairs that should be logically related
  2. Define rules for inconsistency (e.g., if Q5 = "No" then Q6 should be blank)
  3. Flag responses that violate rules
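Consistency rules are easiest to manage as small predicate functions. A sketch, assuming hypothetical question IDs (`Q5`, `Q6`) and answer dicts keyed by respondent; each rule returns True when a response is consistent:

```python
def flag_inconsistent(responses, rules):
    """Flag respondents who violate any logical-consistency rule.

    Each rule is a predicate over one respondent's answer dict
    (hypothetical structure for illustration) that returns True
    when the response is consistent.
    """
    return {rid for rid, ans in responses.items()
            if not all(rule(ans) for rule in rules)}

# Hypothetical rule: if Q5 = "No" (no children), Q6 (children's ages) must be blank.
rules = [lambda a: a.get("Q5") != "No" or not a.get("Q6")]

responses = {
    "A": {"Q5": "No", "Q6": ""},      # consistent
    "B": {"Q5": "No", "Q6": "7, 9"},  # contradiction
    "C": {"Q5": "Yes", "Q6": "12"},   # consistent
}
print(flag_inconsistent(responses, rules))  # → {'B'}
```

Keeping rules as data makes it easy to add or drop checks without touching the flagging logic.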

What to do if you see this:

  • Minor inconsistencies: may be honest mistakes, consider keeping
  • Major inconsistencies: strong signal of low quality, consider excluding
  • Systematic patterns: may indicate question design problems, not respondent problems

4. Attention Check Failures

What it is: Failing questions specifically designed to verify attention.

Why it matters: Attention checks (also called trap questions or instructional manipulation checks) directly test whether respondents are reading. Failing them is strong evidence of disengagement.

Types of attention checks:

Instructional checks:

"To show you're paying attention, please select 'Strongly disagree' for this item."

Content checks:

In a list of activities: "Traveling to the moon for vacation" (should not be selected)

Recall checks:

"Earlier you indicated your department. What department was that?" (open-ended, compared to earlier closed response)

How to check:

  1. Include 1-2 attention checks in surveys over 5 minutes
  2. Flag all failures
  3. Review whether failures correlate with other quality issues
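Tallying failures per respondent makes the exclusion policy mechanical. A minimal sketch, assuming each respondent's check results arrive as a list of booleans (one per attention check, hypothetical structure):

```python
def attention_summary(check_results):
    """Count failed attention checks per respondent.

    `check_results` maps a respondent ID to a list of booleans,
    one per attention check (True = passed); hypothetical structure.
    """
    return {rid: sum(1 for passed in checks if not passed)
            for rid, checks in check_results.items()}

results = {"A": [True, True], "B": [False, True], "C": [False, False]}
failures = attention_summary(results)
print(failures)                                    # → {'A': 0, 'B': 1, 'C': 2}
print({r for r, n in failures.items() if n >= 2})  # exclusion candidates → {'C'}
```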

What to do if you see this:

  • Single failure: review other indicators before excluding
  • Multiple failures: strong case for exclusion
  • High failure rate overall: may indicate confusing instructions or fatiguing survey

Caution: Some attention checks are poorly designed and confuse legitimate respondents. Test your checks in pilots.

5. Open-Text Quality

What it is: Evaluating the quality of responses to open-ended questions.

Why it matters: Open-ended responses require more effort than clicking options. Low-quality open-text responses often indicate low-quality closed-ended responses too.

Red flags:

Red Flag             | Example                                      | Indicates
Gibberish            | "asdfasdf" or "jjjjjjj"                      | No engagement
Single character     | "." or "n"                                   | Minimal effort
Copy-paste           | Identical text in multiple responses         | Bot or fraud
Off-topic            | Question about product, answer about weather | Not reading
Profanity/abuse      | Hostile content                              | Disengagement or fraud
Suspiciously perfect | Reads like marketing copy                    | Bot or fraud

How to check:

  1. Review open-text responses for red flags
  2. Calculate response length distribution; flag outliers
  3. Check for duplicate responses across respondents
  4. Use automated tools for gibberish detection if volume is high
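Simple heuristics catch the worst offenders before any manual review. A sketch, assuming open-text answers keyed by a hypothetical respondent ID; the patterns are rough illustrations, not a complete gibberish detector:

```python
import re
from collections import Counter

def flag_open_text(texts, min_len=2):
    """Flag gibberish-looking or duplicated open-text answers.

    `texts` maps a respondent ID to their open-ended answer
    (hypothetical structure). Heuristics only: very short answers,
    single repeated characters, keyboard-mash runs, and verbatim
    duplicates across respondents.
    """
    counts = Counter(t.strip().lower() for t in texts.values())
    flagged = set()
    for rid, t in texts.items():
        s = t.strip().lower()
        if (len(s) < min_len
                or len(set(s.replace(" ", ""))) == 1       # "jjjjjjj"
                or re.fullmatch(r"(asdf|qwer|zxcv)+", s)   # keyboard mash
                or counts[s] > 1):                         # copy-paste duplicate
            flagged.add(rid)
    return flagged

texts = {
    "A": "The checkout flow was confusing on mobile.",
    "B": "asdfasdf",
    "C": ".",
    "D": "Great product, would recommend!",
    "E": "Great product, would recommend!",  # verbatim duplicate of D
}
print(sorted(flag_open_text(texts)))  # → ['B', 'C', 'D', 'E']
```

Flags from this pass are review candidates, not automatic exclusions; off-topic answers in particular need a human eye.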

What to do if you see this:

  • Gibberish/single character: strong signal for exclusion
  • Off-topic: may indicate question confusion; review before excluding
  • Duplicates: investigate for bot activity

6. Duplicate Detection

What it is: Multiple responses from the same person.

Why it matters: Duplicates inflate sample size and can skew results if the duplicate respondent is unusual.

How to check:

  1. Check for duplicate identifiers (email, user ID, etc.)
  2. Check for duplicate IP addresses (with caution—shared IPs are common)
  3. Check for near-identical response patterns
  4. Check for identical open-text responses
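The keep-the-first rule described below is straightforward to encode. A sketch, assuming responses arrive in submission order as dicts carrying a hypothetical `email` identifier field:

```python
def dedupe_keep_first(responses, key="email"):
    """Keep the first response per identifier; drop later duplicates.

    `responses` is a list of answer dicts in submission order, each
    carrying an identifier field (hypothetical `email` field here).
    Identifiers are normalized to lowercase before comparison.
    """
    seen, kept, dropped = set(), [], []
    for r in responses:
        ident = r.get(key, "").strip().lower()
        (dropped if ident in seen else kept).append(r)
        seen.add(ident)
    return kept, dropped

responses = [
    {"email": "a@example.com", "q1": 4},
    {"email": "b@example.com", "q1": 2},
    {"email": "A@example.com", "q1": 5},  # same person, second submission
]
kept, dropped = dedupe_keep_first(responses)
print(len(kept), len(dropped))  # → 2 1
```

Swapping the keep-first rule for keep-most-complete only changes which element survives per identifier; the grouping logic stays the same.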

What to do if you see this:

  • Keep the first response, exclude subsequent ones
  • Or keep the most complete response if completion varies
  • Document your rule and apply consistently

Caution: Shared devices (households, offices) can create false duplicate signals. Look for multiple indicators, not just IP address.

7. Missing Data Patterns

What it is: Analyzing patterns in missing responses.

Why it matters: Random missing data is manageable. Systematic missing data (e.g., everyone skips the same question) indicates a problem with the question or survey design.

How to check:

  1. Calculate missing rate per question
  2. Calculate missing rate per respondent
  3. Look for patterns: do certain questions have high skip rates? Do certain respondents skip many questions?
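Both missing rates can be computed in one pass. A minimal sketch, assuming answer dicts keyed by a hypothetical respondent ID, where a question counts as missing when absent or blank:

```python
def missing_rates(responses, questions):
    """Missing rate per question and per respondent.

    `responses` maps a respondent ID to an answer dict (hypothetical
    structure); a question counts as missing when absent or falsy.
    """
    n = len(responses)
    by_question = {
        q: sum(1 for a in responses.values() if not a.get(q)) / n
        for q in questions
    }
    by_respondent = {
        rid: sum(1 for q in questions if not a.get(q)) / len(questions)
        for rid, a in responses.items()
    }
    return by_question, by_respondent

responses = {
    "A": {"q1": 3, "q2": 4, "q3": 2},
    "B": {"q1": 5, "q2": None, "q3": 1},
    "C": {"q1": 2, "q2": None, "q3": None},
}
by_q, by_r = missing_rates(responses, ["q1", "q2", "q3"])
print(round(by_q["q2"], 2))  # → 0.67
```

Note the falsy check would also count a legitimate answer of 0 as missing; with zero-coded scales, test for key absence or None instead.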

What to do if you see this:

Pattern                           | Likely Cause                    | Action
High missing on one question      | Confusing or sensitive question | Review question design
High missing late in survey       | Fatigue                         | Consider shortening survey
High missing for some respondents | Disengagement                   | Review other quality indicators
Random missing                    | Normal variation                | Standard missing data handling

8. Response Distribution Anomalies

What it is: Unusual patterns in how responses are distributed.

Why it matters: Extreme distributions can indicate question problems or data quality issues.

What to look for:

Ceiling/floor effects: Everyone selecting the highest or lowest option suggests the scale doesn't capture variation (or the question is leading).

Bimodal distributions: Two peaks might indicate two distinct populations—or a confusing question interpreted two ways.

Uniform distributions: Every option selected equally might indicate random responding.

How to check:

  1. Plot response distributions for key questions
  2. Compare to expected distributions based on theory or prior research
  3. Investigate anomalies
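A quick tabulation surfaces extreme distributions before you plot anything. This is a crude screen only, assuming numeric answers on a hypothetical 1-5 scale; the 80% cutoff for flagging an extreme is an arbitrary illustration, not a standard:

```python
from collections import Counter

def response_distribution(answers, scale=(1, 2, 3, 4, 5)):
    """Tabulate how often each scale point was chosen, plus simple flags.

    Flags a ceiling/floor effect when one extreme holds most responses
    (cutoff is an illustrative assumption). Not a substitute for
    actually plotting the data.
    """
    counts = Counter(answers)
    n = len(answers)
    dist = {point: counts.get(point, 0) / n for point in scale}
    flags = []
    if dist[scale[-1]] > 0.8:
        flags.append("ceiling effect")
    if dist[scale[0]] > 0.8:
        flags.append("floor effect")
    return dist, flags

answers = [5, 5, 5, 4, 5, 5, 5, 5, 5, 5]
dist, flags = response_distribution(answers)
print(flags)  # → ['ceiling effect']
```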

What to do if you see this:

  • Ceiling/floor: may be valid (everyone really is satisfied) or may indicate scale/wording problems
  • Bimodal: investigate whether subgroups differ or question is ambiguous
  • Uniform: check for random responding patterns

Building Your Quality Control Process

Step 1: Define Criteria Before Collecting Data

Decide your quality thresholds before you see the data. This prevents post-hoc rationalization.

Document:

  • Speeder threshold (e.g., <1/3 median time)
  • Straightlining threshold (e.g., zero variance across any grid of 5+ items)
  • Attention check policy (e.g., exclude if failed 2+ checks)
  • Open-text policy (e.g., exclude if gibberish in any open-text field)

Step 2: Apply Criteria Systematically

Run all checks on all responses. Don't cherry-pick which responses to scrutinize.

Create a quality flag for each criterion:

  • Speeder flag (0/1)
  • Straightliner flag (0/1)
  • Inconsistency flag (0/1)
  • Attention check flag (0/1)
  • Open-text flag (0/1)
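The per-criterion flags above combine naturally into one record per respondent. A sketch, assuming each check produced a set of flagged respondent IDs (hypothetical names):

```python
def quality_flags(rid, speeders, straightliners, inconsistent,
                  attention_failures, bad_open_text):
    """Assemble 0/1 quality flags for one respondent.

    Each argument after `rid` is the set of IDs flagged by one check
    (hypothetical structure for illustration).
    """
    return {
        "speeder": int(rid in speeders),
        "straightliner": int(rid in straightliners),
        "inconsistent": int(rid in inconsistent),
        "attention": int(rid in attention_failures),
        "open_text": int(rid in bad_open_text),
    }

flags = quality_flags("B", {"B"}, {"B", "C"}, set(), {"B"}, set())
print(flags)                # per-criterion 0/1 flags for respondent B
print(sum(flags.values()))  # total flags → 3
```

Summing the flags gives a simple severity score: respondents with multiple co-occurring flags are the strongest exclusion candidates.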

Step 3: Review Flagged Responses

Automated flags catch candidates; human review makes final decisions.

For each flagged response:

  • Is this a clear quality failure or borderline?
  • Do multiple flags co-occur? (Stronger case for exclusion)
  • Would excluding this response change conclusions?

Step 4: Document Everything

Record:

  • How many responses were flagged on each criterion
  • How many were excluded and why
  • How exclusions affected sample size and composition
  • Whether conclusions differ with/without excluded responses

Step 5: Run Sensitivity Analysis

Analyze your data twice:

  1. With all responses
  2. With low-quality responses excluded

If conclusions differ substantially, report both. If they're similar, you can be more confident in your findings.
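The two-pass comparison reduces to computing the same statistic twice. A minimal sketch, assuming the metric of interest keyed by a hypothetical respondent ID and a set of excluded IDs from the quality checks:

```python
from statistics import mean

def sensitivity_check(scores, excluded):
    """Compare a key statistic with and without flagged respondents.

    `scores` maps a respondent ID to the metric of interest;
    `excluded` is the set of IDs failing quality checks
    (hypothetical structures for illustration).
    """
    full = mean(scores.values())
    clean = mean(v for rid, v in scores.items() if rid not in excluded)
    return full, clean, clean - full

scores = {"A": 4.0, "B": 1.0, "C": 3.5, "D": 4.5}
full, clean, shift = sensitivity_check(scores, excluded={"B"})
print(full, clean, shift)  # → 3.25 4.0 0.75
```

A large shift means the excluded responses were driving part of the result, which is exactly what warrants reporting both numbers.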

Quick Reference Checklist

Before analyzing survey data, verify:

Speeders

  • Calculated completion time distribution
  • Flagged responses below threshold
  • Reviewed flagged responses for corroborating issues

Straightliners

  • Calculated within-grid variance for each respondent
  • Flagged zero/near-zero variance responses
  • Checked whether straightlining occurs across multiple grids

Inconsistencies

  • Defined logical consistency rules
  • Flagged responses violating rules
  • Distinguished minor from major inconsistencies

Attention Checks

  • Reviewed attention check failure rate
  • Flagged all failures
  • Assessed whether failures correlate with other issues

Open-Text Quality

  • Reviewed open-text responses for red flags
  • Flagged gibberish, duplicates, off-topic responses
  • Checked for bot-like patterns

Duplicates

  • Checked for duplicate identifiers
  • Checked for duplicate response patterns
  • Applied consistent deduplication rule

Missing Data

  • Calculated missing rates by question and respondent
  • Investigated high-missing questions
  • Determined missing data handling approach

Documentation

  • Recorded all quality criteria and thresholds
  • Documented exclusion decisions
  • Ran sensitivity analysis

Building surveys with built-in quality controls?

Lensym includes automatic speeder detection, attention check support, and response validation to help you collect cleaner data from the start.

→ Get Early Access


Related Reading: