
How to Determine Sample Size for Surveys: A Statistical Guide

Tags: sample size, survey methodology, statistics, research methods, Cochran formula, power analysis

Sample size calculator with statistical context: Cochran's formula, finite population correction, confidence assumptions, and when power analysis is required.

Sample size determination is one of the most misunderstood aspects of survey research. Most researchers reach for a calculator, plug in numbers, and trust the output. But without understanding the underlying statistics, you can't know when the formula applies—and when it doesn't.

This guide explains the mathematics behind sample size calculation, so you can make informed decisions rather than just trusting a black box.


The Two Types of Sample Size Problems

Before diving into formulas, understand that there are two fundamentally different sample size problems:

1. Estimation Problems

"What percentage of customers are satisfied?"

You're estimating a single proportion or mean. You want a confidence interval around that estimate. Cochran's formula applies here.

2. Comparison Problems

"Is satisfaction higher among customers who received support vs. those who didn't?"

You're testing a hypothesis. You need enough power to detect an effect if it exists. Power analysis applies here.

Most online calculators only handle the first case. If you're comparing groups, those calculators will mislead you.


Cochran's Formula Explained

For estimation problems with categorical outcomes, the standard formula is:

n₀ = (Z² × p × (1-p)) / e²

Let's break down each component:

Z-Score (Confidence Level)

The Z-score corresponds to how confident you want to be that your true value falls within your margin of error:

Confidence Level   Z-Score   Interpretation
90%                1.645     10% chance the true value is outside your interval
95%                1.960     5% chance (standard for most research)
99%                2.576     1% chance (conservative, requires larger samples)

Why 95% is standard: It balances precision against practicality. Going to 99% confidence requires roughly 1.7x the sample size for the same margin of error.

Expected Proportion (p)

This is your best guess at what the survey will find. If you expect 70% of respondents to say "yes," then p = 0.70.

The conservative choice: If you have no prior estimate, use p = 0.50. This maximizes sample size because p × (1-p) reaches its maximum at 0.5.

p = 0.50: p × (1-p) = 0.25 (maximum variance)
p = 0.70: p × (1-p) = 0.21 (lower variance)
p = 0.90: p × (1-p) = 0.09 (much lower variance)

This matters: a survey expecting a 90/10 split needs far fewer responses than one expecting a 50/50 split.

Margin of Error (e)

How much imprecision you'll accept. A margin of error of ±5% means if your survey finds 60% satisfaction, the true value is likely between 55% and 65%.

Typical values:

  • ±3%: High precision, requires large samples
  • ±5%: Standard for most research
  • ±10%: Acceptable for exploratory work
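The pieces above can be sketched in a few lines of Python, using the standard library's NormalDist to derive the z-score from the confidence level (the inputs shown are the conservative defaults discussed above, not the only valid choices):

```python
import math
from statistics import NormalDist

def cochran_n0(confidence: float, p: float, e: float) -> int:
    """Base Cochran sample size for estimating a single proportion."""
    # Two-sided z-score for the chosen confidence level (0.95 -> 1.960)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / e**2)

# 95% confidence, conservative p = 0.5, ±5% margin of error
print(cochran_n0(0.95, 0.50, 0.05))  # → 385
# Expecting a 90/10 split cuts the requirement sharply
print(cochran_n0(0.95, 0.90, 0.05))  # → 139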

Finite Population Correction

Cochran's base formula assumes an infinite population. When your population is finite and you're sampling a meaningful fraction of it, a smaller sample suffices:

n = n₀ / (1 + ((n₀ - 1) / N))

Where N is your total population size.

When it matters:

Population   Sample (no FPC)   Sample (with FPC)   Difference
100,000      385               383                 Negligible
10,000       385               370                 Small
1,000        385               278                 Significant
500          385               218                 Large

Rule of thumb: If you're sampling less than 5% of the population, FPC barely matters. If you're sampling more than 10%, always apply it.
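A minimal sketch of the corrected calculation. One subtlety: the correction is applied to the unrounded n₀, with rounding up only at the very end (rounding n₀ to 385 first gives slightly different answers for small populations):

```python
import math
from statistics import NormalDist

def cochran_with_fpc(confidence: float, p: float, e: float,
                     N: float = math.inf) -> int:
    """Cochran sample size with optional finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z**2 * p * (1 - p) / e**2       # keep unrounded until the end
    if math.isfinite(N):
        n0 = n0 / (1 + (n0 - 1) / N)     # finite population correction
    return math.ceil(n0)

for N in (100_000, 10_000, 1_000, 500):
    print(N, cochran_with_fpc(0.95, 0.5, 0.05, N))
```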


The Assumptions You're Making

When you use Cochran's formula, you're implicitly assuming:

1. Simple Random Sampling

Every member of the population has an equal probability of selection. If you're using convenience sampling, quota sampling, or snowball sampling, the formula's precision claims don't hold.

2. Binary or Categorical Outcome

The formula is designed for proportions. For continuous variables (like satisfaction on a 1-10 scale), you'd use a different formula involving standard deviation.
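For reference, the continuous-variable analogue is n = (Z × σ / e)², where σ is an assumed standard deviation of the outcome. A hypothetical sketch (the SD of 2 and ±0.25 precision below are illustrative values, not from this guide):

```python
import math
from statistics import NormalDist

def sample_size_for_mean(confidence: float, sigma: float, e: float) -> int:
    """Sample size to estimate a mean to within ±e, given an assumed SD."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)

# Hypothetical: 1-10 satisfaction scale, assumed SD of 2, ±0.25 precision
print(sample_size_for_mean(0.95, 2.0, 0.25))  # → 246
```

The catch is that σ must be guessed or taken from a pilot study, which is why the proportion version (with its safe p = 0.5 default) is so much more popular in calculators.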

3. No Non-Response Bias

The formula calculates how many responses you need, not how many invitations to send. If non-responders differ systematically from responders, your confidence interval is precise but wrong.

4. Independent Observations

Each response is independent. If you survey employees within teams, and team culture affects responses, you may need to account for clustering.


When Standard Formulas Don't Apply

Group Comparisons (Need Power Analysis)

If you want to compare two groups, Cochran's formula is wrong. You need power analysis, which accounts for:

  • Effect size: How big a difference do you want to detect?
  • Power: What probability of detecting a real effect? (typically 80%)
  • Alpha: What false positive rate? (typically 5%)

A sample size calculator might say 385 is enough. But split that into two groups of 192 each, and you have only about 50% power to detect a difference like 50% vs. 60% satisfaction. You'd miss that difference about as often as you'd find it.
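A rough sketch of the standard normal-approximation formula for comparing two independent proportions (the 50% vs. 60% scenario is illustrative; real studies should confirm with a dedicated tool like G*Power):

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group sample size for a two-sided test of two independent
    proportions (normal approximation, unpooled variance)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.960 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.842 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Detecting 50% vs. 60% needs ~385 PER GROUP, not 385 total
print(n_per_group(0.50, 0.60))  # → 385
```

Note how the per-group requirement for this comparison roughly matches the total that Cochran's formula returns for a single estimate: comparisons are far more expensive than estimation.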

Complex Sampling Designs

Stratified sampling: You sample proportionally from subgroups. Formulas differ based on allocation method.

Cluster sampling: You sample groups (schools, companies), then individuals within groups. Design effects can double or triple required sample size.

Multi-stage sampling: Combinations of the above. Consult a statistician.
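As a rough illustration of design effects for cluster sampling, the common approximation DEFF = 1 + (m − 1) × ICC inflates the simple-random-sample requirement (the cluster size of 20 and intraclass correlation of 0.05 below are hypothetical values):

```python
import math

def clustered_sample_size(n_srs: int, cluster_size: int, icc: float) -> int:
    """Inflate a simple-random-sample size by the design effect
    DEFF = 1 + (m - 1) * ICC, a standard cluster-sampling approximation."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_srs * deff)

# Hypothetical: 20 respondents per cluster, ICC = 0.05 -> DEFF = 1.95,
# nearly doubling the 385 a naive calculator would report
print(clustered_sample_size(385, 20, 0.05))  # → 751
```

Even a modest within-cluster correlation erodes the effective sample size quickly, which is why design effects of 2-3x are common in practice.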

Small Expected Proportions

When p < 0.10 or p > 0.90, the normal approximation underlying Cochran's formula becomes less accurate. For rare events, consider exact binomial methods.


Response Rate vs. Sample Size

A critical distinction many researchers miss:

Sample size (n): How many completed responses you need for your desired precision.

Required invitations: How many people you need to contact, given expected response rate.

Required invitations = n / expected_response_rate

If you need 400 responses and expect a 20% response rate, you need to invite 2,000 people.
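The same arithmetic as a sketch, with an optional buffer for responses you expect to exclude later (incompletes, disqualifications); the 10% exclusion rate in the second call is an illustrative value, not a recommendation:

```python
import math

def invitations_needed(n_required: int, response_rate: float,
                       exclusion_rate: float = 0.0) -> int:
    """Invitations to send, given the responses needed, the expected
    response rate, and an optional share of responses later excluded."""
    completed_needed = n_required / (1 - exclusion_rate)
    return math.ceil(completed_needed / response_rate)

print(invitations_needed(400, 0.20))        # → 2000
print(invitations_needed(400, 0.20, 0.10))  # with a 10% exclusion buffer
```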

But there's a deeper issue: non-response bias. If the 20% who respond differ from the 80% who don't, your sample isn't representative regardless of size. A larger biased sample is just a more precise wrong answer.

This is why response rate alone isn't a quality indicator—what matters is whether non-responders differ on the variables you're measuring.


Practical Decision Framework

Step 1: Define Your Research Question

  • Are you estimating a single proportion/mean? → Cochran's formula
  • Are you comparing groups or testing effects? → Power analysis

Step 2: Determine Acceptable Precision

  • What margin of error can you tolerate?
  • What confidence level does your field expect?

Step 3: Estimate Key Parameters

  • What's your expected proportion? (Use 0.5 if unknown)
  • What's your population size?

Step 4: Calculate and Adjust

  • Apply finite population correction if needed
  • Account for expected response rate
  • Build in buffer for exclusions

Step 5: Sanity Check

  • Is the sample achievable given your resources?
  • Do you have access to enough of the population?
  • If not, reconsider your precision requirements

Common Mistakes

Mistake 1: Using the Calculator for Comparisons

Cochran's formula tells you nothing about statistical power. If you're testing hypotheses, you need different tools.

Mistake 2: Ignoring Non-Response

385 responses from a 10% response rate is not the same as 385 from an 80% rate. The first is almost certainly biased.

Mistake 3: Over-Precision

Do you really need ±3%? The difference between ±3% and ±5% can double your sample size. Make sure the precision is worth the cost.

Mistake 4: Forgetting Subgroups

If you plan to analyze subgroups separately, each subgroup needs adequate sample size. A survey of 400 with 10% from a key demographic gives you only 40 in that subgroup—too few for reliable analysis.


Calculate Your Sample Size

Ready to apply these concepts? Use our Sample Size Calculator to get your numbers. It uses Cochran's formula with finite population correction and shows all intermediate calculations.

For comparison problems requiring power analysis, we recommend G*Power (free, academic-grade).