AI Simulates Survey Responses, But Accuracy Diverges from Public Opinion

Reporting by The Conversation (Ambuj Tewari) documents that large language models can be prompted to produce thousands of simulated survey answers, for example generating 10,000 responses by varying demographic prompts and relying on the model's sampling randomness. Gallup says it has begun independent research into "simulated responses" through a partnership with Simile, aiming to validate where AI agents predict real people's answers and where they fail (Gallup methodology blog). Methodology reviews from NORC, Stanford GSB and the Nielsen Norman Group summarize the benefits (automated coding, faster questionnaire design, and lower cost) and the consistent caveats: simulated agents can reproduce trends but often miss real-world variance and introduce bias. Editorial analysis: For practitioners, synthetic respondents are a useful exploratory tool when cost or speed is critical, but they are not a drop-in replacement for probability-based surveys when population-level estimates are required.
What happened
The academic commentary piece by Ambuj Tewari in The Conversation reports that large language models (LLMs) can be prompted to emulate many different respondents by varying demographic instructions and using model randomness to generate multiple replies; repeated prompting can, for example, yield 10,000 simulated answers. According to a Gallup methodology blog post, Gallup has launched research into simulated responses in partnership with Simile, using probability-based interview data to build agent banks and independently evaluate where those agents approximate human answers and where they fall short. The research-methods reviews published by NORC and a Stanford GSB report document practical uses of AI in surveys: automating the coding of open-ended responses, drafting questionnaire language, and reducing analyst time. The Nielsen Norman Group synthesis of multiple studies reports that digital twins and synthetic users can sometimes reproduce group-level trends but exhibit bias and reduced fidelity in variability.
Technical details
Editorial analysis: LLM-based simulated respondents work by turning respondent characteristics and interview-derived inputs into rich prompt contexts and then sampling the model multiple times, which produces diversity via stochastic generation. Industry reviews show two recurring technical patterns: when simulated agents are constructed from detailed interview data they match human responses more closely, and when agents are built from sparse demographic personas they tend to miss fine-grained nuance and within-group heterogeneity (Nielsen Norman Group synthesis). Another technical limitation reported across sources is sensitivity to prompt design and model versioning; The Conversation notes that different prompts, settings and model updates can materially change outputs, which complicates reproducibility.
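As a rough illustration of the persona-prompt-and-sample pattern described above, the Python sketch below builds one prompt per persona and samples it repeatedly. Everything in it is hypothetical and not taken from any of the cited sources: the `Persona` fields, the `build_prompt` wording, the example question, and especially `stub_model`, which stands in for a real LLM call and simply picks an answer option at random.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical sparse demographic persona; real agent banks may use far richer inputs.
    age_group: str
    region: str
    education: str


def build_prompt(persona: Persona, question: str, options: list[str]) -> str:
    """Turn respondent characteristics into a prompt context (illustrative wording)."""
    return (
        "You are answering a survey as the following respondent.\n"
        f"Age group: {persona.age_group}. Region: {persona.region}. "
        f"Education: {persona.education}.\n"
        f"Question: {question}\n"
        f"Answer with exactly one of: {', '.join(options)}."
    )


def stub_model(prompt: str, options: list[str], rng: random.Random) -> str:
    """Placeholder for an LLM call: ignores the prompt and picks an option at random."""
    return rng.choice(options)


def simulate(personas, question, options, n_per_persona, seed):
    """Sample each persona's prompt n_per_persona times with a seeded generator."""
    rng = random.Random(seed)
    rows = []
    for persona in personas:
        prompt = build_prompt(persona, question, options)
        for _ in range(n_per_persona):
            rows.append({"persona": persona, "answer": stub_model(prompt, options, rng)})
    return rows


QUESTION = "How satisfied are you with local public transport?"  # made-up example item
OPTIONS = ["Satisfied", "Neutral", "Dissatisfied"]
personas = [
    Persona("18-29", "Urban", "College"),
    Persona("65+", "Rural", "High school"),
]
rows = simulate(personas, QUESTION, OPTIONS, n_per_persona=5, seed=42)
print(len(rows), "simulated answers")
```

With a real model in place of `stub_model`, the seed alone would not guarantee identical outputs across model updates, which is the reproducibility concern raised above.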
Context and significance
Editorial analysis: For survey methodologists and applied researchers, simulated responses represent a new axis in the trade-off between cost, speed and statistical validity. The Gallup blog frames simulated agents as a complement to probability-based data collection, not a replacement for it, and Gallup states that its commitment to traditional, probability-based research remains intact. NORC's expert view highlights concrete productivity gains, namely automated coding and questionnaire drafting, that lower operational friction for handling open-ended data. At the same time, multiple sources report consistent concerns: simulated respondents can underrepresent variance, exhibit demographic-dependent bias, and may smooth or sanitize language in ways that remove authentic signal (Stanford GSB reporting on Prolific studies; NNGroup findings).
Evidence from empirical work
- The Stanford GSB piece synthesizes research showing that some studies find digital twins can backfill missing answers and reproduce certain behavioral patterns, while other work finds synthetic users systematically misestimate effect sizes and within-group variance. The Stanford summary also cites a Prolific study in which nearly one-third of respondents reported using LLMs to assist with survey answers, raising contamination risks for crowdsourced panels.
- The NNGroup review draws from three evaluation studies and reports that models trained or prompted with rich interview data perform best, but performance degrades for underrepresented demographic groups, producing biased predictions in some cases.
What to watch
Editorial analysis: Observers should follow four indicators to assess the method's maturity:
- external validation results from probability-based panels such as Gallup's independent tests with Simile
- replication benchmarks that compare agent-bank outputs against fresh probability samples
- documentation standards for prompt templates and model versions to improve reproducibility
- vendor and platform policies on disclosure when synthetic responses are used

Researchers should also monitor evidence about demographic bias and variance compression in synthetic outputs, since those flaws directly affect the validity of population estimates.
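One way to make the variance-compression and benchmark checks above concrete is to compare a synthetic answer set against a reference probability sample on two simple statistics. The sketch below is an assumption-laden illustration, not a method published by Gallup, NORC, Stanford GSB or NN/g: the function names are hypothetical, the two statistics (total variation distance between answer shares, and a ratio of variances on a 1-5 scale) are one reasonable choice among many, and the data are made up.

```python
from collections import Counter
from statistics import pvariance


def answer_shares(responses):
    """Share of responses falling on each answer option."""
    counts = Counter(responses)
    total = len(responses)
    return {option: n / total for option, n in counts.items()}


def total_variation_distance(synthetic, reference):
    """Half the summed absolute difference in answer shares (0 = identical, 1 = disjoint)."""
    p, q = answer_shares(synthetic), answer_shares(reference)
    options = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in options)


def variance_ratio(synthetic, reference):
    """Synthetic variance divided by reference variance; values below 1 suggest compression."""
    ref_var = pvariance(reference)
    if ref_var == 0:
        raise ValueError("reference sample has zero variance")
    return pvariance(synthetic) / ref_var


# Made-up 1-5 Likert answers, for illustration only.
reference = [1, 2, 3, 4, 5] * 20          # answers spread evenly across the scale
synthetic = [3] * 60 + [2] * 20 + [4] * 20  # same mean, clustered near the midpoint

tvd = total_variation_distance(synthetic, reference)
ratio = variance_ratio(synthetic, reference)
print(f"total variation distance: {tvd:.2f}, variance ratio: {ratio:.2f}")
```

In this toy case both samples share the same mean, so a comparison of averages alone would hide the difference; the variance ratio is what exposes the clustering. Running the same comparison within demographic subgroups would be a natural extension for checking group-dependent bias.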
Practical takeaways for practitioners
Editorial analysis: Use simulated respondents for rapid hypothesis generation, pretesting questionnaires, and augmenting qualitative coding workflows, while relying on probability-based sampling for official population estimates or high-stakes policy decisions. Where synthetic agents are used, teams should report model version, prompt templates, seeding/randomness parameters, and validation against held-out probability samples so readers can assess fit-for-purpose.
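The reporting items listed above could be captured in a small machine-readable record stored alongside each synthetic run. The following is a minimal sketch under stated assumptions: the `SyntheticRunManifest` class, its field names, and all example values are hypothetical, and none of the cited organizations prescribes this format.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SyntheticRunManifest:
    # Hypothetical fields mirroring the reporting items: model version,
    # prompt template, randomness parameters, and the validation sample used.
    model_version: str
    prompt_template: str
    seed: int
    temperature: float
    validation_sample: str

    def to_json(self) -> str:
        """Serialize the manifest, adding a hash so template changes are easy to detect."""
        record = asdict(self)
        record["prompt_template_sha256"] = hashlib.sha256(
            self.prompt_template.encode("utf-8")
        ).hexdigest()
        return json.dumps(record, sort_keys=True, indent=2)


# Placeholder values for illustration only.
manifest = SyntheticRunManifest(
    model_version="example-model-2024-01",
    prompt_template="You are a respondent aged {age_group} in {region}. {question}",
    seed=42,
    temperature=0.7,
    validation_sample="held-out probability sample (placeholder description)",
)
manifest_json = manifest.to_json()
print(manifest_json)
```

Hashing the template is a design convenience: two runs that claim the same prompt can be checked for an exact match without comparing long strings by eye.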
Scoring Rationale
The story is notable for research and survey-methodology practitioners because it documents emerging, validated uses of LLMs for simulated responses while reporting consistent limitations; it is not transformational for core modeling practice but affects how social-science data pipelines are designed.


