The Strengths and Weaknesses of Synthetic Data

March 5, 2026

An evidence-based perspective for market research.

(Inspired by Ray Poynter, Understanding Synthetic Data, 2025, and related industry research) 

Synthetic data is data generated by computational models to replicate patterns found in real-world research data. In market research, it is typically built from large volumes of historical respondent data to simulate likely responses under defined conditions. 

By 2025, most major research organizations had introduced AI-augmented or synthetic-data-based solutions, driven by demand for faster and more scalable insights.¹ 

Across the industry, synthetic data is positioned not as a replacement for primary research, but as a complement. It is used to accelerate insight generation when traditional methods are constrained by time, cost, access, or feasibility. 

Because consumers do not always reliably articulate future needs or behaviors, synthetic approaches combine fresh human data with modeled patterns from past behavior to explore scenarios quickly. These approaches still require validation against reality.² 

Strengths of Synthetic Data 

Speed and cost efficiency 

Synthetic data can significantly reduce research timelines, enabling insights to be generated in hours rather than weeks. Publicly shared case examples from large organizations suggest that early-stage concept screening can be accelerated substantially when AI-driven generation and testing are used together.¹ This makes synthetic data valuable in fast-moving decision contexts where traditional research is too slow or costly. 

Scalability and consistency 

Synthetic systems can generate large volumes of responses instantly while maintaining consistent questionnaire structures across markets, time periods, or scenarios. This supports reliable comparison and modeling without the variability introduced by fieldwork conditions. 

Access to hard-to-reach audiences

Synthetic audiences allow researchers to explore segments that are difficult, expensive, or impractical to recruit at scale, such as senior executives, niche B2B roles, or low-incidence populations. In these cases, synthetic data is most appropriate for exploratory insight and hypothesis development rather than final validation. 

Privacy and ethical flexibility

When designed responsibly, synthetic data can preserve statistical patterns without exposing identifiable personal information. This can reduce privacy risks and support research in regulated or sensitive contexts, depending on how the system is built and governed.³ 

Directional accuracy for structured questions

Industry research indicates that well-calibrated synthetic data can approximate real-world results for many structured, factual, or behavior-based questions, particularly those grounded in past behavior.³ For decision-makers, this level of directional accuracy can be sufficient in early-stage exploration, provided results are validated against human data when decisions carry higher risk. 


Weaknesses and Risks 

Reduced emotional depth and empathy

Research indicates that synthetic respondents often capture core themes accurately but produce less emotional nuance, empathy, and narrative richness than human respondents. This limits their usefulness for understanding emotional drivers, lived experiences, and irrational consumer behavior, which are often critical to decision-making. 

Bias toward central or rational responses

Synthetic data tends to reflect average or mainstream patterns, with extreme, polarized, or emotionally charged responses underrepresented. This can understate both risk and opportunity at the edges of a market, such as strong detractors or passionate advocates. 

Dependence on input data quality

Synthetic outputs are highly dependent on the quality, recency, and diversity of the data used to train the models. Outdated or biased source data will produce equally flawed results. Repeatedly generating synthetic data from synthetic inputs can also lead to drift, where outputs gradually diverge from real-world behavior. 
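The drift described above can be illustrated with a deliberately simplified simulation. This sketch (not any vendor's actual pipeline) repeatedly fits a simple model to its own synthetic output with a small sample each generation, so estimation error compounds and the modeled distribution wanders away from the original real-world data:

```python
import random
import statistics

random.seed(42)

# Start from "real" data: 200 draws from a distribution centered at 50
# with a spread (standard deviation) of 10.
data = [random.gauss(50, 10) for _ in range(200)]

variances = []
for generation in range(10):
    mu = statistics.mean(data)
    sigma = statistics.stdev(data)
    variances.append(sigma)
    # The next generation is fit to the previous *synthetic* generation,
    # using only 50 records, so each round's estimation error carries forward.
    data = [random.gauss(mu, sigma) for _ in range(50)]

# The estimated spread tends to wander away from the true value of 10
# as generations of synthetic-on-synthetic training accumulate.
print([round(v, 1) for v in variances])
```

The practical takeaway is the same one the paragraph makes: each generation must be re-anchored to fresh human data, or the model's picture of the market slowly becomes a picture of itself.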

Statistical and validation limitations

Synthetic data presents statistical and validation limitations because it is not based on probabilistic sampling, meaning traditional significance testing, margins of error and confidence intervals are not directly applicable. As a result, alternative validation approaches are required to assess reliability and fitness for purpose. Ongoing benchmarking against human data is essential to avoid overconfidence and to prevent results from being interpreted as more precise or predictive than they actually are.  
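One common alternative to classical significance testing is benchmarking: comparing the answer shares a synthetic source produces against a human holdout on the same question, with a pre-agreed tolerance. The sketch below uses total variation distance as the comparison metric; the question, shares, and 0.10 threshold are illustrative assumptions, not an industry standard:

```python
# Benchmark a synthetic result against a human holdout sample by
# comparing answer-option shares (all data below is illustrative).

def answer_shares(responses, options):
    """Convert raw responses into a proportion per answer option."""
    n = len(responses)
    return {opt: responses.count(opt) / n for opt in options}

def total_variation(p, q):
    """Half the L1 distance between two share distributions
    (0 = identical shares, 1 = completely disjoint)."""
    return 0.5 * sum(abs(p[k] - q[k]) for k in p)

options = ["agree", "neutral", "disagree"]
human = ["agree"] * 48 + ["neutral"] * 30 + ["disagree"] * 22
synthetic = ["agree"] * 55 + ["neutral"] * 28 + ["disagree"] * 17

tvd = total_variation(answer_shares(human, options),
                      answer_shares(synthetic, options))
print(f"Total variation distance: {tvd:.3f}")

# A threshold agreed before the study (0.10 here) decides whether the
# synthetic source is fit for purpose on this question.
print("fit for purpose" if tvd <= 0.10 else "needs recalibration")
```

The key design choice is that fitness for purpose is defined in advance and per question, rather than borrowing a margin of error that does not apply.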

 

Best Use Cases 

    • Boosting small or incomplete samples
      To supplement studies with demographic or behavioral gaps when sample sizes are limited. 
    • Early-stage concept and product screening
      To quickly identify promising ideas before investing in full-scale human research. 
    • Persona development and digital twins
      To test hypotheses and explore segment behaviors in a controlled, low-cost environment. 
    • Exploratory research with hard-to-reach audiences
      To generate directional insight when direct recruitment is impractical or prohibitively expensive. 
    • Scenario modeling and “what-if” analysis
      To rapidly explore alternative assumptions, market conditions, or strategic options. 
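To make the first use case concrete, here is a minimal sketch of boosting an underrepresented demographic cell. It uses simple resampling with replacement as a stand-in for a generative model (real systems are more sophisticated), and every augmented record is flagged so boosted and original respondents are never confused:

```python
import random

random.seed(7)

# Illustrative sample with a demographic gap: only 12 younger respondents
# against 80 older ones (fields and targets here are assumptions).
respondents = (
    [{"age": "18-34", "score": random.randint(6, 10)} for _ in range(12)]
    + [{"age": "35-54", "score": random.randint(4, 9)} for _ in range(80)]
)

target_per_cell = 80
young = [r for r in respondents if r["age"] == "18-34"]

# Resample the underrepresented cell up to the target size.
boost = [random.choice(young).copy()
         for _ in range(target_per_cell - len(young))]
for r in boost:
    r["synthetic"] = True  # always flag augmented records

augmented = respondents + boost
print(len(augmented))  # 12 + 80 originals plus 68 boosted records = 160
```

The flagging step matters most: downstream analysis can then report boosted cells as directional, in line with the validation caveats above.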

 

The Value of Working With Leger 

At Leger, our greatest advantage is our proprietary, high-quality, and continuously active panel, together with the expertise of the team behind it. This human foundation is what synthetic systems ultimately depend on. 

By combining real respondent data with synthetic augmentation, we can offer clients insights that are: 

    • More accurate (grounded in real-world respondent data) 
    • More current (models refreshed continuously) 
    • More actionable (integrating both rational patterns and emotional context) 

Synthetic data will not replace traditional research. It will amplify it. 

With our panel depth and methodological expertise, Leger is ideally positioned to lead this artificial intelligence transformation, offering the most trustworthy, hybrid intelligence in the market. Discover our latest AI initiative with Smart Persona, a solution that enables real-time conversations with custom-built synthetic personas.

References

¹ ESOMAR, GreenBook, and public announcements from major research organizations (2023–2025), including Ipsos, Kantar, Toluna, and Fairgen, documenting the introduction of AI-augmented and synthetic-data-based research solutions. 

² Kahneman, Daniel. Thinking, Fast and Slow. New York: Farrar, Straus and Giroux, 2011. 
Supported by extensive behavioral science and market research literature on the limits of stated preferences. 

³ Poynter, Ray. Understanding Synthetic Data. 2025. 
Supplemented by methodological discussions and case presentations at IIEX, TMRE, and ESOMAR conferences. 

