Launching or growing a business demands deep insight into consumers’ unmet needs. Often these insights come from personal observation and market knowledge. To get past personal bias and study consumers at a national scale, marketers need surveys, studies, CRM files, website logs and more. How detailed these data are determines how useful and composable they are.
Consumer insights fall into two categories: aggregate data and personal data. The statement that 27% of Canadian households own a dog is aggregate data. Aggregate data can show what percentage of dog owners also own cats or take vacations, but it won’t reveal details about individual households or differences within the groups.
Personal data offers much more detail and makes it possible to personalize communications based on user preferences. However, it is expensive to collect, and managing it requires privacy governance. Marketers and media organizations can’t collect every data attribute for everyone in a national population. Many consumers opt out, records are hard to clean and de-duplicate, and no one provides a complete data set.
Marketers use both types of data, recognizing the limitations and opportunities of each. Many organizations seek to enhance their consumer insights by matching individual records between databases to identify potential customers or address gaps in their data. However, this process is expensive. Along with technology costs, there are ongoing expenses for complying with privacy laws and the risk of fines for data breaches.
As marketers deal with the limitations and costs of using aggregate and personal data, they wonder if there’s another way to gain individual-level consumer insights for the whole country. The answer is yes, if they know about synthetic data.
What is synthetic data?
Synthetic data is information generated by algorithms rather than collected from real-world events. It mimics the patterns and relationships found in real data but doesn’t include any personal details. It’s used to test and train models, and it now also helps generate or support consumer insights.
How is synthetic data created?
Synthetic data can be created with several techniques. One of them, the SynC method, follows these steps:
- Preprocess the source data to remove outliers.
- Use Gaussian copula models to capture the data’s patterns and relationships.
- Sample from these models to create detailed data subsets.
- Merge these subsets using neural networks.
- Scale the merged data to match the original data’s constraints.
This method produces detailed, individual-level synthetic data from summarized real data while ensuring accuracy and protecting privacy.
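To make the copula steps more concrete, here is a minimal Python sketch of fitting a Gaussian copula and sampling from it. It is not the SynC implementation: it skips the preprocessing, neural-network merging and scaling steps, and it fits the copula to a small made-up table of individual rows rather than to aggregated data. The function names (`fit_gaussian_copula`, `sample_gaussian_copula`) and the toy columns (age and monthly spend) are assumptions chosen for illustration.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()  # standard normal, used for its CDF and inverse CDF


def fit_gaussian_copula(data):
    """Estimate the copula correlation matrix from an (n_rows, n_cols) array."""
    n_rows = data.shape[0]
    # Rank each column (1..n); ties are broken arbitrarily in this sketch.
    ranks = data.argsort(axis=0).argsort(axis=0) + 1
    # Dividing by n + 1 keeps the pseudo-uniform values strictly inside (0, 1).
    uniforms = ranks / (n_rows + 1)
    # Convert to standard-normal scores and measure their correlation.
    scores = np.vectorize(STD_NORMAL.inv_cdf)(uniforms)
    return np.corrcoef(scores, rowvar=False)


def sample_gaussian_copula(data, corr, n_samples, rng):
    """Draw synthetic rows that keep each column's distribution and the fitted correlation."""
    n_cols = data.shape[1]
    # Correlated standard normals carry the dependence structure.
    normals = rng.multivariate_normal(np.zeros(n_cols), corr, size=n_samples)
    uniforms = np.vectorize(STD_NORMAL.cdf)(normals)
    # Map each uniform back through the column's empirical quantiles.
    columns = [np.quantile(data[:, j], uniforms[:, j]) for j in range(n_cols)]
    return np.column_stack(columns)


# Made-up toy data: 300 "households" with a correlated age and monthly spend.
rng = np.random.default_rng(0)
age = rng.uniform(20, 70, size=300)
spend = 10 * age + rng.normal(0, 50, size=300)
real = np.column_stack([age, spend])

corr = fit_gaussian_copula(real)
synthetic = sample_gaussian_copula(real, corr, n_samples=500, rng=rng)

print("real correlation:     ", round(float(np.corrcoef(real, rowvar=False)[0, 1]), 2))
print("synthetic correlation:", round(float(np.corrcoef(synthetic, rowvar=False)[0, 1]), 2))
```

The synthetic rows are new draws, not copies of the toy households, yet the relationship between the two columns carries over. A production method would also need to handle categorical attributes, tied values and the privacy checks that this sketch leaves out.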
What can you do with synthetic data?
There are many ways to use synthetic data in marketing and research. With it, marketers can:
- Augment their first-party data, uncovering insights about potential customers not present in their CRM systems.
- Gain a more comprehensive view of their target audience’s wants, motivations, and habits.
- Make informed decisions based on reliable consumer insights while ensuring privacy compliance.
- Test various strategies in a virtual environment, using synthetic data to simulate different market conditions.
- Enrich datasets with synthetic data to get a more complete view for analysis.
The future of consumer insights using synthetic data
As privacy laws tighten and rising user opt-outs make traditional first-party data collection harder, synthetic data will become an essential tool for marketers. In the years to come, we expect to see more businesses using it to gain consumer insights and find new ways to gather data.
To stay ahead in this evolving landscape, it’s crucial to begin learning about synthetic data now. Whether you’re a small business looking to compete with larger players or a marketer eager to harness the latest technology, synthetic data offers powerful solutions for enhancing consumer insights.
If you’re interested in exploring specific examples or have any questions, contact us!
About the Author

Winston Li is the founder of Arima, a platform that enables marketers to build and manage their own media mix models combining data from multiple agencies, including any in-house services, to ensure continuity in their marketing measurement efforts. Prior to founding Arima, Winston was the Director of Data Science at PwC and Omnicom. Winston is also a part-time faculty member at Northeastern University Toronto and sits on the advisory board of the Master of Analytics program.