InstaPoll: A New Approach to Surveying Using Synthetic Data and Multi-Step Reasoning with LLMs

Arima

Finding efficient ways to survey the market has long interested practitioners. Whether you are a brand trying to figure out which messaging resonates more with your audience, a media outlet hoping to project the winner of the next election, or a policymaker wanting to know how the general public would respond to a certain change, polling is arguably the best way to gauge public opinion. However, traditional polling is both expensive and slow.

Working with Professor Yue Zhao at the University of Southern California, we developed a new way of polling that leverages the Synthetic Society by Arima dataset and multi-step reasoning with LLMs. This approach, called InstaPoll, significantly reduces the effort required to conduct polling. As part of a new writing series, we will introduce the methodology, discuss when and how it should be used, and evaluate InstaPoll’s performance against recent public polls.

As a refresher, the Synthetic Society by Arima is a privacy-by-design database that statistically mirrors real-world society, combining over 50,000 attributes across 325 million individuals in the US and 40 million individuals in Canada. Built from trusted data sources, including census data, panels, mobility insights, and card transactions, it includes attributes such as demographics, psychographics, and behavioral and lifestyle patterns. National in scope and granular in detail, the Synthetic Society enables modeling at the most sophisticated level.

The theoretical foundation of InstaPoll is our paper released on arXiv in November 2024. In this paper, we introduce a multi-step reasoning framework designed for political analysis. The analysis starts with a random sample of approximately 10% of the US adults in the Synthetic Society who are eligible to vote. For each synthetic voter, we prompt an LLM with the voter’s demographics along with the candidates’ backgrounds and policies, and ask the LLM, which now impersonates this voter, how they would have voted given that information.
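The sampling step can be sketched as follows. This is a minimal illustration, not the production pipeline: `SyntheticPerson` is a hypothetical record holding only the fields needed here, and eligibility is approximated as “adult citizen,” which is simpler than real voter-eligibility rules.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticPerson:
    # Hypothetical record shape for illustration; the actual Synthetic Society schema is not shown here.
    person_id: int
    state: str
    age: int
    is_citizen: bool


def eligible_voters(people):
    """Keep adult citizens, a simplified stand-in for voter eligibility."""
    return [p for p in people if p.age >= 18 and p.is_citizen]


def sample_voters(people, fraction=0.10, seed=0):
    """Draw a simple random sample (without replacement) of about `fraction` of eligible voters."""
    pool = eligible_voters(people)
    k = round(len(pool) * fraction)
    return random.Random(seed).sample(pool, k)


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy population with random states, ages, and citizenship flags.
    toy = [
        SyntheticPerson(i, rng.choice(["AZ", "GA", "PA"]), rng.randint(10, 90), rng.random() < 0.9)
        for i in range(1_000)
    ]
    sample = sample_voters(toy)
    print(f"{len(eligible_voters(toy))} eligible, {len(sample)} sampled")
```

Fixing the seed makes the sample reproducible, which is useful when different prompt variants should be compared on the same synthetic voters.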

LLM Diagram

We ran three separate studies, each giving the LLM different information. In the first version, the LLM receives only information about a person’s demographic background; this is shown in the box to the left, named Demographic-Only Prompting. While simple and intuitive, this approach cannot account for changes over time in candidates’ political agendas and in public opinion. In the second version (middle box), the LLM receives only the candidates’ political profiles and policy positions. While this version produces more dynamic and contextually grounded results, it introduces bias because it ignores voter demographics, effectively assuming that an entire state votes the same way.
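The difference between the first two versions can be made concrete with a small sketch. The prompt wording below is hypothetical (the text does not give the exact templates for these versions), and `demographic_only_prompt` and `candidate_only_prompt` are illustrative names. The point it shows is structural: Version 2 takes no individual attributes, so every voter in a state receives identical text.

```python
# Hypothetical prompt templates: the exact wording used for Versions 1 and 2 is not given in the text.
QUESTION = ("Will you vote for the Democratic Party, the Republican Party, "
            "or do you have no preference?")


def demographic_only_prompt(demographics: dict, year: int) -> str:
    """Version 1: the model sees only the voter's demographic profile."""
    profile = ", ".join(f"{key}: {value}" for key, value in demographics.items())
    return f"You are a person with {profile}. The current year is {year}. {QUESTION}"


def candidate_only_prompt(state: str, year: int, agendas: str, bios: str) -> str:
    """Version 2: the model sees only candidate information.

    No individual attributes enter the prompt, so every voter in a state
    receives the same text, which is the source of the state-level bias.
    """
    return f"You are a voter in {state}. The current year is {year}. {agendas} {bios} {QUESTION}"


if __name__ == "__main__":
    print(demographic_only_prompt({"age": 45, "education": "bachelor's degree"}, 2020))
    print(candidate_only_prompt("PA", 2020, "[policy agendas]", "[candidate bios]"))
```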

Finally, in the third version, both voter profiles and candidate profiles are supplied to the LLM. In addition, following the idea of Chain-of-Thought prompting, instead of asking right away which candidate the respondent will vote for, we first pose an intermediate question about their political position to “warm up” the respondent. Here are the two questions we asked:

Question 1: You are a person with [demographics]. The current year is [year]. [Two parties’ policy agenda]. When it comes to politics, how would you describe yourself?

Question 2: You are a persona with [demographics]. Your political position is [conservative-liberal spectrum from Question 1]. The current year is [year]. [Two parties’ policy agenda]. [Presidential candidates’ biographical and professional backgrounds]. Please answer the following question as if you were the respondent: As of today, will you vote for the Democratic Party (Joe Biden), the Republican Party (Donald Trump), or do you have no preference?
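The two-step flow can be sketched as below, with the answer to Question 1 substituted into Question 2 and voter-level answers rolled up into a state-level call. This is a sketch under stated assumptions: `AskLLM` stands for any hypothetical function that sends a prompt to a model and returns its reply, `parse_vote` is a deliberately naive keyword parser, and `state_winner` assumes a simple majority among voters who expressed a preference.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical LLM interface: any function that maps a prompt string to a text reply.
AskLLM = Callable[[str], str]

POSITION_QUESTION = "When it comes to politics, how would you describe yourself?"
VOTE_QUESTION = (
    "Please answer the following question as if you were the respondent: "
    "As of today, will you vote for the Democratic Party (Joe Biden), "
    "the Republican Party (Donald Trump), or do you have no preference?"
)


def question_1(demographics: str, year: int, agendas: str) -> str:
    # Step 1: warm-up question about the voter's political position.
    return f"You are a person with {demographics}. The current year is {year}. {agendas} {POSITION_QUESTION}"


def question_2(demographics: str, position: str, year: int, agendas: str, bios: str) -> str:
    # Step 2: the vote question, conditioned on the step-1 answer.
    return (f"You are a persona with {demographics}. Your political position is {position}. "
            f"The current year is {year}. {agendas} {bios} {VOTE_QUESTION}")


def parse_vote(reply: str) -> str:
    """Naive keyword parser mapping a free-text reply to 'D', 'R', or 'none'."""
    text = reply.lower()
    if "democratic" in text or "biden" in text:
        return "D"
    if "republican" in text or "trump" in text:
        return "R"
    return "none"


def poll_voter(ask: AskLLM, demographics: str, year: int, agendas: str, bios: str) -> str:
    position = ask(question_1(demographics, year, agendas)).strip()
    reply = ask(question_2(demographics, position, year, agendas, bios))
    return parse_vote(reply)


def state_winner(votes: Iterable[str]) -> str:
    """Majority among voters who expressed a preference; 'none' answers are ignored."""
    counts = Counter(v for v in votes if v != "none")
    if counts["D"] == counts["R"]:
        return "tie"
    return "D" if counts["D"] > counts["R"] else "R"


if __name__ == "__main__":
    def fake_llm(prompt: str) -> str:
        # Toy stand-in for a real model so the flow can run offline.
        if POSITION_QUESTION in prompt:
            return "liberal" if "urban" in prompt else "conservative"
        return ("Democratic Party (Joe Biden)" if "position is liberal" in prompt
                else "Republican Party (Donald Trump)")

    voters = ["age 30, urban", "age 60, rural", "age 40, urban"]
    votes = [poll_voter(fake_llm, d, 2020, "[agendas]", "[bios]") for d in voters]
    print(votes, state_winner(votes))
```

A real deployment would likely need more robust answer parsing than keyword matching, since a reply can mention both parties.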

Prediction results

We validated InstaPoll against the 2020 American National Election Studies (ANES), evaluating the 21 states with the smallest voting margins. Of those 21 states, Version 1 (voter demographics only) predicted 16 correctly, Version 2 (candidate information only) predicted 15 correctly, and Version 3 (both) predicted 19 correctly. For this final version, the AUC (area under the ROC curve), a commonly used measure of how well a model separates two outcomes, is 0.90.
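Both evaluation numbers can be illustrated with a short sketch. The text does not say whether the AUC is computed over individual respondents or over states, so `roc_auc` below is agnostic: it takes any scores (for example, a predicted probability of voting Democratic) and binary labels, and uses the rank interpretation of AUC. The function names and the example data are hypothetical, not study results.

```python
def count_correct(predicted: dict, actual: dict) -> int:
    """Number of states whose predicted winner matches the actual winner."""
    return sum(predicted.get(state) == winner for state, winner in actual.items())


def roc_auc(scores, labels) -> float:
    """AUC as the probability that a randomly chosen positive example scores
    higher than a randomly chosen negative one (ties count 1/2).
    O(P*N) pairwise comparison: fine for a sketch, too slow for millions of rows."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    predicted = {"AZ": "D", "GA": "R", "PA": "D"}  # made-up illustration
    actual = {"AZ": "D", "GA": "D", "PA": "D"}
    print(count_correct(predicted, actual))             # 2
    print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
    print(f"{19 / 21:.3f}")                             # 0.905, Version 3's share of the 21 states
```

For comparison, Versions 1 and 2 correspond to 16/21 ≈ 0.762 and 15/21 ≈ 0.714 of the states called correctly.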

Three Key Insights from Developing InstaPoll:

1. InstaPoll delivers results comparable to traditional polling at unprecedented speed.
Running a nationwide survey with nearly 30 million votes took less than a day, an impossible feat for traditional polling methods.

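2. Synthetic Society data significantly enhances prediction accuracy. Adding Synthetic Society voter profiles to the candidate information raised the number of correctly predicted states from 15 to 19. This advancement positions Arima at the forefront of developing next-generation polling technology.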

3. LLMs unlock broader applications beyond polling. With the ability to analyze text, images, and videos, InstaPoll has potential use cases in ad evaluation, packaging design, and more.
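To give a sense of the throughput behind the speed claim in item 1, here is a back-of-the-envelope calculation. It assumes roughly 30 million synthetic voters, a full 24-hour window, and one LLM call for each of the two questions per voter; these are simplifying assumptions, not reported figures.

```latex
\frac{3 \times 10^{7}\ \text{voters}}{86{,}400\ \text{s}} \approx 347\ \text{voters/s},
\qquad
2 \times 347\ \text{voters/s} \approx 694\ \text{LLM calls/s}
```

Since the run finished in under a day, the sustained rate for that many voters would have been at least this high.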

InstaPoll marks a groundbreaking shift in how we approach polling and market research. By integrating synthetic data with the reasoning capabilities of large language models, it provides a faster, more cost-effective alternative to traditional methods without sacrificing accuracy. Our initial tests, including correctly predicting 19 of the 21 closest states in the 2020 U.S. presidential election, demonstrate its potential to streamline polling while improving precision.

In the coming weeks, we’ll be testing InstaPoll against recent polling data and benchmarking its performance against traditional surveys. We believe InstaPoll isn’t just a new tool; it’s a glimpse into the future of data-driven decision-making, where speed, accuracy, and accessibility come together to deliver better insights faster than ever before.

Get in touch with Winston@Arimadata.com for more information.


Copyright © 2025 Arima
