At a recent conference, someone asked me what I do for work. I explained that I’m a data scientist and the founder of a company called Arima, where we help businesses make sense of their data to improve their marketing strategies. After I described our Synthetic Society, they said, “Oh, so it’s essentially a Large Population Model.” I thought that was a wonderful description, especially given how prominent Large Language Models (LLMs) have become recently.
How Large Language Models Work
Most people, whether technical or not, have heard of Large Language Models. In a nutshell, an LLM completes sentences. It takes a sequence of words and predicts the next word based on that sequence. (Technically, LLMs work with tokens, not words, but for our purposes here, we'll use words.)
For example, if a well-trained LLM is fed the three words “How are you”, it should know that “today” is a possible next word. “This” could also be a good choice, since the sentence “How are you this morning” is grammatically correct and commonly used. Symbols are also part of our language, so the LLM may return a question mark “?” because “How are you?” is a complete sentence. When the LLM sees more than one possible continuation, it picks one at random, weighted by how likely each continuation is. Once a word is chosen, it is appended to the sequence, and the process repeats on the longer sequence.
An example of how LLMs work
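The sampling loop described above can be sketched in a few lines of Python. This is only a toy illustration: the probability table, the function names, and the numbers are all made up for this example, and a real LLM computes its probabilities over tokens with a neural network rather than looking them up.

```python
import random

# Hypothetical next-word probabilities, invented for illustration only.
# A real LLM would compute these over its entire token vocabulary.
NEXT_WORD_PROBS = {
    ("How", "are", "you"): {"today": 0.4, "?": 0.3, "this": 0.2, "doing": 0.1},
    ("How", "are", "you", "this"): {"morning": 0.6, "evening": 0.4},
}


def sample_next_word(context, table, rng):
    """Pick one continuation at random, weighted by its probability."""
    probs = table[tuple(context)]
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


def complete(prompt, table, rng, max_words=5):
    """Append sampled words until the toy table has no entry for the sequence."""
    words = list(prompt)
    for _ in range(max_words):
        if tuple(words) not in table:
            break  # the toy table knows nothing about this sequence
        words.append(sample_next_word(words, table, rng))
    return words


if __name__ == "__main__":
    rng = random.Random()
    print(" ".join(complete(["How", "are", "you"], NEXT_WORD_PROBS, rng)))
```

Running it several times gives different completions, such as “How are you today” or “How are you this morning”, because each word is drawn at random according to its weight.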
The Connection to Large Population Models
Large Language Models work similarly to Arima’s Large Population Model, which builds the Synthetic Society used in our Population Simulators. It essentially functions as an attribute completer: known attributes of people are fed into the model, and the unknown ones are returned. For an LLM, you input words and expect the LLM to complete your sentence. In our Synthetic Society, we feed in someone’s known attributes (demographic, behavioral, attitudinal) and expect the model to complete the unknown ones.
Large Population Models use known attributes to predict and fill in unknown ones.
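To make the idea of an attribute completer concrete, here is a minimal sketch. It is not how Arima’s model works internally; it simply looks up people in a tiny, made-up table who match the known attributes and reports the most common value of each remaining attribute. The records, attribute names, and the `complete_attributes` function are all hypothetical.

```python
from collections import Counter

# A tiny, invented table of synthetic people, for illustration only.
PEOPLE = [
    {"occupation": "roofer", "owns_vehicle": True, "vehicle_type": "pickup truck", "commute": "drive"},
    {"occupation": "roofer", "owns_vehicle": True, "vehicle_type": "pickup truck", "commute": "drive"},
    {"occupation": "roofer", "owns_vehicle": True, "vehicle_type": "van", "commute": "drive"},
    {"occupation": "teacher", "owns_vehicle": True, "vehicle_type": "sedan", "commute": "drive"},
    {"occupation": "teacher", "owns_vehicle": False, "vehicle_type": None, "commute": "transit"},
]


def complete_attributes(known, people):
    """Return {attribute: (most common value, share of matches)} for unknown attributes."""
    # Keep only the records that agree with everything we already know.
    matches = [p for p in people if all(p.get(k) == v for k, v in known.items())]
    if not matches:
        return {}
    all_attrs = set()
    for person in matches:
        all_attrs.update(person)
    completed = {}
    for attr in sorted(all_attrs - set(known)):
        counts = Counter(p.get(attr) for p in matches)
        value, n = counts.most_common(1)[0]
        completed[attr] = (value, n / len(matches))
    return completed


if __name__ == "__main__":
    print(complete_attributes({"occupation": "roofer", "owns_vehicle": True}, PEOPLE))
```

In this toy table, asking about a roofer who owns a vehicle returns “pickup truck” as the most likely vehicle type, along with the share of matching records that support it. A production model would need to generalize to combinations of attributes it has never seen, which a simple lookup like this cannot do.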
Real-World Applications of Population Simulators
For example, a user of the Population Simulator could be a government official deciding where to build schools. The known inputs are the shortlisted neighborhoods, and the Population Simulator returns attributes such as the number of families with kids, number of kids, parents’ occupations, ethnicity, income and attitudes about the future. All of this would help our users decide how many teachers are required and whether additional services such as after-school classes, ESL or free lunches should be offered.
Another use case could be a financial institution evaluating whether a new customer is a high-risk driver. In this case, the financial institution can feed in what it knows about the customer from their application (e.g., location, age, gender, credit score and products owned) and ask the Population Simulator to return attributes like driving habits, psychographic traits, method of commute, activities and hobbies to augment the customer’s risk profile.
Future Developments in Large Population Models
While all these ideas are relatively new, I see a clear path in how Large Population Models can develop and how businesses can utilize them in the coming years. Here are my three predictions for Large Population Models:
- Like LLMs, Large Population Models will get larger and larger, i.e., trained on enough attributes of enough people across enough countries over a long enough timespan, to eventually become the universal knowledge center of all consumer insights.
- Large Population Models will understand subtle differences across many dimensions, such as the differences between the same person today and two years ago, or between two individuals who live in adjacent neighborhoods yet behave very differently because they perform slightly different jobs or drive different makes of cars.
- Large Population Models will be able to make inferences based on known features, similar to how LLMs today create new sentences based on learned grammatical rules and vocabulary. For example, if someone runs a roofing business and owns a vehicle, the Large Population Model can deduce that they likely own a pickup truck, even if vehicle type isn’t an existing attribute.
Conclusion: The Growing Role of Large Population Models
Large Population Models, such as those used to create Arima’s Synthetic Society, will become increasingly common and more effective in the coming years. Cutting-edge organizations are currently finding new use cases for Large Population Models that save money, increase efficiency, and boost revenue.
To learn more, please contact me at winston@arimadata.com.