April 8, 2025 Rui Mendes

Personalization at Scale: The Infrastructure Question Nobody Is Asking

Every marketing technology company selling to enterprise buyers is claiming personalization. The word has lost most of its meaning. When I'm in technical diligence conversations, I've started asking founders to be precise: what does "personalization" mean in your system at a mechanical level? What data goes in, what computation happens, and what specifically changes in the output for different users or segments? The answers to those questions vary enormously — and the variation reveals who is building something durable and who is building a demo.

The personalization problem, when examined carefully, turns out to be primarily an infrastructure problem. The model challenge — generating text or images that vary based on audience signals — is largely solved by current foundation models. What's not solved, and what very few companies are seriously addressing, is the infrastructure challenge: how do you operate a system that generates millions of personalized content variations at acceptable cost, latency, and quality consistency? That's the question I want to explore here.

The three layers of the personalization stack

A production personalization system has at least three distinct layers, each with its own technical challenges.

The first is audience signal processing — taking raw customer data (behavioral, transactional, contextual) and translating it into inputs the generation system can use. This sounds simple. It isn't. The input data is messy, sparse, and often privacy-constrained in ways that require thoughtful design. European GDPR requirements in particular create real constraints on how behavioral data can be collected, stored, and used for personalization — constraints that need to be designed into the data architecture from the start rather than retrofitted. Companies that treat signal processing as a solved problem because they have a customer data platform often discover that the CDP outputs aren't structured or clean enough to drive reliable personalization.
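To make the signal processing layer concrete, here is a minimal sketch of what "translating raw data into inputs the generation system can use" might look like. Everything in it is an assumption for illustration: the field names, the purchase thresholds, and the single consent flag are hypothetical, and a boolean flag is a stand-in for a real consent model, not a statement of what GDPR compliance requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RawCustomerRecord:
    # Hypothetical CDP export row; real exports are usually sparser and messier.
    customer_id: str
    consented_to_personalization: bool
    purchases_last_90d: Optional[int] = None
    last_category_viewed: Optional[str] = None


@dataclass
class AudienceSignals:
    # The small, clean set of inputs the generation layer is allowed to see.
    customer_id: str
    purchase_tier: str
    interest: str


def to_signals(record: RawCustomerRecord) -> Optional[AudienceSignals]:
    """Return generation-ready signals, or None when consent is absent."""
    if not record.consented_to_personalization:
        return None  # no consent: the record never reaches generation

    purchases = record.purchases_last_90d
    if purchases is None:
        tier = "unknown"  # missing data is made explicit, not guessed
    elif purchases >= 5:  # illustrative threshold
        tier = "frequent"
    elif purchases >= 1:
        tier = "occasional"
    else:
        tier = "lapsed"

    # Normalize free-text categories; fall back to "unknown" when empty.
    interest = (record.last_category_viewed or "").strip().lower() or "unknown"
    return AudienceSignals(record.customer_id, tier, interest)
```

The design choice worth noticing is that the consent check and the handling of missing values live in the data path itself, so the generation layer cannot see a record that was never eligible.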

The second layer is content generation and quality control. Given a set of audience signals, generate content that is relevant, on-brand, and high-quality. This is where most of the attention goes, and for good reason: it's the visible part. But it's actually not the hardest part if you've solved the signal processing layer. The main challenges here are consistency (do the tone and brand voice hold at volume?), evaluation (how do you know when the personalization is actually helping versus just varying?), and latency (can you generate in time for the content to be useful?).

The third layer is orchestration and distribution. Getting the right personalized content to the right audience at the right time, tracked in a way that allows measurement. This is where most personalization systems silently fail. The content is generated, but it doesn't reach the audience through the right channel at the right moment. Or it reaches them but the measurement system can't attribute performance back to the personalization variable. Orchestration failure is invisible in demos and only appears in production at scale.
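The attribution half of this failure can be sketched in a few lines. The sketch assumes every delivered piece of content carries an ID and a record of which personalization variable was varied, so that conversions can be joined back to that variable. The `Delivery` type, its fields, and the function name are hypothetical, and a production system would also need exposure windows and control groups.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple


@dataclass(frozen=True)
class Delivery:
    delivery_id: str  # unique per piece of content actually sent
    channel: str      # e.g. "email", "push"
    variable: str     # which personalization variable was varied
    value: str        # the value that variable took for this delivery


def conversion_rate_by_variable(
    deliveries: Iterable[Delivery], converted_ids: Set[str]
) -> Dict[Tuple[str, str], float]:
    """Conversion rate for each (variable, value) pair that was delivered."""
    sent = defaultdict(int)
    won = defaultdict(int)
    for d in deliveries:
        key = (d.variable, d.value)
        sent[key] += 1
        if d.delivery_id in converted_ids:
            won[key] += 1
    # Every key in `sent` has a count of at least 1, so division is safe.
    return {key: won[key] / sent[key] for key in sent}
```

If the delivery record does not carry the variable at send time, this join is impossible after the fact, which is the silent failure described above.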

The cost problem that most pitches don't acknowledge

Generating content at scale with frontier language models is expensive. The cost-per-generation depends on model size, input and output token length, and inference infrastructure choices. At demo scale — hundreds or low thousands of variations — the cost is often negligible. At production scale — hundreds of thousands or millions of variations for a large enterprise — it becomes a fundamental unit economics problem.

Many AI personalization companies have not done this math carefully. They've priced their product based on demo-scale economics and are selling into enterprises that will eventually try to scale to production volumes. When the cost math becomes clear, three things can happen: the company discovers its margins collapse at scale; the customer discovers the tool is prohibitively expensive at the volume they need; or the company has to significantly degrade personalization quality to hit acceptable unit economics by using smaller, cheaper models.

The founders who have thought carefully about this have specific architectural answers. They use tiered model strategies — frontier models for high-value, low-volume personalizations (VIP customer communications, key account materials) and smaller, fine-tuned models for high-volume, lower-complexity variations (mass email subject lines, ad copy variants). They've done the inference cost modeling at production volumes before signing their first enterprise contract. They can tell you their cost per thousand variations at three different volume tiers. These are the conversations that separate infrastructure-serious founders from founders who have a good demo.
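The arithmetic behind "cost per thousand variations" is simple enough to sketch. The version below assumes per-token pricing split into input and output rates and a two-tier routing strategy. All names are hypothetical, and any prices you plug in are your own assumptions, not vendor quotes.

```python
from dataclasses import dataclass


@dataclass
class ModelTier:
    name: str
    usd_per_1k_input_tokens: float   # placeholder price, supplied by the caller
    usd_per_1k_output_tokens: float  # placeholder price, supplied by the caller


def cost_per_generation(tier: ModelTier, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one generation on the given tier."""
    return (
        input_tokens / 1000 * tier.usd_per_1k_input_tokens
        + output_tokens / 1000 * tier.usd_per_1k_output_tokens
    )


def blended_cost_per_thousand(
    frontier: ModelTier,
    small: ModelTier,
    frontier_share: float,
    input_tokens: int,
    output_tokens: int,
) -> float:
    """Cost in USD per 1,000 variations when a share is routed to the frontier tier."""
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    per_variation = (
        frontier_share * cost_per_generation(frontier, input_tokens, output_tokens)
        + (1.0 - frontier_share) * cost_per_generation(small, input_tokens, output_tokens)
    )
    return per_variation * 1000
```

Running this at three volume tiers with realistic token counts, and with inference overhead added on top, is roughly the exercise I would expect a founder to have done before the first enterprise contract.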

Why the data flywheel is the actual product

The most durable personalization businesses we've evaluated share a specific characteristic: the product gets measurably better as it processes more customer data. This sounds obvious but is rarer than it sounds. A product built on top of an off-the-shelf language model with minimal customization doesn't get better over time — the foundation model updates independently of the customer's data, and the customer could switch to any other wrapper on the same model.

A product with genuine data flywheel dynamics is different. Consider a personalization system that fine-tunes segment-specific models on the customer's performance data: what content drove conversion for which audience segments, what brand voice adjustments correlated with engagement, and which personalization variables actually moved behavior versus which ones were noise. After six months of deployment, that system has a calibrated understanding of how that specific customer's audience responds that no new entrant can replicate. The customer switching cost isn't just the migration burden; it's the loss of all that accumulated model calibration.

Building this kind of data flywheel requires specific architectural choices that aren't compatible with a "wrapper on GPT" approach. It requires storing and structuring performance data in a way that feeds back into the generation system, running the right fine-tuning and evaluation cycles, and building the product roadmap around the premise that the model is the differentiated asset. We look for evidence of this thinking early in technical conversations.
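One way to picture "performance data that feeds back into the generation system" is a selection step that turns logged results into fine-tuning examples. This is a sketch under stated assumptions, not a description of any particular company's pipeline: the record fields, the impression floor, and the "beat the segment average" rule are all hypothetical choices.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class PerformanceRecord:
    segment: str
    prompt: str      # signals and instructions given to the generator
    content: str     # what was generated and delivered
    impressions: int
    conversions: int

    @property
    def rate(self) -> float:
        return self.conversions / self.impressions if self.impressions else 0.0


def select_training_examples(
    records: List[PerformanceRecord], min_impressions: int = 100
) -> List[Dict[str, str]]:
    """Keep content that beat its own segment's average conversion rate."""
    # Drop low-traffic records whose rates are mostly noise.
    eligible = [r for r in records if r.impressions >= min_impressions]

    by_segment: Dict[str, List[PerformanceRecord]] = {}
    for r in eligible:
        by_segment.setdefault(r.segment, []).append(r)

    examples: List[Dict[str, str]] = []
    for segment, rows in by_segment.items():
        baseline = mean(r.rate for r in rows)  # per-segment baseline
        for r in rows:
            if r.rate > baseline:
                examples.append(
                    {"segment": segment, "prompt": r.prompt, "completion": r.content}
                )
    return examples
```

The point is less the selection rule than the fact that it requires prompt, content, segment, and outcome to be stored together, which a thin wrapper typically does not do.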

What this means for the personalization market over the next three years

The personalization market is going to bifurcate. The products that are primarily selling "AI personalization" as a feature — content that looks different for different users, driven primarily by simple rule-based segmentation with an LLM in the middle — will commoditize. The feature gap between these products will narrow, and the ones without strong distribution advantages will struggle to maintain pricing power.

The products that are genuinely solving the infrastructure challenges — signal processing at scale, cost-efficient generation architectures, measurement and orchestration loops, and data flywheel design — will build durable competitive positions. These are harder products to sell because the differentiation is in the architecture rather than the demo. They take longer to close enterprise deals. But they produce the kind of compounding value that makes customers deeply reluctant to leave.

The founders building in the second category are, in my experience, the ones who have operated production ML systems before founding a company. They've already been surprised by the cost problem. They've already learned the orchestration failure mode. They've already built and regretted a system without a data flywheel. That operating experience is exactly what we look for when we're evaluating technical founders in the personalization space, not because background is a proxy for talent, but because it's a proxy for which problems they've already solved in their heads before we meet.
