Founder Insight

How Customer Data Becomes the Context Window for AI Marketing Models

Kashish Gupta, Co-CEO & Co-Founder at Hightouch


There’s a reason generic AI tools produce worse marketing outputs than purpose-built ones: they lack the customer data context that makes the difference between a plausible answer and a correct one.

Kashish Gupta, Co-CEO of Hightouch — the $1.2B composable customer data platform serving major B2C brands — made a claim during our conversation that reframes how we should think about AI in marketing. The competitive advantage isn’t the model. It’s the context you feed it.

The semantic layer as LLM context

Most companies using AI for marketing tasks face a translation problem. The LLM needs to generate SQL to query customer data, but it cannot tell what the database columns mean from their names alone. A column named cust_ltv_90d, for instance, is ambiguous without context: the name does not say which currency is used or which transactions are counted.

Hightouch’s solution is a semantic layer — structured metadata describing what’s in the data warehouse, including table relationships, column meanings, and business logic. This layer doesn’t just help humans navigate the database. It functions as a context layer for the LLM.

“The LLM can know where the right queries are and where the right data is to use,” Kashish explained, describing how the semantic layer integration works alongside their hallucination reduction systems.

The distinction matters. Without semantic context, an LLM guessing at SQL is essentially pattern matching on column names and hoping for the best. With semantic context, the model can be told that cust_ltv_90d means, for example, “customer lifetime value over the past 90 days, denominated in the customer’s local currency, calculated from completed transactions only.” That understanding is the difference between a query that looks right and a query that is right.
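To make the idea concrete, here is a minimal sketch of how semantic-layer metadata might be rendered into text and placed ahead of a question in an LLM prompt. It is not Hightouch's implementation: the ColumnSpec class, the customers table, the customer_id column, and the prompt wording are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical semantic-layer entry describing one warehouse column."""
    table: str
    name: str
    description: str
    unit: str


# Illustrative metadata only; table names and wording are assumptions.
SEMANTIC_LAYER = [
    ColumnSpec(
        table="customers",
        name="cust_ltv_90d",
        description=(
            "Customer lifetime value over the past 90 days, "
            "calculated from completed transactions only"
        ),
        unit="customer's local currency",
    ),
    ColumnSpec(
        table="customers",
        name="customer_id",
        description="Unique identifier for a customer",
        unit="n/a",
    ),
]


def build_schema_context(columns):
    """Render semantic metadata as plain text to prepend to an LLM prompt."""
    lines = ["You may query only the columns described below."]
    for col in columns:
        lines.append(
            f"- {col.table}.{col.name}: {col.description} (unit: {col.unit})"
        )
    return "\n".join(lines)


def build_prompt(question, columns):
    """Combine the schema context with the user's question."""
    return f"{build_schema_context(columns)}\n\nQuestion: {question}\nSQL:"


prompt = build_prompt("Which customers spent the most recently?", SEMANTIC_LAYER)
print(prompt)
```

The point of the sketch is that the column's meaning travels with the request, so the model no longer has to infer it from the name.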

Two contexts that generic tools don’t have

Kashish described two specific context advantages that make Hightouch’s AI outputs structurally better than what a vanilla ChatGPT session could produce for the same marketing task.

The first is customer data context. Because Hightouch sits on top of the company’s data warehouse, the AI has access to actual customer behavior patterns — which segments respond to which campaigns, which product categories drive repeat purchases, which engagement patterns predict churn. Generic AI tools work from prompts; Hightouch’s AI works from data.

“We have all the customer data. So we can give you really good ideas for which campaign to send to which customer since we have seen all the data.”

The second is asset context. Existing brand-approved assets from Figma and digital asset management systems, along with past campaign performance, are converted into embeddings. The AI doesn’t just know what’s in the database; it knows what creative materials exist, which ones have performed, and what the brand guidelines actually look like in practice.

“A vanilla ChatGPT generation doesn’t know what’s working in the market and doesn’t know anything about your customer data,” Kashish said. “Those two contexts that we have uniquely, I think will give us a huge advantage.”
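As a rough illustration of what asset context can mean in practice, the sketch below ranks stored assets by cosine similarity to a query vector. The asset names and the three-dimensional vectors are invented stand-ins; real embeddings would come from an embedding model and have far more dimensions, and nothing here reflects how Hightouch actually stores or retrieves assets.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real asset embeddings (hypothetical data).
ASSETS = {
    "summer_sale_banner": [0.9, 0.1, 0.0],
    "loyalty_email_header": [0.1, 0.8, 0.2],
    "winter_lookbook": [0.0, 0.2, 0.9],
}


def top_assets(query_vector, assets, k=2):
    """Return the names of the k assets most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), name)
        for name, vector in assets.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embedding of a campaign brief.
query = [0.8, 0.3, 0.0]
print(top_assets(query, ASSETS))
```

The retrieved assets would then be passed to the model as context, alongside the customer data.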

Why this pattern extends beyond marketing

The customer-data-as-context pattern is relevant for anyone building AI applications on top of enterprise data. The principle: the model’s usefulness is bounded by the quality and specificity of the context it receives, not by its raw reasoning capability.

A smarter model with poor context will underperform a simpler model with rich, structured context. This is why Hightouch invested six to eight months building the infrastructure to provide that context — the semantic layer, the asset embeddings, the real-time customer behavior feeds — rather than switching to a more powerful base model.

The same principle applies to any AI application where the data is proprietary: legal document analysis, financial modeling, healthcare records, supply chain optimization. The moat isn’t the model. It’s the structured context pipeline that feeds it.

The feedback loop that compounds

There’s one more dimension that separates embedded context from prompt-based context: the feedback loop.

When Hightouch runs a campaign, the performance data flows back into the system. The AI learns which content variants worked for which segments, which channels drove conversions, which timing patterns produced engagement. This creates a flywheel where each campaign makes the context richer and the next campaign more effective.
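The flywheel can be pictured with a small sketch: campaign results accumulate per segment and content variant, and the next recommendation reads from the accumulated totals. The CampaignFeedback class, the segment and variant names, and the numbers are all hypothetical; a production system would presumably track far more signals than send and conversion counts.

```python
from collections import defaultdict


class CampaignFeedback:
    """Accumulates campaign results per (segment, variant) pair."""

    def __init__(self):
        self._stats = defaultdict(lambda: {"sends": 0, "conversions": 0})

    def record(self, segment, variant, sends, conversions):
        """Add one campaign's results to the running totals."""
        entry = self._stats[(segment, variant)]
        entry["sends"] += sends
        entry["conversions"] += conversions

    def conversion_rate(self, segment, variant):
        """Conversions divided by sends, or 0.0 if nothing was recorded."""
        entry = self._stats.get((segment, variant))
        if not entry or entry["sends"] == 0:
            return 0.0
        return entry["conversions"] / entry["sends"]

    def best_variant(self, segment):
        """Variant with the highest conversion rate so far for a segment."""
        candidates = [v for (s, v) in self._stats if s == segment]
        if not candidates:
            return None
        return max(candidates, key=lambda v: self.conversion_rate(segment, v))


feedback = CampaignFeedback()
feedback.record("lapsed_buyers", "discount_email", sends=1000, conversions=30)
feedback.record("lapsed_buyers", "new_arrivals_email", sends=1000, conversions=45)
first_pick = feedback.best_variant("lapsed_buyers")

# A later campaign adds data, and the recommendation shifts with it.
feedback.record("lapsed_buyers", "discount_email", sends=500, conversions=40)
second_pick = feedback.best_variant("lapsed_buyers")
print(first_pick, second_pick)
```

Each recorded campaign changes what the system recommends next, which is the compounding effect described above in miniature.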

This is the activation side that Kashish argues is the hard part of marketing AI — and why he sees infrastructure, not models, as the competitive moat. Any company can call an API. Building the context pipeline that makes each API call better than the last requires years of production data and purpose-built infrastructure.

FAQ

How does customer data improve AI marketing model accuracy?

Generic AI tools generate marketing content from prompts alone, missing customer behavior patterns and brand context. Purpose-built platforms like Hightouch feed the AI structured customer data — segment behavior, campaign performance, product engagement patterns — so recommendations are based on actual data, not guesses. The semantic layer translates database schema into meaning the LLM can use for accurate SQL generation.

What is a semantic layer in AI and why does it matter?

A semantic layer is structured metadata describing what’s in a data warehouse — table relationships, column meanings, business logic definitions. For AI applications, it functions as a context layer that enables accurate SQL generation. Without it, LLMs guess at database column meanings from names alone. With it, they understand the business logic behind each field, reducing hallucination in data queries.

How does Hightouch’s composable CDP connect to AI models?

Hightouch connects to a company’s existing cloud data warehouse and provides a semantic layer on top. This semantic metadata — describing tables, columns, and business relationships — feeds into AI models as structured context. The company’s customer data, campaign history, and brand assets all become available to the LLM without copying data out of the company’s own infrastructure.

Why do generic AI tools underperform for enterprise marketing?

Generic AI tools lack two critical contexts: customer data (which segments respond to which campaigns, what engagement patterns predict churn) and asset data (which creative materials exist, which have performed, what brand guidelines look like in practice). Without these, AI marketing outputs are based on pattern matching from training data rather than actual business intelligence.

What makes AI marketing context different from a regular prompt?

A prompt gives the model one-time instructions. Embedded context — semantic layers, customer behavior data, asset performance history — gives the model structured understanding that compounds over time. Each campaign’s performance feeds back into the system, making the next campaign’s context richer. This feedback loop is what separates embedded context from prompt engineering.

How do you reduce hallucination in AI-generated marketing queries?

Semantic layer integration lets the LLM understand database structure and business logic, reducing guesswork in SQL generation. A smaller verification LLM continuously monitors the primary model’s actions against system state. A multi-model evaluation framework cross-checks responses for consistency. Together, these three layers address different failure modes: misunderstanding, fabrication, and inconsistency.
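The cross-checking idea can be sketched as a simple agreement test over SQL produced by several models. This is only one possible reading of "consistency": the normalization rule, the min_agreement threshold, and the sample queries are assumptions, and the source does not describe how the actual evaluation framework compares responses.

```python
from collections import Counter


def normalize_sql(sql):
    """Lowercase, drop semicolons, and collapse whitespace for comparison."""
    return " ".join(sql.lower().replace(";", " ").split())


def consistent_answer(candidates, min_agreement=2):
    """Return the most common normalized query if enough candidates agree."""
    counts = Counter(normalize_sql(candidate) for candidate in candidates)
    if not counts:
        return None
    answer, votes = counts.most_common(1)[0]
    return answer if votes >= min_agreement else None


# Hypothetical outputs from three different models for the same question.
candidates = [
    "SELECT customer_id FROM customers;",
    "select customer_id  from customers",
    "SELECT id FROM users",
]
print(consistent_answer(candidates))
```

When no query reaches the agreement threshold, the function returns None, which a caller could treat as a signal to escalate rather than run the query.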

What is the composable CDP approach to customer data?

The composable approach leaves customer data in the company’s own data warehouse — Snowflake, BigQuery, Databricks — and builds marketing tools on top. Traditional CDPs copied data into their own system, creating security concerns and duplicate databases. Composable CDPs provide segmentation, journey orchestration, and AI capabilities without data movement, keeping the company in control of privacy and governance.

Why is the context pipeline more important than the AI model for marketing?

A smarter model with poor context underperforms a simpler model with rich, structured context. The model’s usefulness is bounded by the specificity of what it receives — customer behavior data, semantic layer metadata, asset performance history. Building this context pipeline takes years of production data and purpose-built infrastructure, which is why it constitutes a more durable competitive advantage than model selection.
