Why AI Models Are Becoming a Commodity (And What to Build Instead)
Coco Mao, CEO & Co-founder at OpenArt
The conventional wisdom in AI is that you have to own the model. Every $50M+ funding round seems to assume that defensibility lives in the weights — that whoever trains the best model wins the category. Coco Mao, CEO of OpenArt, ran the experiment and found the opposite. Her 10-person company has 6-7 million monthly active users and over $20M ARR. She doesn’t own a single model.
“We don’t actually own any of the models, because training model is very resource intensive,” she said in our conversation. “And I also feel like models are becoming a commodity. For our team, we have more background building products. We didn’t have background like training models, so we to some extent were forced not to train the model. But then in retrospect, it was the really right choice.”
“In retrospect, it was the really right choice.” That’s a strong claim. Most AI founders treat model ownership as a moat. Coco treats it as a misallocation of capital, one that the constraint of being a small team accidentally saved her from.
What “Models Are a Commodity” Actually Means
A commodity isn’t something bad. It’s something whose differentiation lives elsewhere. Wheat is a commodity: it doesn’t matter much whose farm it came from. Gasoline is a commodity: you don’t pick a station based on the molecular composition of the fuel. The commodity layer is necessary but not sufficient for value creation.
Coco’s claim is that AI models, at least at the application layer she operates in, are heading there. Several signals point that way. The number of competitive image and video models has grown from a handful in 2022 (DALL·E, Stable Diffusion, Midjourney) to dozens, with new ones launching monthly. Capabilities are converging. The cost of training has dropped, and open-source releases have made strong models available without licensing fees. The gap between a state-of-the-art image model and a one-year-old open-source model is shrinking, not growing, in terms of what end users can actually feel.
This doesn’t mean model training is over. There will always be frontier players pushing the bleeding edge. But for the vast majority of AI applications — especially in image, video, and creation tools — the model layer is increasingly available, increasingly substitutable, and increasingly not where defensibility lives.
Where Defensibility Actually Lives
If you don’t own the model, what do you own? Coco’s answer, implicit in OpenArt’s architecture, is that defensibility lives in three places:
The application workflow. OpenArt’s video generation isn’t a single model call. It’s a multi-step pipeline that breaks a script into scenes, generates each scene as a clip while keeping the clips consistent with one another, lets users regenerate any individual clip, and assembles the final video. That workflow is the product. Users can’t replicate it just by hitting a model API directly. Coco articulates the design philosophy: “If you think about how a film is generated, people don’t shoot a film from start to end. They shoot clips and then put everything together. So it’s the same thing in order to tell a really good story.”
The user experience layer. OpenArt’s “one-click story” works because most users can’t write good prompts, don’t know what makes a story arc compelling, and don’t want to spend an hour configuring a video. The product abstracts away the complexity. The Sunday school teachers in Nashville and the 70-year-old grandma in New Zealand aren’t successful OpenArt users because they bought access to a state-of-the-art model. They’re successful because OpenArt translates their rough idea into a useful output without asking them to learn anything new.
The category position. Coco’s most strategic move was rejecting “AI image generation” as her category and reframing the company around “visual storytelling.” That category move is defensible because category positioning is hard to copy. A competitor can ship the same features. They can use the same models. But repositioning a company around a different category requires a different team, different roadmap, different marketing. It’s not a quick copy.
The pattern across all three: the moat is in the layers above the model, not in the model itself.
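The multi-step workflow described above (split a script into scenes, generate a clip per scene, regenerate a single clip, assemble the result) can be sketched in a few lines. This is a minimal illustration, not OpenArt’s actual code: the names Storyboard, generate_all, regenerate, and assemble are hypothetical, the scene splitter naively assumes one scene per line, a shared style string stands in for whatever keeps clips consistent, and fake_model is a stub in place of a real model API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical signature: (scene description, shared style) -> clip identifier.
ClipGenerator = Callable[[str, str], str]


@dataclass
class Storyboard:
    style: str                      # shared across clips as a stand-in for consistency
    scenes: List[str]
    clips: List[str] = field(default_factory=list)


def split_into_scenes(script: str) -> List[str]:
    """Naive assumption: each non-empty line of the script is one scene."""
    return [line.strip() for line in script.splitlines() if line.strip()]


def generate_all(script: str, style: str, generate_clip: ClipGenerator) -> Storyboard:
    """Generate one clip per scene, passing the same style to every call."""
    board = Storyboard(style=style, scenes=split_into_scenes(script))
    board.clips = [generate_clip(scene, style) for scene in board.scenes]
    return board


def regenerate(board: Storyboard, index: int, generate_clip: ClipGenerator) -> None:
    """Redo a single clip without touching the others."""
    board.clips[index] = generate_clip(board.scenes[index], board.style)


def assemble(board: Storyboard) -> str:
    """Stand-in for video assembly: join the clip identifiers in order."""
    return " | ".join(board.clips)


def fake_model(scene: str, style: str) -> str:
    """Stub generator used in place of a real model call."""
    return f"clip({scene}; {style})"
```

The point of the sketch is that the model call is one argument among several. The scene list, the per-clip regeneration, and the assembly step all live in application code, which is the layer the article argues is hard to copy.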
The Capital Implication
If models are a commodity, then the capital structure of AI startups should reflect that. Most don’t.
The default playbook is: raise $50M+, hire ML researchers, train custom models, fight for marginal gains over the open-source baseline. The math only works if model differentiation is a real moat. If it isn’t — if the open-source baseline catches up within months — that capital was misallocated.
OpenArt’s playbook is the opposite: stay small, stay capital-efficient, and invest the dollars you would have spent on model training into product, distribution, and user research. The company is profitable. Coco was direct that this gives her flexibility on whether and when to fundraise: “We have been profitable for a while, and it does give us a lot of flexibility of when we fundraise, whether we wanna fundraise.”
This isn’t ideological lean philosophy. Coco rejects “lean” as a goal in itself. The argument is structural: if the moat isn’t where the capital is being spent, you’re burning runway on the wrong thing. Lean follows from that conclusion, but it isn’t the point. The point is matching where you spend capital to where defensibility actually lives.
When You Should Train Your Own Model
Coco isn’t dogmatic about this. She’s clear that some companies should train models — but they should be honest about why.
“It does depend on the layer or the space you are in. If you’re building models, then it does make sense for you to raise more, because model training is very resource intensive.”
If you’re building infrastructure that other AI products will run on — foundation models, specialized vertical models, novel architectures — model training is the product, and the capital makes sense. If you’re building an application that uses models, model training is usually a distraction.
The question to ask is: am I building the model, or am I building on top of the model? If the answer is “on top of,” then training your own is probably an expensive way to get marginal differentiation. If the answer is “the model is the product,” then training is the entire game.
Most AI startups today are building on top, not building the model. Most should act accordingly.
The Speed Argument
There’s a final dimension that doesn’t get discussed enough: the model layer is the fastest-moving layer in AI. If you train your own, you’re committing to keep pace with whoever’s pushing the frontier. That’s a treadmill.
If you build on top of available models, you get the benefit of the entire industry’s improvement. Every time a new state-of-the-art model drops, your application can get better with little more than integration work. OpenArt’s video generation got better when video models got better. That improvement was close to free.
The teams committed to training their own models had to spend those same months working just to keep pace. Nearly free improvements versus expensive parity. Over a five-year horizon, the gap compounds in favor of the application-layer player.
FAQ
Does this mean no AI startup should train their own model?
No. Frontier model labs (OpenAI, Anthropic, etc.) and specialized infrastructure plays should train models — that’s their core product. Vertical companies with very specific data advantages (e.g., medical imaging with proprietary datasets) sometimes should too. The argument is against application-layer companies training generic models that the open-source ecosystem will catch up on.
How do you decide whether to use a closed-source or open-source model?
It depends on the use case. Closed-source models often have better quality for cutting-edge tasks, but they’re more expensive and tie you to one provider. Open-source models are commoditizing fastest and offer flexibility. OpenArt uses what works for each scenario: the architecture is set up to swap models in and out. Lock-in to a single provider is its own form of risk.
What does “AI application layer” mean exactly?
The application layer sits above foundation models and provides product-shaped experiences for end users. A foundation model gives you raw capability (generate an image from a prompt). The application layer turns that capability into something a non-expert user can actually use — workflows, UI, abstractions, multi-step pipelines, and integration with adjacent tools.
If models are commoditizing, isn’t the application layer also commoditizing?
The base capability is. But the application layer compounds in ways the model layer doesn’t. User feedback shapes the workflow. Distribution builds defensibility. Category position becomes structural. None of those things commoditize as fast as model weights do — they require time and user signal that competitors can’t shortcut.
What about defensibility through proprietary data?
Real, but narrower than people think. If your data is uniquely useful for fine-tuning a specific use case, it’s a moat. If your data is just user-generated content that any platform with users would have, it’s less of a moat than it appears. The question is whether the data unlocks capabilities competitors can’t replicate.
How do you avoid being disrupted by the next foundation model release?
Build your product so that better models make you better, not worse. OpenArt’s architecture treats the model as a swappable component. When a new image or video model releases, OpenArt evaluates it and integrates the best one. Their visual storytelling product gets better. If your product depends on a specific model’s quirks, you’re vulnerable. If it depends on capability that any frontier model provides, you ride the wave.
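One way to picture “the model as a swappable component” is a small registry that hides the active model behind a stable interface. This is a sketch under stated assumptions, not a description of OpenArt’s system: ImageModel, ModelRegistry, adopt_best, and StubModel are hypothetical names, and the quality attribute on the stub stands in for whatever evaluation a real team would run on a new release.

```python
from typing import Callable, Dict, Optional, Protocol


class ImageModel(Protocol):
    """Anything with a generate(prompt) method can be plugged in."""

    def generate(self, prompt: str) -> str: ...


class ModelRegistry:
    def __init__(self) -> None:
        self._models: Dict[str, ImageModel] = {}
        self._active: Optional[str] = None

    def register(self, name: str, model: ImageModel) -> None:
        """Add a newly released model as a candidate."""
        self._models[name] = model

    def adopt_best(self, score: Callable[[ImageModel], float]) -> str:
        """Evaluate every candidate and make the top scorer the active model."""
        if not self._models:
            raise RuntimeError("no models registered")
        self._active = max(self._models, key=lambda n: score(self._models[n]))
        return self._active

    def generate(self, prompt: str) -> str:
        """Product code calls this and never names a specific model."""
        if self._active is None:
            raise RuntimeError("no active model; call adopt_best first")
        return self._models[self._active].generate(prompt)


class StubModel:
    """Stand-in for a real provider client; quality is a made-up score."""

    def __init__(self, name: str, quality: float) -> None:
        self.name = name
        self.quality = quality

    def generate(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"
```

Because the rest of the product only calls generate on the registry, adopting a better model is a registration plus a re-evaluation. A product that depends on one model’s quirks has no such seam.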
Is this just for consumer AI, or does it apply to enterprise too?
Both, but the dynamics differ. In enterprise, custom models can have value because of compliance, latency, or proprietary data. But even in enterprise, the application layer (workflow integration, security, observability, fine-tuning pipelines) is increasingly where the defensibility lives. The model is necessary but not sufficient.
How fast are AI models actually commoditizing?
Faster in some areas than others. In image generation, the gap between leading and open-source models has narrowed dramatically over 2024-2025. In video generation, the gap is still meaningful but closing. In language models for general tasks, the gap is large for cutting-edge reasoning but small for common workflows. The trend is consistent: capabilities diffuse, then commoditize.
What should application-layer founders do with their capital instead of training models?
Distribution, user research, workflow design, and category-defining product moves. OpenArt’s biggest investments were in pivots (image to video to visual storytelling), in talking to 30 users a week to refine its ideal customer profile (ICP), and in social media presence (their YouTube grew from 50K to 150K in months). All of those compound; none of them is undone by the next model release.
Watch the full conversation
Hear Coco Mao share the full story on Heroes Behind AI.