Why Open Source Infrastructure Wins for Production Voice Agents

There’s an easy way to build a voice agent. Use a GUI tool. Click together a conversation flow. Connect some APIs. No code required.

It looks faster. It feels simpler. Dozens of companies (Vapi, Retell, others) have built businesses around this convenience. For your first prototype, it works.

But Tom Shapland, PM at LiveKit, has watched thousands of engineers evaluate both paths. The pattern is consistent: proprietary tools win on initial ease, but open-source wins on production requirements.

“It just makes sense to start with a framework, ideally an open source framework where you can actually see what the code is doing,” Tom says. “It’s not a black box.”

The difference isn’t about ideology. It’s about what happens when you need to control something.

Where GUI Tools Break

A GUI tool lets you build fast because it abstracts complexity. You don’t think about WebRTC or turn detection or parallel LLM generation. You point and click.

Until you need something the tool doesn’t support.

Maybe your turn-detection parameters don’t work for your use case. Maybe you need a custom model or a specific orchestration pattern. Maybe the tool’s underlying infrastructure doesn’t scale to your volume. Maybe it’s too expensive at scale.

At that point, you’re stuck. You’ve built a prototype in a proprietary system. To ship to production, you need to either:

1. Accept the tool’s constraints and limitations
2. Rebuild everything in code

Most teams choose option 2. Not because they want to, but because their business requirements exceed what a GUI can express.

Tom’s framing is direct: “We provide an open source framework where you can actually see what the code is doing. It’s not a black box.”

What “Control” Means in Production

In production, control means:

Debugging what went wrong. When a user’s call fails, you need to know why. Was it the transcription? The LLM response? The TTS? A GUI tool can’t show you. Open-source code can.

Customizing for your use case. Different voice agents have different requirements. A 911 operator system needs different turn-taking behavior than a sales rep. You need to adjust parameters and layer in custom logic. A GUI has preset options. Code lets you express whatever logic the use case requires.

Scaling beyond the platform. A GUI tool’s infrastructure has limits. When you hit them, you can’t optimize. Open-source lets you host the framework yourself, on your own infrastructure.

Integrating with your systems. Your voice agent needs to talk to your CRM, your database, your internal APIs. A GUI tool supports some integrations. Your code supports all of them.

Using the latest models. New speech-to-text models come out constantly. New LLMs are deployed weekly. A GUI tool waits for its vendor to integrate them. Open-source frameworks can add them immediately.
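To make "control" concrete, here is a minimal sketch of a voice pipeline where every stage is a plain function you own. It is not LiveKit's API: the `Pipeline` class, the `StageTrace` record, and the stand-in `fake_stt`, `fake_llm`, `fake_tts`, and `broken_llm` functions are all hypothetical names invented for illustration. The point is that when the stages are in your code, you can time each one, see which one failed, and swap a model by replacing a single entry.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class StageTrace:
    """What one stage did during one call (hypothetical record type)."""
    name: str
    seconds: float
    error: Optional[str] = None


@dataclass
class Pipeline:
    """Runs named stages in insertion order and records a trace for each."""
    stages: Dict[str, Callable]

    def run(self, payload) -> Tuple[object, List[StageTrace]]:
        traces: List[StageTrace] = []
        for name, stage in self.stages.items():
            start = time.perf_counter()
            try:
                payload = stage(payload)
            except Exception as exc:
                # Stop at the first failure and report which stage raised it.
                elapsed = time.perf_counter() - start
                traces.append(StageTrace(name, elapsed, repr(exc)))
                return None, traces
            traces.append(StageTrace(name, time.perf_counter() - start))
        return payload, traces


# Stand-ins for real speech-to-text, LLM, and text-to-speech calls.
def fake_stt(audio: bytes) -> str:
    return "what are your hours"


def fake_llm(text: str) -> str:
    return f"You asked: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def broken_llm(text: str) -> str:
    raise TimeoutError("model did not respond")


pipeline = Pipeline({"stt": fake_stt, "llm": fake_llm, "tts": fake_tts})
audio_out, traces = pipeline.run(b"\x00\x01")

# Swap one stage without touching the others, then see where the call breaks.
pipeline.stages["llm"] = broken_llm
failed_out, failed_traces = pipeline.run(b"\x00\x01")

for trace in failed_traces:
    print(trace.name, f"{trace.seconds:.6f}s", trace.error or "ok")
```

In the failing run, the trace ends at the `llm` stage with the timeout recorded, which answers the "was it the transcription, the LLM, or the TTS?" question directly.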

The Migration Pattern

Tom sees a consistent pattern: teams start with a GUI tool, then migrate to open-source.

“The market works like this: There’s proprietary tooling that’s easier to get started with. But as you scale, you realize there’s all these edge cases and things you need to control that the proprietary tool doesn’t let you do. And then you eventually migrate to something that’s more flexible and more open.”

This isn’t a failure of the GUI approach. It’s just the natural evolution of any software system. Easy to start with. Need to own more as you grow.

The teams that understand this upfront don’t start with a GUI. They start with open-source. They trade initial ease for long-term control.

Why LiveKit’s Approach Works

LiveKit is open-source. The code is on GitHub. You can see exactly how turn detection works, how models are orchestrated, how audio is transported. You can modify it. You can deploy it yourself.

This transparency matters. “We have an open source agents framework where you use our Python code, our TypeScript code to build agents,” Tom explains. “We take care of all the abstractions around building a voice agent, like handling turn detection and making sure that the audio is reaching the agent.”

The framework gives you:

Pre-built components that work. Turn detection, transcription, LLM orchestration, text-to-speech. These are hard to get right. LiveKit has solved them.

Flexibility to customize. You can adjust how each component works. You can swap in different models. You can add your own logic.

Visibility into what’s happening. No black box. You read the code. You understand the choices.

Control over deployment. Host it where you want. Scale it the way you need. Use your own infrastructure.
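As a rough illustration of what "adjust how each component works" can look like, the sketch below implements a deliberately simple turn detector: it declares the user's turn over after a configurable stretch of silence. This is a swapped-in toy technique, not how LiveKit's turn detection works, and `TurnConfig`, `end_of_turn_frame`, and the two profiles are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TurnConfig:
    energy_threshold: float  # frames below this energy count as silence
    min_silence_ms: int      # how much continuous silence ends the turn
    frame_ms: int = 20       # duration of one audio frame


def end_of_turn_frame(energies: List[float], cfg: TurnConfig) -> Optional[int]:
    """Return the index of the frame where the turn is declared over, or None."""
    frames_needed = -(-cfg.min_silence_ms // cfg.frame_ms)  # ceiling division
    silent_run = 0
    heard_speech = False
    for i, energy in enumerate(energies):
        if energy >= cfg.energy_threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Only count silence that follows speech, not leading silence.
            silent_run += 1
            if silent_run >= frames_needed:
                return i
    return None


# Speech, a short pause, more speech, then a long silence (made-up energies).
energies = [0.8] * 5 + [0.05] * 10 + [0.7] * 5 + [0.05] * 40

quick = TurnConfig(energy_threshold=0.2, min_silence_ms=200)
patient = TurnConfig(energy_threshold=0.2, min_silence_ms=600)

print(end_of_turn_frame(energies, quick))    # ends the turn at the short pause
print(end_of_turn_frame(energies, patient))  # waits through the pause
```

The same audio produces two different behaviors depending on one parameter: the quick profile cuts in at the short pause, while the patient profile waits for the longer silence. Which profile suits which agent is a product decision, and it is the kind of decision that is easy to make when the parameter is in your own code.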

The Licensing Question

Open-source often means free, but not always. Open-source means the code is visible and modifiable. LiveKit is open-source (permissively licensed) but the company also runs a cloud platform that charges for scale.

This is the right trade-off: developers get visibility and control, LiveKit gets paid for managed infrastructure.

It’s not a trap. It’s just honest: if you want someone else to host and manage it, you pay for that service. If you host it yourself, it’s free.

FAQ

Why would I choose open-source when a GUI tool is faster to start?

GUI tools are faster for prototypes but fall short at scale, when you need customization, debugging visibility, or deployment control. Open-source is slightly slower to start but lets you change whatever you need later.

What happens when my voice agent outgrows the GUI tool?

You either accept its limitations (constrained turn-taking, limited integrations, fixed pricing) or rebuild on open-source infrastructure. Most teams choose to rebuild.

Can’t a GUI vendor just add features I need?

They can, but they’re optimizing for the median user, not your specific needs. Your edge case might not be their priority. Open-source lets you solve it yourself.

Does “open-source” mean I have to host it myself?

No. You can use a managed cloud platform (like LiveKit Cloud) and still have access to the code and the ability to deploy elsewhere if needed. You get both: ease of use plus control.

How is a GUI tool’s pricing model different from open-source?

GUI tools often charge per call or per minute of agent time. Open-source frameworks let you host the agent yourself and pay only for the underlying LLM and TTS APIs. At scale, this can be much cheaper.
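The "at scale" part can be checked with simple arithmetic. The sketch below compares a per-minute platform fee against self-hosting, and computes the monthly volume at which the two cost the same. Every number in it is a made-up placeholder, not a real price from any vendor, and the fixed monthly cost for servers and engineering time is an assumption added here so the comparison is not unrealistically one-sided. Amounts are in cents to keep the arithmetic exact.

```python
from typing import Optional


def platform_cost(minutes: int, platform_cents_per_min: int) -> int:
    """Monthly cost in cents when the vendor bills per agent minute."""
    return minutes * platform_cents_per_min


def self_hosted_cost(minutes: int, api_cents_per_min: int,
                     fixed_cents_per_month: int) -> int:
    """Monthly cost in cents: model API usage plus an assumed fixed overhead."""
    return fixed_cents_per_month + minutes * api_cents_per_min


def break_even_minutes(platform_cents_per_min: int, api_cents_per_min: int,
                       fixed_cents_per_month: int) -> Optional[float]:
    """Monthly minutes at which both options cost the same, or None if
    self-hosting never catches up."""
    saving_per_min = platform_cents_per_min - api_cents_per_min
    if saving_per_min <= 0:
        return None
    return fixed_cents_per_month / saving_per_min


# Placeholder inputs: 10 cents/min platform fee, 4 cents/min for model APIs,
# and 300,000 cents (3,000 dollars) per month of fixed self-hosting overhead.
threshold = break_even_minutes(10, 4, 300_000)
platform_at_200k = platform_cost(200_000, 10)
hosted_at_200k = self_hosted_cost(200_000, 4, 300_000)

print(threshold)                           # minutes per month to break even
print(platform_at_200k / 100, hosted_at_200k / 100)  # dollars at 200k minutes
```

With these placeholder inputs, self-hosting only pays off above the break-even volume; below it, the per-minute platform is the cheaper option. Plug in your own quotes before drawing conclusions.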

What if I need a GUI tool for non-technical team members?

You can build a GUI on top of the open-source framework. This is what many teams do: open-source framework + your own interface for your team’s needs.

Is open-source harder to deploy than a GUI tool?

Slightly, at the start. But it is easier than rebuilding from scratch when you outgrow the GUI. The learning curve is worth it if you’re shipping to production.

What’s the difference between “open-source” and “free”?

Open-source means the code is visible and modifiable under a license. Free means no cost. Open-source software can still cost money (commercial licenses, managed platforms). Free software can be proprietary (source unavailable).

Can I use an open-source framework if I’m not a software engineer?

Not directly. But you can hire engineers to customize it, or use a managed platform that bundles the open-source framework with operations. You don’t have to be a DevOps expert to benefit from open-source flexibility.

What should I ask an open-source framework vendor to find out whether I’ll outgrow them?

Ask: Can I deploy this myself? Can I modify the code? Can I swap in different models? Can I use this on my own infrastructure? If the answer to all four is “yes,” you have flexibility for the future.
