I built Vaivox from a frustration I felt every single day.

I speak Italian, Portuguese, Spanish, and French. I understand some German. My English is functional but not fluent. And yet, most of the content I wanted to learn from was in a language I couldn't follow naturally.

AI tutorials from Silicon Valley engineers. Crypto market analysis from US traders. Business strategy from founders who only publish in English. Podcasts, lectures, conference talks — the knowledge was everywhere, but understanding it required effort that had nothing to do with the subject itself.

The problem was never the lack of content. The problem was the friction between me and the content.

The subtitle trap

Like everyone, I started with subtitles: auto-translate on YouTube, and manual subtitles when they were available. It worked for short videos. But the moment a tutorial hit 20 minutes, I was exhausted. My eyes bounced between the code on screen and the text at the bottom, while my brain processed two languages at once: the one I was hearing and the one I was reading.

I tried summarizer tools. They gave me bullet points, but bullet points don't teach you how to build a React component or understand a market trend. The nuance, the reasoning, the examples — all stripped away.

I tried transcripts. Walls of text. Better for searching, useless for learning on the go.

None of these tools solved the actual problem: I wanted to understand the full video, naturally, in my language, without it feeling like work.

Starting with dubbing

The first version of Vaivox was simple. Take a YouTube video, transcribe it, translate the text, generate audio with AI voices. Done — you get the video dubbed in your language.
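
To make those steps concrete, here is a minimal Python sketch of the transcribe, translate, and synthesize flow. It is an illustration only, not Vaivox's actual code: the names Segment, DubbedSegment, and dub_video are hypothetical, and the three stages are passed in as plain functions because the post does not say which speech-to-text, translation, or voice services are used. The fake_* functions are stand-ins so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One timed chunk of the original transcript (hypothetical shape)."""
    start: float  # seconds from the start of the video
    end: float
    text: str


@dataclass
class DubbedSegment:
    """A translated chunk plus the audio generated for it."""
    start: float
    end: float
    text: str
    audio: bytes


def dub_video(
    video_url: str,
    target_lang: str,
    transcribe: Callable[[str], List[Segment]],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> List[DubbedSegment]:
    """Run the three stages in order, keeping the original timestamps."""
    dubbed = []
    for seg in transcribe(video_url):
        translated = translate(seg.text, target_lang)
        audio = synthesize(translated, target_lang)
        dubbed.append(DubbedSegment(seg.start, seg.end, translated, audio))
    return dubbed


# Stand-in stages (assumptions): a real system would call actual
# speech-to-text, translation, and text-to-speech services here.
def fake_transcribe(video_url: str) -> List[Segment]:
    return [Segment(0.0, 2.5, "Hello"), Segment(2.5, 5.0, "Welcome")]


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"


def fake_synthesize(text: str, target_lang: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    result = dub_video(
        "https://example.com/video", "it",
        fake_transcribe, fake_translate, fake_synthesize,
    )
    for seg in result:
        print(seg.start, seg.end, seg.text)
```

Keeping the timestamps on every segment is what would let the generated audio be lined up with the original video later. Getting that alignment right is a separate problem that this sketch does not attempt.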

It was rough. The voices sounded robotic. The timing was off. But the first time I listened to a 30-minute English tutorial in Italian while coding, something clicked. I wasn't reading. I wasn't translating in my head. I was just... learning.

That session taught me something: people don't need perfect studio dubbing. They need to understand the value of the video quickly, clearly, and in their own language.

The moment everything changed

That realization changed the direction of the whole product. I stopped chasing Hollywood-quality dubbing and started asking a different question: what does a person actually need to fully understand a video?

The answer wasn't just audio. It was layers:

  • Translated audio for natural understanding — the primary experience
  • A full transcript for searching, referencing, and studying specific sections
  • An AI summary for previewing — deciding in 30 seconds if a video is worth your time
  • An audio summary for revision — listen to the key points during a walk or commute

Each layer serves a different moment. Together, they cover the full learning journey: discover, understand, review, retain.

Building alone

Vaivox is a solo project. I write the code, design the interface, manage the infrastructure, handle support, and decide what to build next. Some days that's overwhelming. Most days it's the reason the product moves fast and stays focused.

When I notice a friction point — a slow processing step, a confusing UI element, a missing feature — I can fix it the same day. No meetings. No tickets. No roadmap debates. Just: identify the problem, build the solution, ship it.

Every feature in Vaivox earned its place by solving a real problem I experienced myself or heard directly from a user. Nothing is there because it sounded impressive in a pitch deck.

What drives me

Every day I see people discovering creators they didn't know existed. A student in Lisbon following an MIT lecture in Portuguese. A trader in Rome understanding a Korean analyst's breakdown. A designer in Madrid learning from a Japanese typography expert.

These aren't hypothetical scenarios — they're real use cases from real users. And every time someone says "I finally understood that video", it confirms why I keep building.

The technology will keep improving — faster processing, more natural voices, better speaker detection, smarter summaries. But the mission stays the same:

Remove the invisible barrier between people and the knowledge they want. One video at a time.

That's why I built Vaivox. And that's why I'm still building it.