Two undergrads built an AI speech model to rival NotebookLM

A pair of undergrads, neither with extensive AI expertise, say that they’ve created an openly available AI model that can generate podcast-style clips similar to Google’s NotebookLM.

The market for synthetic speech tools is vast and growing. ElevenLabs is one of the largest players, but there’s no shortage of challengers (see PlayAI, Sesame, and so on). Investors believe that these tools have immense potential. According to PitchBook, startups developing voice AI tech raised over $398 million in VC funding last year.

Toby Kim, one of the Korea-based co-founders of Nari Labs, the group behind the newly released model, said that he and his fellow co-founder started learning about speech AI three months ago. Inspired by NotebookLM, they wanted to create a model that offered more control over generated voices and “freedom in the script.”

Kim says they used Google’s TPU Research Cloud program, which provides researchers with free access to the company’s TPU AI chips, to train Nari’s model, Dia. Weighing in at 1.6 billion parameters, Dia can generate dialogue from a script, letting users customize speakers’ tones and insert disfluencies, coughs, laughs, and other nonverbal cues.

Parameters are the internal variables models use to make predictions. Generally, models with more parameters perform better.

Available from the AI dev platform Hugging Face and GitHub, Dia can run on most modern PCs with at least 10GB of VRAM. It generates a random voice unless prompted with a description of an intended style, but it can also clone a person’s voice.
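For a rough sense of how a parameter count relates to memory, the arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not Nari’s own accounting: the function name `weight_memory_gb` is hypothetical, the bytes-per-parameter values are standard numeric widths, and the numeric precision Dia actually uses is an assumption here.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# Assumption: the precision Dia actually runs at is not given in this article,
# so several common numeric widths are shown side by side.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

DIA_PARAMS = 1_600_000_000  # 1.6 billion parameters, the figure reported for Dia


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the size of the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_memory_gb(DIA_PARAMS, dtype):.1f} GB")
```

Under these assumptions, the weights alone would occupy about 6.4GB at 32-bit precision or 3.2GB at 16-bit. The sketch does not model activations, audio decoding, or other runtime overhead, all of which also have to fit within the 10GB of VRAM.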

In TechCrunch’s brief testing of Dia through Nari’s web demo, Dia worked quite well, uncomplainingly generating two-way conversations about any subject. The quality of the voices seems competitive with other tools out there, and the voice cloning function is among the best this reporter has tried.

Here’s a sample:

Like many voice generators, Dia offers little in the way of safeguards, however. It’d be trivially easy to craft disinformation or a scammy recording. On Dia’s project pages, Nari discourages abuse of the model to impersonate, deceive, or otherwise engage in illicit campaigns, but the group says it “isn’t responsible” for misuse.

Nari also hasn’t disclosed which data it scraped to train Dia. It’s possible Dia was developed using copyrighted content; a commenter on Hacker News notes that one sample sounds like the hosts of NPR’s “Planet Money” podcast. Training models on copyrighted content is a widespread but legally dubious practice. Some AI companies claim that fair use shields them from liability, while rights holders assert that fair use doesn’t apply to training.

In any event, Kim says Nari’s plan is to create a synthetic voice platform with a “social aspect” on top of Dia and larger, future models. Nari also intends to release a technical report for Dia, and to expand the model’s support to languages beyond English.
