Even at first glance, there’s something off about the body on the street. The white sheet it’s under is a little too clean, and the officers’ movements are totally devoid of purpose. “We need to clear the street,” one of them says with a firm hand gesture, though her lips don’t move. It’s AI, alright. But here’s the kicker: my prompt didn’t include any dialogue.
Veo 3, Google’s new AI video generation model, added that line all by itself. Over the past 24 hours I’ve created a dozen clips depicting news reports, disasters, and goofy cartoon cats with convincing audio, some of which the model invented on its own. It’s more than a little creepy and far more sophisticated than I had imagined. And while I don’t think it’s going to propel us to a misinformation doomsday just yet, Veo 3 strikes me as an absolute AI slop machine.
Google introduced Veo 3 at I/O this week, highlighting its most important new capability: generating sound to go with your AI video. “We’re entering a new era of creation,” Google’s VP of Gemini, Josh Woodward, explained in the keynote, calling it “extremely realistic.” I wasn’t completely sold, but then, a few days later, I had Veo 3 generate a video of a news anchor announcing a fire at the Space Needle. All it took was a basic text prompt, a few minutes, and an expensive subscription to Google’s AI Ultra plan. And you know what? Woodward wasn’t exaggerating. It’s realistic as hell.
I tried the news anchor prompt after seeing what Alejandra Caraballo, a clinical instructor at Harvard Law School’s Cyberlaw Clinic, was able to produce. One of her clips features a news anchor announcing the death of US Secretary of Defense Pete Hegseth. He is not dead, but the clip is extremely convincing. A post containing a string of videos with AI-generated characters protesting the prompts used to create them has 50,000 upvotes on Reddit. The scenes include disasters, a woman in a hospital bed using a breathing tube, and a character being threatened at gunpoint, all with spoken dialogue and realistic background sounds. Real lighthearted stuff!
Maybe I’m being naive, but after playing around with Veo 3 I’m not quite as concerned as I was at first. For starters, the obvious guardrails are in place. You can’t prompt it to create a video of Biden tripping and falling. You can’t have a news anchor announce the assassination of the president, or even generate a video of a T-shirt-and-chain-wearing tech company CEO laughing while dollar bills rain down around him. That’s a start.
That said, you can generate some troubling shit. Without any clever workarounds I prompted Veo 3 to create a video of the Space Needle on fire. Starting with my own photo of Mount Rainier, I generated a video of it erupting with smoke and lava. Coupled with a clip of a news anchor announcing said disaster, I can see how you could seed some mischief real easily with this tool.
Here’s the better news: it doesn’t seem to be a ready-made deepfake machine. I gave it a couple of photos of myself and asked it to generate a video with specific dialogue, and it wouldn’t comply. I also asked it to bring a pair of giant boots in a photo to life and have them walk out of the scene; it managed one boot stomping across the sidewalk with some comical crunching noises in the background.
I had an easier time generating videos when my prompts were less specific, which is how I confirmed something my colleague Andrew Marino pointed out: Veo 3 is excellent at creating the kind of lowest-common-denominator YouTube content aimed at kids.
If you’ve never been subjected to the bottomless pit of garbage on YouTube Kids, let me enlighten you. Imagine watching the worst 3D rendering of a monster truck driving down a ramp, landing in a vat of colored paint. Next to it, another monster truck drives down another ramp into another vat of paint, this time in a different color. Now watch that again. And again. And again. There are hours of this stuff on YouTube designed to mesmerize toddlers. These videos are usually harmless, just empty calories designed to rack up views that make Cocomelon look like Citizen Kane. In about 10 minutes with Veo 3, I threw together a clip following the same basic formula, complete with jaunty background music. But the clip that’s even more troubling to me is the one with two cartoon cats on a pier.
I thought it would be funny to have the cats complain to each other that the fish aren’t biting. In just a couple of minutes, I had a clip complete with two cats and some AI-generated dialogue that I never wrote. If it’s this easy to make a 10-second clip, stretching it out to a seven-minute YouTube video would be trivial. In its current form, clips revert to Veo 2 when you try to extend them into longer scenes, which removes the audio. But given how relentlessly Google has been pushing these tools forward, I can’t imagine it’ll be long before you can edit a full feature-length video with Veo 3.
Honestly, I wonder if this kind of use for AI-generated video is a feature and not a bug. Google showed us some fancy AI-generated video from real filmmakers, including Eliza McNitt, who’s working with Darren Aronofsky on a new film with some AI-generated elements. And sure, AI video could be an interesting tool in the right hands. But I think what we’re most likely to see is a proliferation of the kind of bland imagery that AI is so good at producing, this time in stereo.