Creative Workflow Roundup: Veo 2 Hits YouTube, AI Video Advances, OpenAI’s Next Big Move w/ GPT-5, and More!
In This Week’s Roundup: Veo 2 is now inside YouTube’s app, making AI VFX more accessible than ever. Adobe Firefly’s video model is expanding in public beta, while new AI tools like Ray2 and Hailuo’s Director model are giving producers more control over motion, composition, and storytelling. Meanwhile, OpenAI has revealed its roadmap for GPT-4.5 and GPT-5, signaling a shift toward a more unified AI ecosystem. Plus, Midjourney is hinting at hardware plans, Alibaba’s latest research takes full-body swapping to new levels, and a fully AI-generated feature film has arrived. All that and more!
Google Veo 2 Brings AI VFX Directly to YouTube App
The News: Google has integrated Veo 2 into the YouTube app, allowing users to generate AI-powered video clips and post them in Shorts. This version is noticeably faster than the standalone model, likely optimized for mobile use. It follows a text-to-image then image-to-video workflow with preset styles like vintage, anime, claymation, and digital. Users can re-roll images before proceeding to the video stage and even layer Veo 2 clips as green screen backgrounds. The feature is currently available only in the US, Canada, Australia, and New Zealand.
Lab Notes: Until now, Veo 2 access has been limited to a small group, so seeing it roll out to everyone inside the YouTube app is a big moment. From all the tests I’ve seen, Veo 2 is currently the best-performing AI video model, outpacing the current versions of OpenAI’s Sora, Runway’s Gen-3, and others in motion quality, coherence, and realism. Turning the YouTube app into a lightweight VFX studio means more people can experiment with AI-enhanced visuals without extra software. The ability to add multiple AI clips to the timeline is a very basic but still much-appreciated feature, considering it’s a mobile editing interface. But for more advanced users, the big question is when Google will open Veo 2 access beyond the mobile app, especially via a web interface where it could be integrated into broader creative workflows.
Adobe’s Firefly Video Model Enters Public Beta with Licensed Data Approach
The News: Adobe has officially released its text- and image-to-video AI generator in public beta. Available through the redesigned Firefly web app, the tool offers two main features: Text-to-Video and Image-to-Video. Users can generate short clips (up to five seconds) with text prompts or start from a reference image to guide the look of the final output. The model now produces 1080p video at 24 fps, an improvement over its earlier 720p version. Adobe also introduced AI credit subscription tiers and deeper Creative Cloud integration. Unlike other AI video models, Firefly is trained entirely on licensed and public domain content, making it a safer choice for commercial use.
Lab Notes: I tested this when it was in private beta, and while the output isn’t as strong as models like Veo 2 or Sora, Adobe’s fully licensed data approach is a key differentiator. For producers worried about copyright issues, this is a rare AI video model that comes with clear commercial-use protections. The ability to refine motion, styles, and camera angles is a nice touch, but overall it still lags behind competitors in realism and quality. Curious to see how much the upcoming 4K version improves things.
Hailuo (MiniMax) Text-to-Video 01-Director Model Expands AI Video Control
The News: Hailuo AI has introduced the T2V-01-Director, an AI video generation model designed to offer more precise control over camera movements, character actions, and scenery details. This advancement allows users to create videos with structured cinematic direction, responding well to detailed production notes. The model is now available on Hailuo’s site and Krea. It’s also available on Replicate and Fal, two platforms that host a wide range of AI models with pay-as-you-go access.
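For producers who would rather script generations than work through the web UIs, here is a minimal sketch of what calling the Director model through Replicate’s Python client could look like. The model slug, input fields, and bracketed camera-direction syntax below are assumptions for illustration; check the model’s page on Replicate before relying on them.

```python
# Minimal sketch: generating a clip with Hailuo's T2V-01-Director via Replicate.
# The model slug and input fields are assumptions -- confirm them on the model's
# Replicate page before use.
import replicate

output = replicate.run(
    "minimax/video-01-director",  # assumed slug for the T2V-01-Director model
    input={
        # Bracketed camera directions reflect the Director model's
        # production-note style of prompting.
        "prompt": (
            "[Truck left, Pan right] A detective walks through a rain-soaked "
            "alley at night, neon signs reflecting in the puddles."
        )
    },
)

# The client typically returns a URL or file-like handle for the finished video.
print(output)
```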
Lab Notes: AI video generation keeps moving toward more directorial control. The ability to guide framing, movement, and scene composition with production-style instructions is key for storytelling. As this tech improves, it shifts AI video away from the cherry-picked demo clips you might have seen in “AI influencer threads” and toward fully directed sequences that actually tell a cohesive story. I would love to see an image-to-video version of this model. Speaking of image-to-video…
Ray2 Introduces Image-to-Video for More Controlled AI Animation
The News: Luma AI’s Ray2 model now supports image-to-video generation, allowing users to animate any still image using Dream Machine. Ray2 enables the creation of up to 10-second, 720p videos with realistic motion and a fairly good grasp of real-world physics. By starting with an image—whether a painting, real-life photo, or a custom AI image—producers gain more control over the final output compared to traditional text-to-video methods.
Lab Notes: Starting with an image instead of relying solely on text prompts makes AI video generation much more predictable. Ray2 was already a solid model, but this update makes it a more practical tool for structured storytelling. Being able to guide the look of a scene upfront, instead of waiting to see how the AI interprets a text prompt, is a major workflow improvement.
OpenAI Confirms GPT-4.5 and GPT-5: Roadmap Focuses on Unified AI