Creative Workflow Roundup: OpenAI Operator, Spline Spell, Midjourney’s 3D Push, and AI Tools for Video, Lighting, and Animation
In This Week’s Roundup: OpenAI’s Operator introduces browser-based automation with potential to (eventually) reshape creative workflows, while Spline’s Spell generates immersive 3D worlds from a single image. Midjourney reveals plans for video models, 3D capabilities, and its highly anticipated V7 model. Fal.ai launches an open-source video editing interface, combining top-tier AI tools. Meanwhile, Adobe introduces AI-powered visual search in Premiere Pro and previews its SynthLight portrait lighting model. Netflix’s research showcases keyframe animation breakthroughs, and tools like Krea Real-Time, Freepik AI Suite, and Kling Elements push the boundaries of custom model training, image-to-video workflows, and advanced animation.
OpenAI Operator: The Dawn of Autonomous Creative Collaboration
The News: OpenAI has launched Operator, a groundbreaking browser-based agent now available for Pro subscribers ($200/month). Plus users are expected to gain access in the future, though no specific timeline has been announced. Operator is still a "research preview," powered by a specialized model called the Computer-Using Agent (CUA) that is designed to perform web tasks autonomously. It works by replicating user actions (typing, clicking, scrolling) and taking iterative screenshots to assess its progress before deciding its next move.
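That observe-decide-act loop can be sketched in a few lines. Everything below is illustrative, based only on OpenAI's public description: the function and class names (`take_screenshot`, `decide_action`, and so on) are stand-ins, not OpenAI's actual API, and the "model" here is a trivial stub.

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", "scroll", or "done"
    detail: str = ""   # e.g. what to type or where to click

def take_screenshot(step: int) -> str:
    # Stand-in for capturing the virtual browser's current state.
    return f"screenshot_{step}"

def decide_action(screenshot: str, goal: str) -> Action:
    # Stand-in for the CUA model: in reality a vision model inspects
    # the screenshot and picks the next UI action. Here we pretend the
    # task finishes once we reach the third screenshot.
    if "2" in screenshot:
        return Action("done")
    return Action("click", f"next step toward: {goal}")

def run_agent(goal: str, max_steps: int = 10) -> list[Action]:
    history = []
    for step in range(max_steps):
        shot = take_screenshot(step)        # observe
        action = decide_action(shot, goal)  # decide
        history.append(action)              # act (stubbed out here)
        if action.kind == "done":
            break
    return history
```

The key design point is that each iteration is grounded in a fresh screenshot rather than a live video feed, which is exactly the constraint discussed below.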
Operator is being touted as a collaborative tool. Users can take control of the session at any time, with Operator resuming only after the user hands control back. Operator doesn't monitor what users do during manual input, such as logging into accounts. Safety is a key focus, with safeguards against misuse by users, mistakes by the model itself, and adversarial content on external websites.
Though Operator’s current use cases are in early experimentation—such as automating meme creation—the tool signals a significant leap forward for browser-based automation, laying the groundwork for more advanced applications.
Lab Notes: Projects like Open Interpreter have been exploring similar territory with open-source tools, and Anthropic’s Claude has computer-use capabilities as well. But with Operator now live, OpenAI has officially set the bar for practical, user-facing automation.
For now, Operator’s abilities are limited. It works only in a virtual browser and relies on static screenshots to make decisions. This constraint narrows its use cases, but it’s easy to see where this could evolve. A future version of Operator could transition to continuous, real-time screen monitoring instead of discrete screenshots, which would unlock a new level of responsiveness. Operator could potentially move beyond the browser and gain access to more software, enabling it to operate creative programs like Photoshop, Premiere Pro, or After Effects.
Think about what that would mean: an agent making live decisions about video cuts, transitions, or applying effects—all without needing constant supervision. It’s not hard to imagine how this might reshape entire creative workflows as it gains more access to local machines and software environments.
This also highlights a larger trend. We're moving toward a creative landscape where hands-on technical skills, like mastering the intricacies of editing software, matter less, and directing AI tools matters more. Of course, this raises tough questions about job displacement on the technical side of creative fields.
It’s clear that we’re at the start of a transformative moment. Operator is just one step in a broader trend toward autonomous, intelligent tools that can handle increasingly complex creative work. As creatives, our role is evolving. We need to focus on how to direct, supervise, and innovate alongside these tools. For now, Operator is limited, but its trajectory signals much bigger changes ahead.
Spline’s Spell: Generating 3D Worlds
The News: Spline has unveiled Spell, an early-stage model that can generate entire 3D worlds from a single image in just minutes. The generated worlds are consistent with the original image input and are represented as volumetric data that can be rendered using techniques like Gaussian Splatting or NeRFs. This functionality allows creators to turn 2D concepts into fully immersive 3D environments quickly and with minimal effort.
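To make "volumetric data" concrete, here is a minimal sketch of what Gaussian-splat-style scene data looks like: a world is just a large cloud of oriented, colored Gaussians that get projected and alpha-blended from any camera angle. The field names are generic splatting conventions; Spline has not published Spell's actual format.

```python
from dataclasses import dataclass

@dataclass
class GaussianSplat:
    position: tuple[float, float, float]          # 3D center of the Gaussian
    scale: tuple[float, float, float]             # extent along each local axis
    rotation: tuple[float, float, float, float]   # orientation as a quaternion
    color: tuple[float, float, float]             # RGB (often via spherical harmonics)
    opacity: float                                # how strongly this splat occludes

# A generated "world" is a cloud of thousands or millions of such splats;
# two are enough to show the shape of the data.
world = [
    GaussianSplat((0.0, 0.0, 0.0), (0.1, 0.1, 0.1),
                  (1.0, 0.0, 0.0, 0.0), (0.8, 0.2, 0.2), 0.9),
    GaussianSplat((1.0, 0.5, -2.0), (0.3, 0.3, 0.3),
                  (1.0, 0.0, 0.0, 0.0), (0.2, 0.6, 0.9), 0.5),
]
```

Because each splat is an explicit primitive (unlike a NeRF, where the scene lives inside a neural network's weights), splat clouds can be rendered in real time in a browser, which fits Spline's browser-based focus.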
Spell is being released initially with limited access and an intentionally high price tag of $99 per month, targeting early adopters. The team at Spline is using this phase to gather insights on user interaction with the tool while managing the high GPU costs associated with running such complex models.
Lab Notes: Spline has already carved out a name for itself as a highly capable browser-based 3D tool, and the launch of Spell takes its capabilities to another level. Tools for generating 3D worlds are not entirely new—research around NeRFs, Splats, and other volumetric rendering techniques has been heating up—but seeing it implemented into a creative tool as powerful as Spline is exciting.
It will be interesting to see how other companies, like Midjourney, respond as they develop their own 3D tools. The competition will likely accelerate innovation and push accessibility for artists and designers who want to create in 3D but lack the technical expertise or time for manual modeling.
We might be looking at a dramatic shift for 3D asset creation and 3D production broadly. Imagine integrating something like Spell directly into game engines or virtual production workflows. The ability to turn flat imagery into navigable, immersive spaces could impact workflows across industries.
Midjourney’s Push Toward 3D and Video
The News: Midjourney is pushing into new creative territory, with updates from this week’s office hours revealing major plans for video, 3D tools, and its next-generation V7 image model. The company is experimenting with two video models, offering users a choice between fast/affordable and slow/high-quality outputs. These initial releases will integrate Midjourney’s signature aesthetic enhancements and launch in an experimental phase to gauge community interest.
The V7 model, currently in final tuning, is expected to debut in mid-February. It promises better image quality, improved prompt comprehension, multilingual support, and an advanced referencing system for characters, logos, and objects.
Midjourney is also exploring 3D capabilities, including camera movement, 3D reframing, and rerendering for refined outputs. In addition, the team hinted at secret projects—potentially involving hardware—with more announcements expected within six months.
Lab Notes: Midjourney has consistently led the way in image-gen AI, despite being bootstrapped (it has taken no outside funding) and facing a steady rise of competing models. While other tools may score higher on some technical benchmarks, Midjourney's unique combination of aesthetics, usability, and its new web-based interface has kept it competitive. For newcomers, its interface is an immediate advantage. I recently made a video tutorial for anyone getting started.
The push into video and 3D capabilities is especially exciting. And the V7 update is poised to strengthen its position. Midjourney's "secret" projects, including potential hardware, hint at an even bigger ambition to diversify its offerings. We'll see!
Fal Launches Open Source AI Video Starter Kit
The News: Fal.ai has announced the release of the AI Video Starter Kit, an open-source, browser-based editor interface for AI video creation. In partnership with companies like Luma Labs, GenmoAI, Play.ht, and Vercel, the Starter Kit aims to simplify the process of building AI-powered video editing experiences. The tool is now live and available for anyone to experiment with, and Fal encourages developers to fork the repository to start building their own custom AI video interfaces.
The Starter Kit reflects Fal's broader mission: to provide access to industry-leading models on a pay-as-you-go basis. These include FLUX, Kling, and Hunyuan Video, among others. By connecting cutting-edge AI tools into one interface, Fal is building a unified platform that could bridge the gaps in AI-driven video production.
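The core idea of a unified interface is simple: one request shape, routed to whichever hosted model the user picks. The sketch below shows that routing pattern. FLUX, Kling, and Hunyuan Video are real fal-hosted models, but `build_request` and the endpoint paths are illustrative assumptions, not Fal's actual client API.

```python
# Hypothetical routing layer for a unified, pay-as-you-go video interface.
# Endpoint paths are illustrative placeholders, not fal.ai's real routes.
MODEL_ENDPOINTS = {
    "flux": "fal-ai/flux/dev",
    "kling": "fal-ai/kling-video",
    "hunyuan": "fal-ai/hunyuan-video",
}

def build_request(model: str, prompt: str, **options) -> dict:
    """Normalize one user intent into a request for the chosen backend."""
    if model not in MODEL_ENDPOINTS:
        raise ValueError(f"unknown model: {model}")
    return {
        "endpoint": MODEL_ENDPOINTS[model],
        "arguments": {"prompt": prompt, **options},
    }
```

Keeping the request shape identical across backends is what lets an editor UI swap models per shot without rewriting its pipeline, which is presumably the appeal of a starter kit like this.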
Lab Notes: This kit has the potential to act as a central "hub," combining the strengths of multiple models and platforms into a cohesive workflow. With its browser-based, open-source, and extensible interface, Fal is positioning itself as a foundational tool for the next generation of AI video creation.
This project highlights the future direction of AI video tools—not just offering advanced features, but seamlessly integrating them into creative workflows. As competition heats up in this space, tools like Krea and Kaiber Superstudio are also racing to unify the growing array of AI technologies into streamlined, user-friendly solutions that align with how creators work.
Additional Findings: Quick Updates on Important News and Workflows
Adobe Premiere Pro Media Intelligence: Premiere Pro now offers AI-powered visual search, allowing users to find video clips by describing their contents. It also adds caption translation support for 17 languages and other workflow enhancements.
Google’s Imagen 3 Tops Text-to-Image Arena: Google DeepMind’s Imagen 3 reaches #1 in the Text-to-Image Arena, surpassing Recraft-v3 by 70 points and setting a new benchmark.
Kling Elements: Kling AI 1.6 introduces Elements, a feature that lets you upload 1-4 images, select subjects, and describe their actions and interactions to generate videos based on your prompts.