Creative Workflow Lab

Creative Workflow Roundup: OpenAI's o1 and ChatGPT Pro Launch, Runway Act-One Video Update, New AI Models from Google, Amazon, and Luma

Shannon Leonard
Dec 07, 2024
∙ Paid

In This Week's Roundup: OpenAI pushes boundaries with the official o1 release and their premium ChatGPT Pro tier, offering advanced tools like visual reasoning and deeper problem-solving. Runway upgrades Act-One with video input support, making performance reanimation more practical for live-action and animation workflows. Tencent shakes up the industry by open-sourcing Hunyuan Video, a model rivaling top-tier competitors. Meanwhile, Amazon unveils its versatile Nova AI models, Google’s Veo enters private preview, Luma teases its cutting-edge Ray 2 model, and more!

To support the lab's ongoing research, consider becoming a free or paid subscriber. Save over 30% with an annual subscription.

OpenAI’s o1 Model Officially Launches: Visual Reasoning and Faster Responses

The News: OpenAI has officially launched the o1 model, a major upgrade that combines faster, more concise reasoning with image analysis capabilities. While GPT-4o already supported visual input, o1 takes a different approach: it’s optimized for math, logic, and planning, making it far better at staying coherent across complex, multi-step creative and technical projects. Compared to the earlier o1-preview, this version is faster and more versatile—ideal for tasks like refining storyboards, brainstorming intricate narratives, or solving logic-heavy problems.

The o1 model is immediately available to Plus and Team users through the model selector, while Enterprise and Education users will gain access in a week.

In addition, OpenAI launched ChatGPT Pro, a $200/month premium service designed for users tackling high-complexity challenges. This plan includes unlimited access to the o1 model, o1-mini, GPT-4o, and enhanced voice features. ChatGPT Pro also introduces an “o1 pro mode,” which uses increased computational resources to deliver deeper reasoning and more effective solutions for challenging tasks in fields like science, finance, and advanced research.

Lab Notes: While these advancements unlock new possibilities, they are clearly geared toward high-demand users with specialized needs. For producers, even with the limited (but generous) access to o1 on OpenAI’s $20/month Plus tier, the model has the potential to transform tasks like brainstorming creative concepts or summarizing handwritten notes and sketches through its image analysis capabilities.

Ethan Mollick, a Wharton professor studying AI, provides some helpful context: o1 isn't necessarily better at everything, but it excels at cracking specific high-level problems where other models stumble. For most tasks, Claude 3.5 Sonnet, GPT-4o, and even Gemini still hold their own. The new ChatGPT Pro at $200/month is clearly targeting power users in fields like R&D and finance, where enhanced computation directly impacts the bottom line. Most producers won't need this level of access; the standard plans pack plenty of punch.

I'm particularly excited about o1's potential for tackling the kinds of creative challenges that require both visual understanding and deep strategic thinking. More testing to come as I explore these capabilities further.

Runway’s Act-One Expands from Image to Video Inputs for Performance Reanimation

The News: Runway’s Act-One, which previously only allowed users to reanimate characters using static image inputs, has introduced support for video inputs. This update significantly enhances its utility by enabling dynamic performance reanimation directly onto characters in live-action and animated footage. With video input capabilities, Act-One can now refine entire scenes, making it possible to audition new lines, adjust reaction shots, and add expressive performances without needing reshoots or complicated setups. Users can simply record a new performance on their phone and apply it to existing footage, unlocking greater creative control and efficiency. Additionally, Act-One now supports vocal performances.

Lab Notes: When I first started using Act-One, its reliance on image inputs made it feel somewhat limited. With this new ability to transpose full video performances, Act-One feels much more practical for real-world use.

The ability to reanimate footage with just a phone-recorded performance is a logistical dream. This eliminates the need for costly and time-consuming reshoots while still allowing for creative iteration. For example, you can test alternative takes, add nuanced emotional beats, or refine scenes without returning to set. It’s also exciting to see Act-One extend its reach to vocal performances. This could be particularly valuable in music video workflows, where syncing vocal performance to animated or live-action character movement often requires detailed manual adjustments.

As with any tool, though, the value lies in how thoughtfully it’s used. While it’s easy to see these features as shortcuts, the true potential comes from using them to enhance storytelling and creative control, not just save time.

My New Favorite AI Tool—Fal Brings Pay-As-You-Go Flexibility to Media Generation

The Tool:
Fal.ai offers pay-as-you-go access to some of the best image and video models, like FLUX1.1 [Pro], Kling 1.5, Hunyuan Video, and others. While Fal is mainly designed for developers, don’t be deterred by the requirement to sign in with a GitHub account—it’s worth the initial setup. Once you’re in, the user-friendly visual interface makes it surprisingly easy to navigate. No need for monthly commitments or heavy infrastructure—just pay for what you use and tap into top-ranked models.

Lab Notes: This platform has been around for a while, but I recently started using Fal regularly, and it’s quickly become one of my favorites (not sponsored or an affiliate!). Fal is a fantastic tool for experimenting with high-performance models without overcommitting. After setup, everything feels smooth and approachable—even for non-developers. It’s great to have powerful tools like these so accessible. Definitely worth adding to your workflow!

Runway’s Video Keyframing Prototype Unlocks Nonlinear Creative Exploration

The News:
Runway has introduced a new video keyframing prototype that uses a graph-based structure to explore creative possibilities. Images are represented as nodes connected by edges—transitions that move between frames in latent space. This approach allows users to reimagine creative workflows by combining precise control with unpredictable, serendipitous discovery.

This post is for paid subscribers

© 2025 Shannon Leonard