Creative Workflow Roundup: FLUX1.1 High-Res Image Gen, Runway CEO’s Vision for AI Workflows, Apple Vision Pro for Media Production, and Cartesia's AI Voice Changer
In This Week’s Roundup: FLUX1.1 [pro] introduces high-res Ultra and Raw modes, Runway’s CEO shares insights on moving beyond prompt-based AI tools, and StudioPeople showcases XR tools on Apple Vision Pro. Also featured: Cartesia AI’s new Voice Changer and updates from Suno, Recraft, and Kling AI.
FLUX1.1 [pro] Ultra and Raw Modes for High-Resolution Image Generation
The News: Black Forest Labs has announced new features for its FLUX1.1 [pro] text-to-image model, enabling higher-resolution outputs and enhanced aesthetic flexibility. The ultra mode supports image resolutions up to 4 megapixels while maintaining a rapid generation time of 10 seconds per sample, priced at $0.06 per image. Additionally, the raw mode introduces a natural, candid aesthetic for users seeking more realistic outputs, particularly in human and nature photography. Both modes are accessible via the FLUX1.1 [pro] API, offering faster performance and expanded creative options compared to similar models.
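For readers who want to try the API, a minimal sketch of a generation request might look like the following. Note the endpoint path, parameter names (`raw`, `aspect_ratio`), and header shown here are assumptions based on Black Forest Labs' public API conventions and should be checked against the official docs before use:

```python
import os

# Assumed base URL for the Black Forest Labs API; verify against the docs.
BFL_API_BASE = "https://api.bfl.ml/v1"

def build_ultra_request(prompt: str, raw: bool = False,
                        aspect_ratio: str = "16:9") -> dict:
    """Assemble a JSON payload for a FLUX1.1 [pro] ultra generation call.

    `raw` toggles the candid 'raw' aesthetic; the field names here are
    illustrative assumptions, not confirmed API parameters.
    """
    return {
        "prompt": prompt,
        "raw": raw,
        "aspect_ratio": aspect_ratio,
    }

payload = build_ultra_request("a foggy pine forest at dawn", raw=True)

# Submitting the job would look roughly like this (requires the
# `requests` package and an API key in the BFL_API_KEY env var):
#
#   import requests
#   resp = requests.post(
#       f"{BFL_API_BASE}/flux-pro-1.1-ultra",
#       headers={"x-key": os.environ["BFL_API_KEY"]},
#       json=payload,
#   )
#   job_id = resp.json()["id"]  # then poll a result endpoint with this id
```

At roughly $0.06 per image and ~10 seconds per sample, a batch of 100 exploratory generations would cost about $6 and take under 20 minutes sequentially, which is why the speed-at-resolution claim matters for iterative workflows.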
Lab Notes: The ability to generate 4MP images without a drop in speed is impressive—most high-resolution models slow down significantly, but this one maintains a competitive edge. The price point is reasonable, especially for workflows that need fast, high-quality results.
The raw mode intrigues me for its focus on natural aesthetics and diversity in human subjects. This could be useful for projects aiming for authenticity, like editorial content or environmental visuals. While FLUX1.1 [pro] is catching up, Recraft’s V3 model still holds the top spot for benchmark performance. However, FLUX’s speed and versatility might make it a strong contender.
The Evolving Future of AI Interaction and Creative Workflows
The News: Cristóbal Valenzuela, co-founder and CEO of Runway, predicts that prompt-based interaction with AI may soon become less relevant. Instead, he envisions more intuitive tools like gesture recognition and advanced visual feedback systems, enabling richer, non-verbal exploration of AI's latent capabilities. This shift challenges the current linear paradigms of creative work, advocating for tools that encourage discovery over direct output. Valenzuela critiques the "assembly-line" mindset of AI development, emphasizing the need for "playgrounds" rather than "pipelines" to embrace the complex and iterative nature of creativity. Runway’s Advanced Camera Controls and Act-One hint at this direction, showing promise in creating intuitive creative environments where exploration drives innovation.
Lab Notes: Valenzuela is a voice worth paying attention to in the evolving AI landscape, especially for creative professionals. Under his leadership, Runway played a pivotal role in advancing diffusion models, particularly through its collaboration with researchers from LMU Munich on the latent diffusion research that led to the initial version of Stable Diffusion, released in 2022.
Cristóbal’s insights about moving beyond prompts and toward more intuitive AI interactions resonate with my creative process. Current workflows often feel constrained by rigid input-output dynamics, but his vision challenges this. By advocating for tools that leverage gestures, visual cues, and exploratory methods, he imagines AI systems that align with the inherently iterative and messy nature of real-world creativity. His emphasis on creating "playgrounds" underscores the importance of discovery, where each interaction opens up new possibilities rather than simply executing predefined goals.
Runway’s tools, such as Act-One and Camera Controls, already hint at this future, offering creative professionals new ways to engage with AI. These developments suggest that the next evolution of AI isn’t just about efficiency—it’s about enabling richer, deeper collaboration between human intuition and machine capability.