Creative Workflow Roundup: Video Captioning Strategies, Stable Audio 2.0, Enhanced DALL·E Editing, Runway's AI Film Festival
This Week's Top Creative Workflow News, Tools, & Trends
This week in the Creative Workflow Lab, we're zooming in on the evolving landscape of generative music as Stable Audio 2.0 sets its sights on Suno AI, unpacking Jon Stewart's humorous take on AI, highlighting DALL·E 3's new "inpainting" feature, and exploring the potential of creative workflows with Open Interpreter, among other insights.
To celebrate the re-launch, this week’s post is free for all subscribers. Upgrade to support the newsletter and get every week’s top creative workflow news, AI tools, and media production trends delivered to your inbox with a paid subscription.
Stable Audio 2.0 Challenges Suno AI in the Generative Music Arena
The News: Entering the competitive field of AI-generated music, Stable Audio 2.0 emerges as a formidable contender to Suno AI. The model can produce full tracks with coherent structure up to three minutes long, rendered at 44.1 kHz stereo, all from a single prompt. Notably, it expands beyond text-to-audio generation by adding audio-to-audio capabilities: users can upload audio samples and transform them into a variety of sounds through simple natural language prompts. Stable Audio 2.0 was trained exclusively on a dataset licensed from the AudioSparx music library, with opt-out requests honored and fair compensation guaranteed for creators.
Why It's Relevant for Creative Professionals: The introduction of Stable Audio 2.0 into the generative music landscape is noteworthy for a few reasons. First, its commitment to fair compensation for creators sets a precedent for AI-generated content, addressing a central concern about the ethical use of sourced material. Second, it gives creatives a versatile new tool for experimenting with sound: you can generate new music from text descriptions or repurpose and transform existing audio samples into something entirely new. While Suno AI remains the favorite for many, Stable Audio 2.0's feature set and ethical stance make it an appealing alternative for music producers, sound designers, and anyone else involved in making music.
Jon Stewart, Apple, and AI
The News: Jon Stewart, in a recent segment on The Daily Show that went viral this week, didn't just crack jokes about AI's role in modern society; he dug into the core of the current AI discourse, scrutinizing the hype surrounding AI technologies. Stewart argued (in an admittedly hilarious way) that the public is being oversold on AI's capabilities, expressing skepticism toward grandiose claims that AI will resolve humanity's most daunting challenges. His critique also delved into concerns about AI's potential to replace workers in the name of profit maximization.
Apple recently opted to cancel Stewart's program, The Problem with Jon Stewart, on Apple TV+. Reports suggest the decision was influenced by Stewart's intention to tackle subjects like AI, a sensitive area given Apple's own ambitions in AI development and a potential AI partnership with Google.
Why It's Relevant for Creative Professionals: As Apple, OpenAI, and other companies push forward with AI innovations (already impacting creative fields profoundly), the dialogue around these technologies’ ethical use and their implications for creative jobs becomes crucial. Jon Stewart's segment, while humorous, touched on valid concerns, such as the replacement of human roles with AI for efficiency and profit.
However, Stewart’s portrayal of prompt engineers as mere "janitors" undersells the complexity and creativity involved in AI work, where roles akin to creative directors use AI as a tool for innovation and creation. The balance lies in recognizing AI’s potential to transform creative workflows while also advocating for ethical standards and transparency in how these technologies are developed and deployed.
This discussion comes at a crucial time as major tech companies, Apple among them, navigate their relationship with AI. Apple is rumored to be developing its own AI technology, expected to be unveiled at WWDC on June 10th, and that could carry significant implications for creative professionals. Whether it's AI video processing (the iPhone's camera already leans heavily on ML for image and video processing, but Apple could go further with the kind of AI approaches used by tools like Topaz Labs), Apple-designed tools for image generation and AI editing, or a partnership with Google, the possibilities are interesting. Apple's approach to AI, particularly how it balances innovation with the creative essence of its tools, will likely set a precedent for the industry.
Enhanced Image Editing in DALL·E 3 with a ChatGPT Plus Subscription
The News: A significant update has arrived in DALL·E 3 for users with a ChatGPT Plus subscription: an "inpainting" capability. The feature streamlines the editing of AI-generated images by enabling precise modifications to selected areas of an image, all handled directly through the ChatGPT interface.
Why It's Relevant for Creative Professionals: The concept of inpainting itself isn't new (Adobe Photoshop and Firefly have offered similar "Generative Fill" functionality for a while now), but integrating the capability into DALL·E 3 via a chat interface is a solid update. Making adjustments by simply chatting with the AI can significantly streamline the creative process for some workflows.
That said, Midjourney still produces the best image quality for the majority of use cases. Midjourney previously offered inpainting, but the feature was removed in a recent update and is expected to return soon. In the meantime, Photoshop's Generative Fill remains a reliable option for quick fixes.
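For anyone who wants mask-based edits in a scripted pipeline rather than a chat window: the ChatGPT inpainting feature itself doesn't appear to expose an API hook, but OpenAI's separate Images edit endpoint does accept an image plus a transparency mask (at the time of writing it is documented for DALL·E 2 rather than DALL·E 3). A minimal sketch, with placeholder file names and prompt:

```python
# pip install openai  (assumes OPENAI_API_KEY is set in the environment)
from openai import OpenAI

client = OpenAI()

# Transparent pixels in mask.png mark the region the model should regenerate;
# the rest of the image is left untouched.
result = client.images.edit(
    image=open("scene.png", "rb"),
    mask=open("mask.png", "rb"),
    prompt="replace the empty tabletop with a bowl of fruit",
    n=1,
    size="1024x1024",
)

print(result.data[0].url)  # URL of the edited image
```

It isn't the same model as the ChatGPT editor, but the workflow (select a region, describe the change) is the same idea.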
Adam Faze’s New Strategy for Video Captioning
The Trend: Adam Faze, the creative force behind web-focused production studio Gymnasium (formerly FazeWorld), offered a fresh perspective on video captioning this week. Faze recommends that video editors and producers move away from the bold, centered captions that have come to define social media content across platforms. Those captions have traditionally been a tool to grab viewer attention and boost watch time, but Faze says he is pivoting toward more subtle, cinematic subtitles positioned in the lower third of the screen as automatic subtitle features become more prevalent on digital platforms.
Why It's Relevant for Creative Professionals: The move towards smaller, unobtrusive subtitles not only reflects changing viewer preferences but also challenges creatives to rethink engagement strategies. As automatic captioning becomes standard, the emphasis on crafting content that is both visually appealing and accessible has never been more critical. This trend invites professionals to explore new ways of captioning that prioritize viewer experience without sacrificing the content's reach and impact.
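For editors who burn captions into their exports, the shift Faze describes is mostly a styling decision. One way to get subtle, bottom-centered subtitles instead of bold centered captions is ffmpeg's subtitles filter with ASS style overrides; the sketch below is a rough starting point (file names, font size, and margins are placeholder values to tune per platform):

```python
# Burn an SRT file into a video as small, lower-third subtitles using ffmpeg.
# Requires ffmpeg built with libass; input.mp4 and captions.srt are placeholders.
import subprocess

# ASS style overrides: Alignment=2 is bottom-center, MarginV pushes the text
# up from the bottom edge, Fontsize keeps it unobtrusive.
style = "Fontsize=16,Alignment=2,MarginV=40,Outline=1"

subprocess.run(
    [
        "ffmpeg",
        "-i", "input.mp4",
        "-vf", f"subtitles=captions.srt:force_style='{style}'",
        "-c:a", "copy",          # leave the audio stream untouched
        "output_lowerthird.mp4",
    ],
    check=True,
)
```

The same alignment and margin choices can be mirrored in Premiere or Resolve caption presets; the point is the quieter, cinematic look rather than any particular tool.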
Higgsfield AI Enters the Competitive AI Video Creation Market
The News: Alex Mashrabov, leveraging his experience as the former head of generative AI at Snap, has launched Higgsfield AI. This new venture aims to carve out a niche in AI-powered video creation with its app, Diffuse, which allows users to generate videos from text or selfies.
Why It's Relevant for Creative Professionals: Higgsfield AI's entry highlights the ongoing innovation in AI video creation tools, but differentiation will be crucial as established models continue to set a high bar. The sector is highly competitive, currently led by Runway's Gen-2 and Pika's model, both known for advanced text-to-video and image-to-video capabilities. With the anticipated launch of OpenAI's Sora sometime in 2024, the market is set for further disruption.
Countdown to Runway's AI Film Festival
The News: With just a month to go, anticipation is building for Runway's second annual AI Film Festival (AIFF), an event that celebrates the melding of AI technologies with the creative vision of filmmakers. Submissions have now closed, and the festival is preparing to showcase ten exceptional finalists drawn from a wealth of entries. The screenings, taking place in NYC and LA, promise not just exposure but also substantial rewards, with over $60,000 in prizes backed by sponsors like Coca-Cola.
Why It's Relevant for Creative Professionals: The top ten submissions will provide insight into the current applications and future potential of AI in video production, offering a clear view of how creatives are incorporating AI into their work. This festival serves as a platform not just for showcasing the possibilities AI brings to filmmaking but also for Runway to cement its influence in the industry.
The Future of Creative Workflows: Open Interpreter
The News: The recent unveiling of the Open Interpreter 01 Light, a portable voice interface designed to change how we interact with our home computers, marks a significant step toward AI-enabled creative workflows. With the ability to understand screen content, use applications, and acquire new skills, the 01 Light sets the stage for an open-source era where AI devices become more than tools; they become partners in creation. The device is grounded in the open-source Open Interpreter project, which provides a robust foundation for the expansive potential of AI collaboration.
Why It's Relevant for Creative Professionals: The concept of engaging with our computers as co-collaborators presents a transformative shift in how creative work is approached. Imagine an environment where the computational power of AI takes on the bulk of editing or animation tasks, allowing creators to focus on refining and iterating based on the AI's output. The introduction of technologies like the 01 Light and Open Interpreter OS brings this vision closer to reality, offering a glimpse into a future where human creativity and AI efficiency merge seamlessly.
This shift towards AI co-collaboration not only promises to enhance productivity but also to redefine the creative process itself. By leveraging open-source AI technologies, creative professionals can customize and extend their digital assistants, making the leap from using AI as a mere tool to engaging with it as a creative partner. This evolution paves the way for unprecedented levels of creative expression and innovation.
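To make this less abstract: the 01 Light builds on the open-source Open Interpreter Python package, which you can already run on a laptop today. A minimal sketch, assuming the open-interpreter package and a configured API key for your model provider (the folder path and task are hypothetical examples):

```python
# pip install open-interpreter
from interpreter import interpreter

# Open Interpreter turns a natural-language request into code it runs locally.
# By default it shows each proposed code block and asks for confirmation before
# executing; set interpreter.auto_run = True only for tasks you fully trust.
interpreter.chat(
    "Find every .mov file in ~/Footage, make a half-resolution H.264 proxy of "
    "each with ffmpeg, and save the proxies to ~/Footage/proxies"
)
```

That confirmation loop is the "refine and iterate on the AI's output" dynamic described above, just in text rather than voice.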
That’s it for now! More posts coming weekly. Stay sharp, stay inspired, stay ahead.