Runway recently introduced its latest text-to-video synthesis model, Gen-3 Alpha, which turns written descriptions, called prompts, into high-definition video clips without sound. We put the model to the test, and the results were intriguing: precise prompting matters less than aligning with concepts likely present in the training data, and getting entertaining results often requires multiple generations and careful cherry-picking.
One common theme among generative AI models since 2022 is that they are skilled at blending concepts found in their training data but poor at generalizing to new scenarios. In practice, that means they can excel at stylistic and thematic novelty but struggle with structural novelty beyond what their training data covers.
For instance, ask Runway Gen-3 for a sailing ship in a swirling cup of coffee, and the model can likely render the combination convincingly, since it has presumably seen videos of both sailing ships and swirling coffee during training. Ask for a cat drinking a can of beer in a beer commercial, however, and it will probably fail, because the training data likely contains no videos of photorealistic cats consuming human beverages. Instead, the model blends what it knows about cat videos and beer commercials, producing a cat with human hands enjoying a beer.
While testing Gen-3 Alpha, we signed up for Runway’s Standard plan, which provides 625 credits for $15 a month, plus some bonus trial credits. Video generation costs 10 credits per second, and we created 10-second videos at 100 credits each, so our credit supply allowed only a limited number of generations.
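For readers who want to run the numbers themselves, here is a quick back-of-envelope sketch in Python. It uses only the figures quoted above; the variable names are our own, and the math ignores the bonus trial credits.

```python
# Back-of-envelope cost math for Runway's Standard plan,
# using the figures quoted above (bonus trial credits ignored).
CREDITS_PER_MONTH = 625      # included with the $15/month Standard plan
PLAN_PRICE_USD = 15.00
CREDITS_PER_SECOND = 10      # Gen-3 Alpha generation cost per second
CLIP_SECONDS = 10            # the clip length we used

credits_per_clip = CREDITS_PER_SECOND * CLIP_SECONDS          # 100 credits
full_clips_per_month = CREDITS_PER_MONTH // credits_per_clip  # 6 clips
cost_per_clip = PLAN_PRICE_USD / (CREDITS_PER_MONTH / credits_per_clip)

print(f"Each 10-second clip costs {credits_per_clip} credits")
print(f"That's {full_clips_per_month} full clips per month, "
      f"or about ${cost_per_clip:.2f} per clip")
```

In other words, the monthly allotment buys six full 10-second clips at roughly $2.40 apiece, which is why refining prompts through repeated generations gets expensive quickly.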
Our initial prompts included scenarios such as cats drinking beer, barbarians with CRT TV sets, and queens of the universe, as well as a nod to Ars Technica with the “moonshark.” The results of these and other prompts appear below.
Because our limited credits kept us from refining results and selecting the best ones, what you see for each prompt is a single generation from Runway, offering a glimpse of the model’s capabilities.
The prompts we tested were diverse and imaginative, ranging from a highly intelligent person reading “Ars Technica” on a computer before the screen explodes to a commercial featuring a new flaming cheeseburger from McDonald’s. Whether it was the moonshark leaping out of a computer screen or a cat in a car enjoying a can of beer in a beer commercial, each prompt produced a distinct and sometimes humorous outcome.
Overall, our Gen-3 Alpha tests demonstrated the potential of Runway’s text-to-video synthesis model to create novel, visually engaging content. Despite its limits in generalization, the model merged concepts from its training data in ways that offer a glimpse of where AI-generated video content creation may be headed.