What is Sora, OpenAI’s Video Generator?
On February 15, 2024, OpenAI unveiled an AI model that captivated audiences with a tantalizing glimpse of its capabilities through a series of high-definition video snippets. Among these visuals were scenes ranging from an SUV gracefully navigating mountainous terrain to an endearing animation featuring a "short fluffy monster" next to a flickering candle. This preview showcased the model's ability to craft lifelike visuals across diverse landscapes and narratives, leaving the public intrigued and eager for more.
In 2022, OpenAI released DALL·E 2, a text-to-image model that generates images from user inputs called "prompts." Later that year, ChatGPT swiftly gained popularity with its ability to generate answers and provide insights on a wide range of topics. Sora, OpenAI's newest and still-unreleased model, can generate videos up to one minute long.
“Prompt: A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.”
Above is the prompt for one of the videos Sora generated, and the video shows how Sora can simulate the physical world in motion. The model can already create complex scenes with accurate details and multiple characters, and because it understands language, it can faithfully follow user prompts, producing both realistic and imaginative outputs. Under the hood, Sora combines a diffusion model with a transformer architecture designed specifically for video: it starts from random noise and gradually refines it into coherent frames, treating small spacetime "patches" of video much the way a language model treats text tokens.
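OpenAI has not released Sora's code, but the core idea behind a "diffusion transformer" can be sketched in a few lines. The toy PyTorch example below is purely illustrative: the TinyDiffusionTransformer class, the patch sizes, and the random stand-in data are all invented for this sketch and are not Sora's actual architecture. It shows one training step in which a transformer learns to predict the noise that was added to a batch of video patch embeddings.

    import torch
    import torch.nn as nn

    # Toy "diffusion transformer": a transformer trained to predict the
    # noise added to a sequence of video patch embeddings.
    # (All names and dimensions here are invented for illustration;
    # this is not OpenAI's code.)
    class TinyDiffusionTransformer(nn.Module):
        def __init__(self, dim=64, heads=4, layers=2):
            super().__init__()
            block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
            self.encoder = nn.TransformerEncoder(block, num_layers=layers)
            self.head = nn.Linear(dim, dim)

        def forward(self, noisy_patches):
            # Attend across all spacetime patches, then predict the noise.
            return self.head(self.encoder(noisy_patches))

    model = TinyDiffusionTransformer()
    # 8 stand-in "videos", each flattened into 16 spacetime patches of size 64.
    clean = torch.randn(8, 16, 64)
    noise = torch.randn_like(clean)
    noisy = clean + noise                 # corrupt the clean patches with noise
    loss = nn.functional.mse_loss(model(noisy), noise)
    loss.backward()                       # gradients for one training step

At generation time, a model of this kind is applied repeatedly: it starts from pure noise and subtracts its predicted noise step by step until coherent video patches remain.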
Apart from text-to-video generation, OpenAI reports that Sora can also animate still images; extend, connect, and edit existing videos; and generate images outright. Its capabilities cover simulating people, animals, and real-world environments, and it can produce videos with dynamic camera motion. However, Sora is still in development and struggles to simulate physics accurately in complicated scenes. OpenAI is working with visual artists, designers, and other specialists, with the goal of making the model effective and helpful for creative professionals, and it published the model's progress early to showcase its capabilities and gather feedback.
Sora is expected to launch later this year and could take video production to the next level. As a far cheaper alternative to traditional animation, it has great potential to become popular among students and other amateur video creators. By removing many of the complex steps of traditional video editing, Sora opens the door for more enthusiasts to step into the world of video production.
BY EMMA HE