OpenAI Sora Stunned the AI World
OpenAI’s first AI video model, Sora, has stunned the world, effectively killing the whole video content industry.
OpenAI’s first AI video model has stunned the world
OpenAI has just released its first AI video model, Sora, which generates stunning 60-second videos in a single shot. Netizens are amazed by this revolution in AI video. The model understands and simulates the physical world in motion, with the goal of training models that help people solve problems requiring real-world interaction.
Sora could effectively kill the whole video content industry
Sora is superb. It not only creates scenes that are both realistic and imaginative from a text prompt, but can also generate videos up to one minute long in a single shot. This is simply amazing.
While Runway Gen 2, Pika, and other AI video tools are still struggling to generate videos longer than a few seconds, OpenAI has already scored an epic achievement.
Sora’s Capabilities
Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
The model has a deep understanding of language, enabling it to accurately interpret prompts and generate compelling characters that express vibrant emotions. Sora can also create multiple shots within a single generated video that accurately persist characters and visual style.
Examples of Videos Generated by Sora
In the 60-second video on OpenAI’s website, the girl and the background characters all show astonishing consistency. Every character maintains superb consistency even as the camera switches freely between shots.
How did OpenAI do it? According to OpenAI’s official website, “By giving the model foresight of many frames at a time, we’ve solved a challenging problem of making sure a subject stays the same even when it goes out of view temporarily.”
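OpenAI has not published Sora’s implementation details, but this description suggests the model generates many frames jointly rather than one frame at a time. Below is a minimal, hypothetical PyTorch sketch of that idea: the video is cut into spacetime patch tokens, and a transformer attends across the tokens of every frame at once, so information about a subject in early frames can still shape frames where it reappears after being hidden. The class and parameter names here are my own illustrative assumptions, not OpenAI’s actual code.

```python
# Minimal sketch (NOT OpenAI's code): a transformer that attends over
# spacetime patch tokens from ALL frames of a clip at once. Because every
# frame's tokens can attend to every other frame's tokens, the model can keep
# a subject consistent even when it is occluded in some intermediate frames.
import torch
import torch.nn as nn

class SpacetimePatchTransformer(nn.Module):
    def __init__(self, patch_dim=256, num_frames=16, patches_per_frame=64,
                 num_layers=4, num_heads=8):
        super().__init__()
        # One learned positional embedding per spacetime patch token.
        self.pos_emb = nn.Parameter(
            torch.zeros(1, num_frames * patches_per_frame, patch_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=patch_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

    def forward(self, patch_tokens):
        # patch_tokens: (batch, num_frames, patches_per_frame, patch_dim)
        b, t, p, d = patch_tokens.shape
        # Flatten time and space into one token sequence so that
        # self-attention spans the whole clip ("foresight of many frames").
        tokens = patch_tokens.reshape(b, t * p, d) + self.pos_emb
        return self.encoder(tokens).reshape(b, t, p, d)

# Usage: a toy clip of 16 frames x 64 patches, each a 256-dim latent token.
model = SpacetimePatchTransformer()
clip = torch.randn(2, 16, 64, 256)
out = model(clip)  # (2, 16, 64, 256)
```

The key contrast is with frame-by-frame generation: if each frame only sees the previous one, a subject that leaves the frame is easily forgotten; attention over the whole clip is one plausible way to avoid that.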
Sora Makes Breakthroughs in Multiple Technologies
With a deep understanding of natural language, Sora can accurately interpret the requirements expressed in user prompts and grasp how those elements appear in the real, physical world. As a result, the characters created by Sora show rich emotional expression.
In the complex generated scenes, you will notice that many characters and objects are created, each with its own very specific set of actions. Moreover, Sora reproduces the objects and the background details accurately.
Look at the pupils, eyelashes, and skin texture of the characters in the video below; they are so real that you can hardly find any trace of artificial intelligence.
So, what is the difference between the virtual objects in the video and real-world objects?
In addition, Sora can generate several shots from different camera angles within the same video while maintaining the consistency of the characters and style.
Prior to Sora, AI videos were generated from a single camera angle.
It is therefore almost unbelievable that Sora can produce realistic videos with different shot angles within a single clip! This is something that Gen 2 and Pika cannot achieve at all…
For example, look at the video below.
From the text prompt, Sora produces a winter scene on a bustling Tokyo street from a drone’s point of view. You can hardly tell that this video was produced by an AI tool, because it looks so real.
The drone camera tracks a couple walking leisurely along the Tokyo street, with vehicles running on the riverside road to the left and customers shuttling between a row of small shops to the right.
From here, you can see that Sora has advanced to a terrifying stage, completely breaking away from the era of AI video with obvious artificial traces. No other AI video tool can match Sora’s capability right now.
Is the virtual world model coming true?
One question on my mind now is whether a theoretical world model that mimics our physical world really exists.
It is terrifying to see what Sora can already do in recreating the real world with just a machine learning algorithm. Sora has successfully learned many of the physical laws of our real world, although it is not 100% accurate yet.
Look at the dogs in the video below. Sora recreates the characters (the dogs and the snow) perfectly, with the correct dog behavior (playing) and snow that splashes up and then falls back down under gravity, just as it would in the real world.
Here is another example created by Sora, from the following prompt: “Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle.”
This is simply impressive!
With that prompt, Sora created a creature that looks like a Pixar creation, seemingly carrying the DNA of a Furby, a Gremlin, and Sully from “Monsters, Inc.”
What is shocking is how accurately Sora understands the physical properties of fur texture. It is jaw-dropping!
Back when “Monsters, Inc.” was released, Pixar had to spend enormous effort creating the super-complex fur textures as the monsters moved, and the technical team had to work hard for months.
Sora, however, achieved this as if it were a piece of cake, and no one had explicitly taught it how!
“It has learned about 3D geometry and objects’ consistency,” project research scientist Tim Brooks said.
“This is not what we set in advance – it learned naturally by observing a large amount of data.”
Can Sora kill the filmmaking industry?
Thanks to the diffusion approach used by DALL·E 3 and the Transformer architecture behind GPT-4, Sora not only generates videos that meet the user’s specific requirements, but also shows an ability to understand and apply film-shooting methodology and workflow (a rough sketch of this diffusion-plus-transformer recipe appears at the end of this section).
This understanding is reflected in Sora’s unique storytelling ability.
For example, in a video on the theme of “A gorgeously rendered papercraft world of a coral reef, rife with colorful fish and sea creatures,” project researcher Bill Peebles pointed out that Sora successfully advanced the story through its camera angles and video timeline.
“Actually, there were multiple camera changes in the video. These shots were not stitched together in post-production, but were generated by the model in one go,” he explained. “We didn’t specifically instruct it to do this, but it can do it automatically.”
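OpenAI describes Sora as a diffusion model with a transformer backbone, so a rough mental model is: start from pure noise and let a text-conditioned transformer iteratively denoise the entire clip. The sketch below, with a toy noise schedule and a stand-in denoiser, only illustrates that general diffusion recipe under my own assumptions; the function names, schedule, and shapes are hypothetical and are not Sora’s real implementation.

```python
# Hedged sketch of a text-conditioned diffusion sampling loop over video
# latents. The schedule and update rule are simplified, DDPM-like toys for
# illustration only; they are assumptions, not Sora's actual algorithm.
import torch

@torch.no_grad()
def sample_video_latents(denoiser, text_embedding, shape, num_steps=50):
    """Start from pure noise and iteratively denoise into a video latent."""
    x = torch.randn(shape)                           # (frames, patches, dim)
    alphas = torch.linspace(0.999, 0.98, num_steps)  # toy noise schedule
    for step in reversed(range(num_steps)):
        t = torch.tensor([step])
        # The transformer denoiser predicts the noise present in x,
        # conditioned on the text prompt embedding and the timestep.
        predicted_noise = denoiser(x, t, text_embedding)
        alpha = alphas[step]
        # Simplified update: remove a fraction of the predicted noise.
        x = (x - (1 - alpha).sqrt() * predicted_noise) / alpha.sqrt()
        if step > 0:
            x = x + 0.01 * torch.randn_like(x)       # a little fresh noise
    return x

# Usage with a stand-in denoiser (a real system would use a large transformer):
dummy_denoiser = lambda x, t, cond: torch.zeros_like(x)
latents = sample_video_latents(dummy_denoiser, torch.zeros(1, 512),
                               shape=(16, 64, 256))
```

The point of the sketch is simply that the same denoising pass covers the whole clip, which is consistent with the multi-shot, one-go generation described above.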
Fortunately, Sora is not perfect yet!
However, the current model is not perfect. It may struggle to simulate the physics of complex scenes, and may also have difficulty accurately understanding cause-and-effect relationships in specific situations. For example, after someone takes a bite of a cookie, the cookie may still appear intact.
In addition, the model may make mistakes in handling spatial details, such as distinguishing between left and right, and may also perform inaccurately when describing events that change over time, such as specific camera motion trajectories.
Fortunately, it is not perfect yet. Isn’t that a relief?
Otherwise, could we still tell the boundary between the virtual and the real?
Look at the following video. Can you tell whether it is real?
But the undeniable and terrifying reality is already in front of us: a model that can understand and simulate our physical world means that AGI is not far away.
In a few words: Sora is a revolution!
*Videos in this article are courtesy of the OpenAI website.*