Meet DALL-E, the A.I. That Draws Anything at Your Command

SAN FRANCISCO – At OpenAI, one of the world’s most ambitious artificial intelligence labs, researchers are building technology that allows you to create digital images simply by describing what you want to see.

It is called DALL-E, in homage to both “WALL-E,” the 2008 animated film about an autonomous robot, and Salvador Dalí, the surrealist painter.

OpenAI, backed by $1 billion in funding from Microsoft, has not yet shared the technology with the general public. But on a recent afternoon, Alex Nichol, one of the researchers behind the system, demonstrated how it works.

When he asked for “a teapot in the shape of an avocado,” typing those words into a largely empty computer screen, the system created 10 distinct images of dark green avocado teapots, some with pits and some without. “DALL-E is good at avocados,” Mr. Nichol said.

When he typed “cats playing chess,” it placed two fluffy kittens on either side of a checkered game board, 32 chess pieces arranged between them. When he summoned “a teddy bear playing a trumpet underwater,” one image showed tiny air bubbles rising from the end of the bear’s trumpet toward the surface of the water.

DALL-E can also edit photos. When Mr. Nichol erased the teddy bear’s trumpet and asked for a guitar instead, a guitar appeared between its furry paws.

A team of seven researchers spent two years developing the technology, which OpenAI plans to eventually offer as a tool for people like graphic artists, providing new shortcuts and new ideas as they create and edit digital images. Computer programmers already use Copilot, a tool based on similar OpenAI technology, to generate snippets of software code.

But for many experts, DALL-E is worrying. As this type of technology continues to improve, they say, it could help spread misinformation on the Internet, fueling the type of online campaigns that may have helped influence the 2016 presidential election.

“You could use it for good things, but certainly you could use it for all sorts of other crazy, worrying applications, and that includes deepfakes,” like misleading photos and videos, said Subbarao Kambhampati, a professor of computer science at Arizona State University.

Half a decade ago, the world’s leading artificial intelligence labs built systems that could identify objects in digital images and even generate images on their own, including flowers, dogs, cars and faces. A few years later, they built systems that could do much the same with written language, summarizing articles, answering questions, generating tweets and even writing blog posts.

Now researchers are combining those technologies to create new forms of A.I. DALL-E is a notable step forward because it juggles both language and images and, in some cases, grasps the connection between the two.

“We can now use multiple, intersecting streams of information to create better and better technology,” said Oren Etzioni, chief executive of the Allen Institute for Artificial Intelligence, an artificial intelligence lab in Seattle.

The technology is not perfect. When Mr. Nichol asked DALL-E to “put the Eiffel Tower on the moon,” it did not quite grasp the idea; it put the moon in the sky above the tower. When he asked for “a living room filled with sand,” it produced a scene that looked more like a construction site than a living room.

But when Mr. Nichol tweaked his requests a little, adding or subtracting a few words here or there, it provided what he wanted. When he asked for “a piano in a living room filled with sand,” the image looked more like a beach in a living room.

DALL-E is what artificial intelligence researchers call a neural network, a mathematical system loosely modeled on the network of neurons in the brain. It is the same technology that recognizes commands spoken into smartphones and identifies pedestrians as self-driving cars navigate city streets.

A neural network learns skills by analyzing large amounts of data. By pinpointing patterns in thousands of avocado photos, for example, it can learn to recognize an avocado. DALL-E looks for patterns as it analyzes millions of digital images as well as the text captions that describe what each image depicts. In this way, it learns to recognize the links between images and words.
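One common way to represent such image-word links (a simplified illustration, not necessarily DALL-E’s exact method) is to map images and captions into a shared vector space, where matching pairs end up close together. This toy sketch uses made-up embedding vectors and cosine similarity to match each caption to its most similar image:

```python
import numpy as np

# Toy stand-ins for learned embeddings: in a real system, a neural network
# maps each image and each caption into the same vector space.
# These particular numbers are invented for illustration.
image_embeddings = np.array([
    [0.9, 0.1, 0.0],   # photo of an avocado
    [0.0, 0.8, 0.2],   # photo of a teapot
])
caption_embeddings = np.array([
    [0.8, 0.2, 0.1],   # "an avocado on a table"
    [0.1, 0.9, 0.1],   # "a green teapot"
])

def cosine_similarity(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

sim = cosine_similarity(caption_embeddings, image_embeddings)
# For each caption, the best-matching image is the one with the
# highest similarity score.
best_image = sim.argmax(axis=1)
print(best_image)  # caption 0 matches image 0, caption 1 matches image 1
```

In a trained system, the embeddings are produced by networks that have seen millions of captioned images, so "an avocado on a table" really does land near avocado photos.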

When someone describes an image to DALL-E, it generates a set of key features that the image might include. One feature might be the line of a trumpet’s edge. Another might be the curve at the top of a teddy bear’s ear.

A second neural network, called a diffusion model, then creates the image, generating the pixels needed to realize those features. The latest version of DALL-E, unveiled on Wednesday with a new research paper describing the system, generates high-resolution images that in many cases look like photos.
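The core idea of a diffusion model is to start from pure noise and repeatedly remove a little of that noise until an image emerges. The sketch below is a drastic simplification, with the prompt-conditioned neural network replaced by a hard-coded target, but it shows the iterative denoising loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "target image": a 4x4 grid of pixel intensities. In a real diffusion
# model, the target is implied by the text prompt; here it is hard-coded
# purely for illustration.
target = np.linspace(0.0, 1.0, 16).reshape(4, 4)

# Start from pure noise, as a diffusion model does.
x = rng.normal(size=(4, 4))

# At each step, a trained network predicts the noise to remove. This toy
# "network" simply measures the gap to the target, and the update removes
# a small fraction of that noise.
for step in range(100):
    predicted_noise = x - target      # stand-in for the network's prediction
    x = x - 0.1 * predicted_noise     # strip away a little noise each step

# After many steps, the sample is very close to the intended image.
print(np.abs(x - target).max())
```

Each pass shrinks the remaining noise by a constant factor, so after 100 steps the random starting grid has converged to the target; a real diffusion model runs a similar loop, but the noise prediction comes from a network trained on millions of images.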

Though DALL-E often fails to understand what someone has described, and sometimes mangles the images it produces, OpenAI continues to improve the technology. Researchers can often hone the skills of a neural network by feeding it even larger amounts of data.

They can also build more powerful systems by applying the same concepts to new kinds of data. The Allen Institute recently created a system that can analyze audio as well as images and text. After analyzing millions of YouTube videos, including their audio tracks and captions, it learned to identify particular moments in TV shows or movies, like a barking dog or a shutting door.

Experts believe that researchers will continue to improve such systems. Ultimately, these systems could help companies improve search engines, digital assistants and other common technologies, as well as automate new tasks for graphic artists, programmers and other professionals.

But there are caveats to that potential. Artificial intelligence systems can show bias against women and people of color, in part because they learn their skills from enormous pools of online text, images and other data that show bias. They could be used to generate pornography, hate speech and other offensive material. And many experts believe the technology will eventually make it so easy to create misinformation that people will have to be skeptical of nearly everything they see online.

“You can forge text. You can put text into someone’s voice. And you can forge images and videos,” Dr. Etzioni said. “There is already misinformation online, but the worry is that this scales misinformation to new levels.”

OpenAI keeps a tight leash on DALL-E. It does not let outsiders use the system on their own. It puts a watermark in the corner of each image it generates. And though the lab plans to open the system to testers this week, the group will be small.

The system also includes filters that prevent users from generating images it deems inappropriate. When asked for “a pig with the head of a sheep,” it declined to produce an image. The combination of the words “pig” and “head” most likely tripped OpenAI’s anti-harassment filters, according to the lab.

“This is not a product,” said Mira Murati, OpenAI’s head of research. “The idea is to understand capabilities and limitations and give us the opportunity to build in mitigations.”

OpenAI can control the system’s behavior in some ways. But others around the world may soon create similar technology that puts the same powers in the hands of almost anyone. Working from a research paper describing an early version of DALL-E, Boris Dayma, an independent researcher in Houston, has already built and released a simpler version of the technology.

“People need to know that the images they see may not be real,” he said.
