AI advances come in spurts. You hear nothing for months, and then suddenly the boundaries of what seems possible are shattered. April was one of those months, with two major new developments in the field that dazzled onlookers.
The first was Google’s PaLM, a new language model (the same basic type of AI as the popular GPT series), which shows a rather amazing ability to understand and unpick complex statements – and to explain what it is doing in the process. Take this simple comprehension question from the company’s announcement:
Prompt: Which of the following sentences makes the most sense? 1. I studied hard because I got an A on the test. 2. I got an A on the test because I studied hard.
Model response: I got an A on the test because I studied hard.
Prompt: Q: A president rides a horse. What would have happened if the president had ridden a motorcycle instead? 1. She or he would have liked to ride a horse. 2. They allegedly jumped a garden fence. 3. She or he would have been faster. 4. The horse would be dead.
Model response: She or he would have been faster.
These are the sorts of questions computers have always struggled with, since they require a fairly broad understanding of basic facts about the world before you can even begin to parse the statement in front of you. (For another example, try analyzing the famous phrase “time flies like an arrow; fruit flies like a banana”.)
Pity poor Google, then: less than a week later, its undeniable achievements with PaLM were overshadowed by a far more photogenic release from OpenAI, the formerly Musk-backed research lab that spawned GPT and its successors. The lab showed off Dall-E 2 (the name a blend of Wall-E and Dalí), an image-generating AI that can take natural-language text descriptions and spit out images with alarming precision.
A picture is worth a thousand words, so here’s a little picture book of Dall-E 2’s output, with the images alongside the captions that generated them.
From the official announcement, “Astronaut playing basketball with cats in space in a watercolor style”:
And “A bowl of soup that looks like a planet in the universe, in the style of a 1960s poster”:
From the academic paper going into detail on how Dall-E 2 works, “a shiba inu wearing a beret and a black turtleneck”:
And “a teddy bear on a skateboard in Times Square”:
Not all prompts have to be in conversational English, and adding a string of keywords can help fine-tune what the system produces. In this case, “artstation” is the name of an illustration social network, and appending it effectively tells Dall-E “make these images the way you’d expect to see them on ArtStation”. And so:
“mad scientist panda mixing fizzy chemicals, artstation”
“a dolphin in an astronaut costume on saturn, artstation”
The system can do more than just generate images from scratch, however. It can produce variations on a theme, effectively by looking at an image, describing it to itself, and then creating more images based on that description. Here’s what emerges from Dalí’s famous The Persistence of Memory, for example:
And, in the same way, it can create images that blend two others. Here’s Starry Night merged with a picture of two dogs:
It can also use an image as an anchor and then edit it with a text description. Here we see a “picture of a cat” become “a cartoon of a super saiyan cat, artstation”:
These images are all, of course, handpicked: the best and most compelling examples of what the AI can produce. OpenAI did not, despite its name, open access to Dall-E 2 to everyone, but it has allowed a few people to play with the model and is taking applications for a waiting list.
Dave Orr, a Google AI staffer, is one of the lucky few, and posted a critical assessment: “One thing to be aware of when you see amazing images generated by DE2 is that there is some cherrypicking going on. It often takes a few prompts to find something awesome, so you may be looking at dozens of images or more.”
Orr’s post also highlights weaknesses in the system. Despite being a sibling of GPT, for example, Dall-E 2 can’t really write: it focuses on producing text that looks right rather than reads right, leading to images like this one, captioned “a street protest in belfast”:
There’s one final set of images to look at, and it’s a lot less rosy. OpenAI published a detailed document on the “Risks and Limitations” of the tool, and seen collected in one place, it is positively alarming. Every major concern from the last decade of AI research is represented somewhere.
Take bias and stereotyping: ask Dall-E for a nurse, and it will produce women. Ask it for a lawyer, and it will produce men. A “restaurant” will be western; a “wedding” will be heterosexual:
The system will also happily produce explicit content, depicting nudity or violence, despite the team taking pains to filter such material out of its training data. “Some prompts requesting this kind of content are caught by prompt filtering in the DALL·E 2 preview,” they say, but new problems arise: the use of the 🍆 emoji, for example, seems to have confused Dall-E 2, such that the prompt “a person eating eggplant for dinner” produced phallic imagery in the response.
OpenAI also flags a more existential problem: the fact that the system will happily generate “trademarked logos and copyrighted characters”. It’s not a great look if your cool new AI keeps spitting out Mickey Mouse images and Disney has to send a sternly worded letter. But it also raises tricky questions about the system’s training data, and whether training an AI on images and text scraped from the public internet is, or should be, legal.
Not everyone was impressed by OpenAI’s efforts to warn of the harms. “It is not enough to simply write reports on the risks of this technology. It’s the AI lab equivalent of thoughts and prayers – without action, it means nothing,” says AI and creativity researcher Mike Cook. “It is useful to read these documents, and there are some interesting observations in them … But it is also clear that certain options – such as stopping work on these systems – are not on the table. The argument is that building these systems helps us understand the risks and develop solutions, but what have we learned between GPT-2 and GPT-3? It’s just a bigger model with bigger problems.
“You don’t need to build a bigger nuclear bomb to know we need disarmament and missile defense. You build a bigger nuke if you want to be the person with the biggest nuke. OpenAI wants to be a leader, to make products, to build technology it can license. It cannot stop this work, for that very reason. So ethics becomes a dance, much like greenwashing and pinkwashing at other companies: they must be seen to make moves towards safety, while continuing full speed ahead with their work. And just as with greenwashing and pinkwashing, we need to demand more and push for more oversight.”
Almost a year after we first covered a cutting-edge AI tool in this newsletter, the field shows no signs of becoming less controversial. And we haven’t even mentioned the possibility that AI could “go FOOM” and change the world. One to file away for a future newsletter.
If you want to read the full version of the newsletter please subscribe to receive TechScape in your inbox every Wednesday.