The greatest artistic tool ever built, or a harbinger of doom for entire creative industries? OpenAI’s second-generation DALL-E 2 system is slowly opening up to the public, and its text-based image generation and editing capabilities are impressive.
The pace of progress in AI-powered text-to-image generation is truly frightening. The generative adversarial network, or GAN, first appeared in 2014, introducing the idea of two AIs competing against each other, both “trained” on large numbers of real images, labeled to help the algorithms learn what they’re looking at. A “generator” AI then starts creating images, and a “discriminator” AI tries to guess whether they’re real photos or AI creations.
At first, they are evenly matched, both being absolutely crap at their jobs. But they learn; the generator is rewarded if it tricks the discriminator, and the discriminator is rewarded if it correctly selects the origin of an image. Over millions and billions of iterations – each taking a few seconds – they improve to the point where humans start having trouble telling the difference.
They learn in their own way, completely undirected by their programmers; each AI develops its own understanding of what a horse is, completely detached from the reality we understand. All it knows or cares about is its job: either fooling the other AI, or not getting fooled, using its own individual and completely mysterious methods of analyzing and creating data.
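The adversarial loop described above can be sketched in a few lines of Python. This is a deliberately tiny toy of our own devising, not anyone's production code: the "images" are just numbers clustered around 5.0, the generator is a single parameter, and the discriminator is a one-feature logistic classifier – but the reward structure is the one described: the discriminator is nudged toward correct real/fake calls, the generator toward fooling it.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # numerically clamped logistic function
    return 1.0 / (1.0 + math.exp(-max(min(x, 30.0), -30.0)))

# "Real" data: numbers clustered around 5.0 (standing in for real images)
def real_sample():
    return random.gauss(5.0, 0.5)

# Generator: a single parameter mu; its "images" are mu plus noise
mu = 0.0
# Discriminator: a logistic classifier D(x) = sigmoid(w * (x - b))
w, b = 1.0, 0.0
lr = 0.02

for _ in range(5000):
    x_real = real_sample()
    x_fake = mu + random.gauss(0.0, 0.5)
    d_real = sigmoid(w * (x_real - b))
    d_fake = sigmoid(w * (x_fake - b))

    # Discriminator step: rewarded for calling real "real" and fake "fake"
    # (gradient ascent on log D(real) + log(1 - D(fake)))
    grad_w = (1 - d_real) * (x_real - b) - d_fake * (x_fake - b)
    grad_b = -(1 - d_real) * w + d_fake * w
    w += lr * grad_w
    b += lr * grad_b

    # Generator step: rewarded for fooling the discriminator
    # (gradient ascent on log D(fake))
    x_fake = mu + random.gauss(0.0, 0.5)
    d_fake = sigmoid(w * (x_fake - b))
    mu += lr * (1 - d_fake) * w

# By now the generator's output has drifted toward the real data's mean of 5.0
print(round(mu, 2))
```

After a few thousand of these tiny competitions, the generator's output distribution has drifted to overlap the real one – the same dynamic that, at vastly greater scale and dimensionality, lets GANs produce photorealistic images.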
This leads to the famously weird disconnects from reality that have been a hallmark of these systems to this day. Think DeepDream’s bizarre obsession with dogs and eyes, or the unbridled and beautiful surrealism of systems like Botto, the AI/human NFT art collaboration.
So far, these algorithms have been fascinating entertainment. DALL-E 2, on the other hand, clearly shows how disruptive this technology will be – not five or 10 years from now, but the minute its doors open to the public. Just watch the video below and imagine how much time and money you would need to budget to do this using non-artificial intelligence.
DALL-E 2 represents a step change in AI image generation technology. It understands natural language prompts far better than anything before it, allowing an unprecedented level of control over subjects, styles, techniques, angles, backgrounds, locations, actions, attributes and concepts – and it generates images of extraordinary quality. If you tell it you want photo-realism, for example, it’ll happily let you direct its lens and aperture choices.
With a high-quality prompt, it will generate dozens of options for you in seconds, each at a level of quality that would take a human photographer, painter, digital artist, or illustrator hours to produce. It’s an art director’s dream: an assortment of visual ideas in an instant, without having to pay for creatives, models or location fees.
You can also generate variations – either of something DALL-E generated for you, or of something you’ve uploaded. It will form its own understanding of the subject, composition, style, color palette and conceptual meaning of the image, and generate a series of original pieces that echo the look, feel and content of the original, each adding its own twist.
And DALL-E 2 can now also perform edits, in a way that makes Adobe’s incredibly powerful but notoriously unapproachable Photoshop software feel like a relic of the past. No training is required. You can paint a rough blob over a chair and say “put a cat there.” You can tell DALL-E to “make the sun set”, “put her in a neon-lit cyberpunk atrium”, or “remove the bike”. It understands things like reflections, and will update them accordingly.
You can paste in an image and have the AI expand it outward into a larger frame. Each time, it gives you a few different options, and if you don’t like them you can run the same instruction again, or get more specific with your prompt. Indeed, you can keep zooming out from an image indefinitely, and people are already using this to amazing creative effect.
These capabilities – which only scratch the surface of what it can do – make DALL-E 2 an absolutely revolutionary image editor. It seems this technology can do just about anything.
Well, within limits. OpenAI has designed DALL-E 2 to refuse to create images of celebrities or public figures. It also won’t accept uploads of images “containing realistic faces”, and it does its best not to generate images of real people, instead changing things up in interesting ways that tend to look a bit like the person in question, but also very clearly not. Be warned, though: given how sophisticated deepfake and image-editing software already is, we can’t imagine it would take much effort to take a DALL-E image and stick a chosen head on it.
The system won’t generate pornographic, gory or political content – and indeed these types of images were excluded from the data used to train it. And unless you specify racial or demographic information in your prompts, the system “generates images of people that more accurately reflect the diversity of the world’s population”, in hopes of heading off some of the racial biases AI systems frequently pick up from skewed training data.
DALL-E 2 is currently in beta, with a waiting list for interested parties. Over the next few weeks, one million accounts will be welcomed in, each with 50 free credits to use the system and an additional 15 credits each month. Additional credits will cost $15 per 115 – and each credit gets you four images for a prompt or edit instruction. It’s both an incredible democratization of visual creativity and a knife to the heart of anyone who has spent years or decades honing their artistic techniques in the hope of making a living from them.
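For a sense of scale, the beta pricing above works out like this – a quick back-of-the-envelope calculation based purely on the figures quoted, not an official rate card:

```python
# Beta pricing as described: $15 buys 115 credits, and one credit
# (one prompt or edit instruction) yields four images.
def cost_per_image(dollars: float = 15.0, credits: int = 115,
                   images_per_credit: int = 4) -> float:
    return dollars / (credits * images_per_credit)

print(f"${cost_per_image():.3f} per image")  # → $0.033 per image
```

That’s roughly three US cents per generated image – and the 50 free credits on signup amount to 200 images before you pay a cent.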
OpenAI explicitly says that users “obtain all rights to commercialize the images they create with DALL-E, including the right to reprint, sell, and commercialize.” But there are still a few fascinating legal gray areas yet to be fully explored here, given that everything these AIs know about art they learned by analyzing the works of other human creators.
While this latest software looks amazing, it’s worth remembering that it’s still a very early version of this kind of technology. DALL-E 2, its contemporaries and its descendants will continue to evolve at a frantic pace that will only accelerate.
Where to go from here? Well, how about a video? As processing power and storage continue to increase, it’s easy to imagine that systems like this should also be able to generate moving images. Adobe’s AI-enhanced video editing capabilities are already built into its pro-level After Effects software, but we’ve yet to see DALL-E-style creativity in video.
How long before we see an entire short film written, directed, shot and edited entirely by AI systems? And beyond that point, how long until they start being worth watching?
What about other forms of graphic design? Can DALL-E create logos? Website templates? Business cards? Will it scale to automatically generate catalogs, posters, brochures, book covers, and whatever else a designer currently makes a living out of? Most likely. Indeed, if you’re young and interested in art or design, you’d probably be best served becoming an expert at getting the most out of these emerging tools, because in a few years, whether you like it or not, that could be what the gig looks like.
Presumably, alternative AI image generators without the ethical and moral boundaries OpenAI has drawn around DALL-E will soon start appearing. Cans of worms will be opened. DALL-E hints at a fundamentally different future, and this kind of upheaval is never painless.
Watch a short video below.
DALL-E 2 Explained
Source: OpenAI