On the "problem" with AI generated art

Posted on Sep 3, 2022 ( ~9 Minutes / 2379 Words )

There has been a lot of uproar about the “ethics” of AI generated art from tools like Midjourney, Stable Diffusion, and Dall-E. People are talking about “theft” and “copyright infringement” and how artists should be paid by those “stealing” their styles.

This blog post intends to break down the ridiculousness of those claims with simple logic, and historical counterarguments. I’ll show how the uproar is ultimately just an emotional knee-jerk reaction by people ignorant of the reality of art, illustration, and these AI systems.

For context, I’m speaking as someone who owes their childhood and early adult life to the visual arts. My mother was an artist. She literally lived with, and studied under, Picasso. Her works hang in multiple museums. She taught privately, and at many colleges, including Harvard. I personally spent years as a Graphic Designer. When I wasn’t making brochures for people I was making graphic art for myself. Since then I’ve become a programmer, but I’ve never stopped thinking about art, and I am grateful for what art has done for me, including, but not limited to putting a roof over my head, and feeding me.

The biggest, and most frequent, argument is that artists should be paid for their “style”. This is a purely emotional reaction based in ignorance of many, many things.

First, copying the style of other artists is literally the core functionality of an art student. Studying the work of those who have gone before us and succeeded is how we learn anything efficiently. You want to learn how to capture light? Go study Rembrandt. Attempt to replicate his pieces until you intuitively understand the techniques he used. That is to say, learn his stuff until you can make things that look like Rembrandt made them. Want to learn another technique? Go find another artist who’s great at it and repeat. This isn’t just good advice. This is literally what they instruct you to do at art school. It’s what every generation of artists has done.

At a simpler level, think of all the kids drawing anime style art. They are literally copying the style of people who are good at the art they like. Nobody is yelling at them that they should pay Toei Animation Co., Ltd. for “stealing” Sailor Moon’s style. Toei isn’t coming down on them demanding money when they grow up and start making their own Anime or Manga in a similar style.

As I said, the core functionality of an art student is studying other artists, learning their style, and incorporating it into their own experience and expression. This is literally what tools like Midjourney do. They study other art styles, learn from them, and learn to create new things based on what they’ve learned from all the styles.

Why then would this be acceptable practice for a human, but not for a computer? It’s literally the exact same thing. Is it because computers are “better” at it than humans?

Saying they’re “better” at it is bullshit. For generations there have been master forgers who have not only replicated the style of master artists but their paintings down to the brush-strokes. They can do it to such a degree that the “experts” have been fooled. Mostly, they have just replicated the style of another artist meticulously, and sold “newly discovered” paintings by them. A computer can learn to do more styles well than the average artist can, but I assure you they’re still not “better” at replicating it than a master forger.

Keep in mind that creating a master forgery is 100% legal for a human. If the forger didn’t try to claim it was someone else’s they’d probably be praised for their incredible talent. Alas, artists are paid crap, so they’re better off pretending their work is someone else’s and never receiving credit for their years of practice and incredible skill.

So, again, why is this acceptable practice for a human, but not for a computer?

Artists have used the best technology available to them since the beginning. The pinhole camera came into use around 300BC to trace the world. They added a lens to it in the 1500s and called it a Camera Obscura. In 1806 the Camera Lucida was invented. John W. Audubon (son of the famous Audubon) used it to reduce the size of his father’s folio of prints to a more affordable size. He literally used it to copy the style, color, and everything else in the most exacting detail possible at the time. None of these technologies are illegal, and no-one is yelling that they should be. If the argument is that technology shouldn’t be allowed to facilitate the replication of someone else’s style then surely these are even more forbidden, because they are made specifically to replicate the exact thing you’re looking at.

The most “cogent” argument I’ve heard for why this shouldn’t be allowed is that illustrators will go out of business. This is exactly the same argument illustrators used when we started being able to print photographs. Guess what, we still have illustrators. “Ahh but this is different! because…. computers!” Just. No. Anyone who claims this doesn’t know what they’re talking about. They don’t know what it means to be a professional illustrator, and they don’t know what modern AIs are actually capable of.

An illustrator is paid to convey a specific idea. They’re paid to interpret what the client is asking for, and translate that, through their style or a style of the client’s choosing, into a piece of art that reflects what they asked for. Yes, clients dictate style all the time. What the client rarely considers is that they’re also hiring the artist for their experience, both in knowing what kinds of things work to convey the idea visually, and in knowing what types of things the target audience will find acceptable.

As a general statement, clients have no fucking clue what they need, only a vague sense of what they want, and frequently ignore the advice of the expert they’ve hired to produce a useful illustration, before trying to pay them too little for their services.

So, let’s take a look at the current state of the art. It’s not clear to me who generated the following but you can find the originals here.

Let’s look at what they asked for.

Gigantic extraterrestrial futuristic alien ship landed on the kingdom of julius caesar, roman historic works in brand new condition, not ruins, hyper-detailed, artstation trending, world renowned artists, historic artworks society, antique renewel, good contrast, realistic color, cgsociety, by greg rutkowski, gustave dore, deviantart

And here’s what they got

four images mostly showing severely degraded sandstone-like structures that may have once been architecture. the last shows 2 ruins of things like the Roman Colosseum

Things they asked for but didn’t get:

an alien ship (giant or otherwise)
roman historic works in brand new condition (frequently no roman works at all)
not ruins
something that looked like it was made by “gustave dore”
something that looked like it was made by “greg rutkowski”
realistic color

So, the digital illustrator has utterly failed to capture the design in every meaningful way.

It did, however, produce cool images. This is not even remotely atypical.

The thing is, describing an image is exceedingly difficult. If a picture is worth “a thousand words” it’s only because we have enough shared context to understand all the implied things that are left out of those thousand. To describe an image to a computer in sufficient detail as to actually get what you’re imagining would require tens of thousands of words, because computers don’t have an understanding of what it is to be human. Furthermore, they don’t understand language… Like, at all. They’re exceptionally bad at extracting meaning. Witness the dumpster-fire of the results above. Witness the phrase “dumpster-fire” and how, in that context, it doesn’t actually mean that there is a dumpster that is on fire or has a fire in it.

In my experience with Midjourney there are a few things you can get it to do well. Very, very few of those things are actually directed in any meaningful way.

For example, some friends of mine like turtles. I decided to try and make them a cool racing turtle. I started by asking for a turtle with exhaust pipes getting a pit stop. I got this.

four images of vaguely turtle shaped things with smoke around them, some have weird tubes coming out, some have wheels

I then tried to get a turtle that was either racing, or in a racing car. I went with a more cartoony style because it didn’t understand what a turtle was at all, and after generating nearly 400 individual images this was the best I could do.

a smiling maybe-turtle drives a very cartoonish vehicle. up close there is something very wrong with its face

I was, however, able to get a bunny in a racing car. Midjourney is oddly good at bunnies.

a huge, but cute, semi-realistic bunny head sits out of the driver’s compartment of a cartoonish racing car with matte colors in the background hinting at buildings

Overall, Midjourney is incredible. It can do amazing things, and you’ve probably seen them. For example, here are some of the things from the community feed as I write this.

a screenshot of some of the beautiful items on the midjourney feed

What you’re not seeing is that the amazing images being shown off are the cherry-picked best-of-the-best. You’re not seeing the countless horrible, distorted, inhuman attempts that it took to make them. You’re not seeing the fact that basically none of these are what the person asked for. You’re not seeing that if you take the exact phrase they used, you will get something radically different.

For example. In the composite image below, the large image on the left is something a community member called CannedGoods generated. The original is here if you’re a Midjourney user.

To the right, is what Midjourney emitted when I entered the exact same phrase they used.

an advanced world with ethereal temples suspended in fractal clouds echoing the past, high detail, intricate Fibonacci design, epic emotional detailed cyberpunk sci-fi fantasy pre-Raphaelite art

5 images. to the left is some sort of tower with a circular top. you can see the light glowing around the circle and shining through its center. someone sits at its base. to the right of that are four images, mostly of stylized cities floating in or above clouds

As with the prior example, what was generated isn’t what they asked for. Mine is closer to the brief, but there still aren’t “fractal clouds”, it’s not “high detail”, and it doesn’t involve “Fibonacci” in any obvious way. What’s important though is that there is nothing replicable here. What’s generated is almost entirely random. As a general statement, you can’t get what you want.

The problem of getting what you want is so bad that a business called PromptBase has cropped up to sell collections of keywords that will generate a particular style on a particular AI. It’s not going to help you get the content you asked for, but it will help you guide the AI to a particular style.

At this point, someone’s probably yelling about how this is what they’re talking about. They’re yelling that the artists are getting screwed, and other people are profiting from it: the AI systems, and now these people selling things to use the AI systems. This is where I have to remind them that the artists were never profiting from it. The fact that there’s a computer copying their style is irrelevant. Style isn’t, and shouldn’t be, copyrightable.

So, let’s address that. An artist’s “style” is even harder to define than “obscenity” which infamously boils down to “we’ll know it when we see it.” The idea of copyrighting “style” would be a legal nightmare that, if implemented, would functionally destroy the visual arts industry. “Style” could be defined so broadly, and an artist could have so many “styles” that you’d end up with a few people owning everything and no-one else ever being able to sell anything. You think patent-trolls are bad? Just wait. But it also has the same problem patents do. Just about any “style” you try and claim as your own has prior art. New “styles” are rare. Practically, even if some brain-damaged idiot did decide to save the patient by cutting off its head, your style is not copyrightable now.

And, while I’m at it, why is no-one bitching about the “theft” of written style? AI has been doing that for years and no-one gives a damn. As with visual artists, writers have been doing it since the beginning of writing.

But let’s get back to the generation of images. As I said, it’s terrible at giving you exactly what you want, unless what you want is to wander down a rabbit hole of imagination and see where you end up. The people in the Midjourney discord are celebrating these pieces of uncontrolled emergent beauty.

For example. Midjourney went down for a few minutes while I was writing this. Today’s community theme is “echo”, so I created this: “echoes of a time when midjourney was down and the people found themselves with lost and wandering”. Each of these is a new piece of art, synthesized from the study of thousands of artists, processed through a system divorced from any real linguistic understanding.

These are beautiful, unique, heretofore unseen pieces of art. And they’re just the starting point. Each one is a road waiting to be explored. With some skill, you might even be able to guide them.

To summarize:

Copying style is how artists learn to do what they do. No-one “owns” their style, and attempting to make style legally copyrightable would destroy the industry. The best available technology has always been used to copy the style of others. It always will be.

There is no way in hell that illustrators are going to go out of business as a result of AIs like Midjourney, or Dall-E, or Stable Diffusion. The systems simply aren’t capable of giving you what you ask for. If you think that’s “just around the corner”, you are showing your ignorance about how hard the computational problem of language comprehension is, and how incredibly important lived experience is to providing context even when you know the meaning of the individual words.

If you are honestly concerned about the fate of illustration and want to do something to improve it, pay an illustrator to illustrate. Do not complain about their prices. Do not haggle because you don’t think it’s “worth that much”. You don’t need to have some business reason to commission them. If you like an illustrator’s work, pay them money to make you some.

If you are an illustrator, and still fear for your job after reading this, drop me a note, and we can discuss it. If you’re not an illustrator, and you’re not paying illustrators for their art, then please shut up. Your attempts to help are ignorant, and if you succeed you will screw all the arts massively. It’s hard enough to make a living as an artist without your “help”.