Let’s Not Flip Sides On IP Maximalism Because Of AI

from the copyright-is-the-wrong-tool dept

Copyright policy is a sticky, tricky thing, and battles over it have been fought for decades between public and corporate interests. Typically, it’s the corporate interests, especially the content industry, that win. Because of this, we’ve seen power, and copyrights, collect among a small group of content companies. But there is one significant win that the public interest has been able to defend all these years: fair use.

Fair use’s importance has only grown over the years. Put simply, fair use allows people limited use of copyrighted material without permission. Fair use’s foundations are in commentary, criticism, and parody. However, fair use has arguably filled in important gaps to allow us to basically exist on social media. That’s because there are open questions on what is and isn’t copyright infringement, and things as simple as retweeting or linking could theoretically get us in trouble. Fair use also allows a lot of art to exist, because a lot of art critiques or comments on older art. On the flip side, when fair use was ruled to not cover music sampling it basically killed a lot of creative sampling in hip hop music. Now popular sample-based music is relatively tame and tends to use the same library of samples.

Fair use (probably) also protects the creator industry. Many people make a living streaming video games or making content around playing video games. All of that could violate copyright laws. We don’t know the extent of the risk here, because it hasn’t been fully tested, but we do know that videogame makers have claimed copyright over videogame streaming content. We also know that in Japan, which doesn’t have fair use, a streamer got two years in jail for posting Let’s Play videos. A lot of creators also make “react” content, which likewise relies on fair use protection.

Blowing up Fair Use

Considering the importance of fair use, and the historically bad behavior of the content industry towards ordinary people, it’s dismaying that a lot of public interest advocates want to blow it up to hurt AI companies. This is unfortunate, but not particularly surprising. Content industry lobbying has inflated copyright protections into a pretty big sledgehammer, and when you really want to smash something, you often reach for a sledgehammer. For example, copyright and right of publicity (a somewhat related state-level IP regime) were the first tools people turned to for protecting victims when revenge porn first became a big problem.

Similarly, some public interest advocates are turning to copyright to stop AI from being trained on content without permission. However, that use is almost certainly a fair use (if it’s copyright infringement at all), and that’s a good thing. The ability of people to use computers to analyze content without permission is extremely useful, and it would be bad to weaken or destroy fair use just to stop companies from doing that in a socially problematic way. The best way to stop bad things is with policy purposefully made to address the whole problem. And these uses of copyright law often play into the hands of powerful interests: the copyright industry would love the chance to turn the public interest advocacy community against itself in order to kill fair use.

I’m not saying that there aren’t issues with AI that need to be addressed, especially worker exploitation. AI art generators can be especially infuriating for artists: they use a lot while giving back little. In fact, these generators are arguably being built to replace artists rather than to provide artists with new tools. It can be attractive to throw anything in the way to slow it down. But copyright, especially copyright maximalism, has done a terrible job of preventing artist exploitation.

Porting “on a computer” to copyright

One of the biggest public interest fights in patent law has been against “on a computer” software patents, which clogged up the system and led to a number of patent infringement suits against small businesses over silly claimed inventions. The basic problem is this: it was initially possible to claim as an invention something that was already known, done “on a computer.” These “on a computer” patents have been greatly restricted through Supreme Court rulings (which special interests would like to overturn). However, the bad effects of software patents still exist today, as do the patent trolls seeking to exploit them.

This current fight over copyright in training data echoes that same problem. For example, if a writer wanted to study romance novels to find out what is popular, it would be perfectly acceptable under copyright law for them to read and analyze a lot of popular romance novels and use that analysis to work the most successful parts of those novels into a new novel. It is also perfectly acceptable under copyright law for an artist to study a particular artist and replicate that artist’s style in their own works. But using an AI to do that analysis, doing it “on a computer,” is now suspect.

This is shortsighted for a number of reasons, but one I’d like to highlight is how difficult this shrinking of fair use would be to contain. We are talking about an area in which the question of whether loading files into RAM is “copying” under copyright law (and therefore needs permission or is a violation) is an actual policy debate that public interest advocates have to fight. If using content as training data becomes a copyright violation, what’s the limiting principle? What kinds of computer analysis would no longer be protected under fair use?

I should also point out that IP maximalism is the easiest way to build oligopolies. Big companies will be able to figure out how to navigate the maze of rights necessary to build a model, and existing models will likely be grandfathered in (with a few lawsuits to get through). However, it will be nearly impossible for any new company or new open source model to emerge. Dealing with rights at scale is a problem so significant that even the rightsholder industry has trouble tracking them. And information about rights has been withheld to leverage better deals, given the risk (and high costs) of accidentally infringing someone’s rights.

Matthew Lane is a public interest advocate in DC focusing on tech and IP policy. This post was originally published to his Substack.



Comments on “Let’s Not Flip Sides On IP Maximalism Because Of AI”

42 Comments
Anonymous Coward says:

Why should it be “fair use” for a corporation worth billions to co-opt the work of creators who never agreed to have their work used to train AI models and who haven’t and won’t be compensated for that use, as the law currently stands?

People like to compare training of an AI model to a human reading and learning from the same work, but these simply aren’t the same thing. For one, AI models aren’t humans, they’re a product being created largely for the profit of corporations. They certainly don’t have the same inherent rights we assign to another human, and we’re a long way off from even considering that possibility.

More importantly, AI models don’t learn like humans. Just look at the recent work where researchers have been able to get AI models to disgorge entire sections of the works they’ve been trained on. The very same works we’re supposed to believe explicitly aren’t being copied in whole or in part as these models are trained from them.

Having some respect for creators in light of AI developments absolutely isn’t “blowing up” fair use. I think this is one area where BestNetTech’s take will not age well.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Fair use does not require compensation. Agreement is never part of the equation in fair use. The whole point of fair use is to be able to use copyrighted materials without the copyright holder’s permission.

No copyright on the art? Then the artist is out of luck!

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Why should it be “fair use” for a corporation worth billions to co-opt the work of creators who never agreed to have their work used to train AI models and who haven’t and won’t be compensated for that use, as the law currently stands?

For the same reason that students can learn from books etc. that they buy, borrow or see in an art gallery or museum. It is the same reason that music genres exist: people can learn ways of doing things from other people’s works.

Those claiming it is unfair are asking for free money from computer analysis, which is not the same as copying their works for profit.

Further, people trying to make a living have to compete with everybody on the planet who publishes their works via the Internet. Indeed, the idea that creative works are rare and valuable comes from a pre-Internet world, where publishers selected a few works from the many submitted for publication. Those who won that lottery have an overinflated idea of how rare creativity is, and of how much their work should be worth.

Anonymous Coward says:

Re: Re:

When I last looked, the books were not free, and coding books have explicit permissions defined for the information and the code included in them. So it’s not that I can go to a bookstore, get the book, use it the way I want and just put it back.

Also, some of the things we publish for free on the internet may be free as in beer, but not free as in speech. My blog posts are licensed with CC-BY-NC. You can’t just parse and transform/derive them, and strip the license away. You can only share them as-is, with the license attached, which none of these AI systems do.

Put aside the shenanigans of parsing GPL code, removing the license, and offering its derivations to any party who pays for API access: tons of source-available commercial code repositories are also parsed and offered to unsuspecting people.

Will corporations say that, “oh this is our source-available licensed codebase parsed by an AI and offered to you, so this is fair use. You can use our copyrighted algorithms for free because it’s fair use”?

Haha. Of. Course. NOT.

Corporations just bend the available things in a way to profit themselves, while sucking the public dry. They cry “Fair Use!” because it’s profitable for them, not because it’s a cut and dry case of fair use.

Anonymous Coward says:

Re: Re: Re:

When I last looked, the books were not free, and coding books have explicit permissions defined for the information and the code included in them

Only in that you cannot just copy and paste their example code into a product. But by reading those books you have learnt new things about coding that you can practice, and the authors cannot impose costs on you for practicing what you learn from their books.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

More importantly, AI models don’t learn like humans. Just look at the recent work where researches have been able to get AI models to disgorge entire sections of the works they’ve been trained on.

Are you saying that a human who has read a book can’t reproduce sections from it with some trial and error and the correct prompting? How about a savant with perfect recall?

And I’m a bit interested in what you mean by “entire sections”? Section can refer to a couple of sentences, a paragraph, a page or even a chapter. So how much text could an LLM output with the right prompting that corresponded to a book?

The very same works we’re supposed to believe explicitly aren’t being copied in whole or in part as these models are trained from them.

This tells me you don’t even have an inkling of how LLMs actually work, because the statement above is proof that you think they “copy” the input.

Arianity says:

Re: Re:

And I’m a bit interested in what you mean by “entire sections”? Section can refer to a couple of sentences, a paragraph, a page or even a chapter. So how much text could an LLM output with the right prompting that corresponded to a book?

Not sure about a book, but LLMs have already been shown to reconstruct images in their training sets:

https://www.usenix.org/system/files/usenixsecurity23-carlini.pdf

https://arxiv.org/pdf/2212.03860.pdf

As far as I’m aware, there’s no real reason you couldn’t do something similar with text. It might be a bit harder, or there might be constraints on images that aren’t present for text, which could make text even easier to force.

There are things you can do to reduce this, but it is a known problem.

This tells me you don’t even have an inkling of how LLM’s actually work because the statement above is proof of that you think they “copy” the input.

I mean, they kind of are, in a sense. It depends on what you mean by “copy”. The data set is basically used to build a giant parameter space. That parameter space encodes information from that original dataset (with some tricks to try to avoid falling into a local minimum by copying a piece exactly).

It’s not a literal copy as we typically think of it, but the data is encoded in the transformation. Which is why you can do things like pull out something from your training set. Because (a lot of) that information is encoded in the parameter space. It is in some sense a form of lossy copying.
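A toy sketch of that idea (nothing at all like a real transformer, and the corpus, model order, and function names here are purely illustrative): even a trivial character-level n-gram model stores its training text in its “parameters” (just count tables) so completely that greedy sampling replays the text verbatim:

```python
from collections import defaultdict

def train_ngram(text, n=8):
    # The "parameters" are counts of which character follows
    # each n-character context in the training text.
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(text) - n):
        counts[text[i:i + n]][text[i + n]] += 1
    return counts

def generate(counts, prompt, length=60):
    # Greedily emit the most likely next character. With a high-order
    # context and a single training text, every context is unique,
    # so this just replays the training data.
    n = len(next(iter(counts)))
    out = prompt
    for _ in range(length):
        ctx = out[-n:]
        if ctx not in counts:
            break
        out += max(counts[ctx], key=counts[ctx].get)
    return out

corpus = ("It is a truth universally acknowledged, that a single man "
          "in possession of a good fortune, must be in want of a wife.")
model = train_ngram(corpus)
# Prompting with the first 8 characters "disgorges" the rest verbatim.
print(generate(model, corpus[:8], length=200))
```

Real models are vastly lossier than this, but the mechanism is the same in kind: the parameters are a compressed encoding of the training data, and the right prompt can recover pieces of it.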

Anonymous Coward says:

Re: Re: Re:

It is in some sense a form of lossy copying.

If it’s lossy copying then that defeats the point of the copying, doesn’t it?

If the lossy copy was being sold or treated as an equivalent of the original, then sure, there might be a point in claiming that it’s some kind of infringement, but it’s not.

Rocky says:

Re: Re: Re:

Not sure about a book, but LLMs have already been shown to reconstruct images in their training sets:

And how much time was spent on creating a prompt to get the desirable output?

It’s one thing to simply ask for Da Vinci’s Madonna and the AI outputs a good facsimile, it’s entirely different if you have to spend time to craft a prompt through trial and error that produces the facsimile.

Anonymous Coward says:

Re: Re: Re:

Not sure about a book, but LLMs have already been shown to reconstruct images in their training sets:

Somewhat debatable, as the examples I have seen are images where there are thousands of similar images available, such as football images. Ask an AI to create an image of a player kicking a ball, or the moon on the horizon, or a sunset, and you should not be surprised if it looks very similar to one or more of the many existing images of those topics.

This comment has been deemed insightful by the community.
Anonymous Coward says:

Re:

Why should it be “fair use” for a corporation worth billions to co-opt the work of creators who never agreed to have their work used to train AI models and who haven’t and won’t be compensated for that use, as the law currently stands?

The same argument was made against remixes, and… suffice to say, that was not a take that aged well.

Having some respect for creators in light of AI developments absolutely isn’t “blowing up” fair use

You can, in fact, have respect for both creators and fair use. Exceptions to copyright matter to creators more than they initially realize – because imagine what happens if a large corporation worth billions then comes around and sues individual content creators because the corporation thinks that the individual creators’ work might resemble something in their gigantic catalog. Even if the claim was baseless, without exceptions to copyright, how long do you think the creators will last?

Anonymous Coward says:

Re: Re:

because imagine what happens if a large corporation worth billions then comes around and sues individual content creators because the corporation thinks that the individual creators’ work might resemble something in their gigantic catalog.

Please don’t give the RIAA, MPAA and Nintendo ideas.

Anonymous Coward says:

Re: Re: Re:

Please don’t give the RIAA, MPAA and Nintendo ideas.

Don’t need to. They’ve been spending the last couple decades doing the exact same thing.

Which is why content creators need to realize what kind of power they’ll be handing to large corporations if they keep cheering on copyright as their savior against AI.

Anonymous Coward says:

Re: Re:

Even if the claim was baseless, without exceptions to copyright, how long do you think the creators will last?

Here, that’s not even the biggest problem. The problem is that authors and artists are expecting to solve the problem by involving copyright, on things that aren’t covered by copyright to begin with.

An author or artist has no copyright claim over content that they had no personal involvement or decision in creating. Another person mimicking their art or writing style has not committed copyright infringement because styles are not protected by copyright.

Asking for copyright to be involved has the same energy as game devs DMCAing reviews or Milorad Truklja/Thomas Goolnik suing a website mentioning their name.

Diogenes (profile) says:

Re: youre right and youre wrong

“For one, AI models aren’t humans,”

True, but AI are tools that a human is using.
If that human had the right to study a copyrighted work and learn from it then he also has the right to use an AI to study that work and learn from it. The AI has no rights and cannot break the law. It is the human using the AI that the law considers.

Anonymous Coward says:

Re:

If I obtain a copy of an ebook that I believe to be legal (whether purchased or public domain), what difference does it make whether I read it myself or use my ereader’s TTS to read it to me? Similarly, if a digital copy of a work is obtained to train AI by people who have no reason to think it’s not legit, what difference does it make whether it’s typed into the computer or read directly by it? In fact, the second position is on firmer legal ground than the first, because no copying actually occurred.

Rocky says:

Re: Re:

Something I said a while back was that those who complain about AI are using the same reasoning that some people use when they file a patent for a well-known process, but on a computer.

A computer that creates AI training sets isn’t using some magical process; a person could also create a training set manually with just pen and paper plus a bunch of books (though they might not complete it within their lifetime), and everything a person is allowed to do to acquire “reading material” applies exactly the same to supplying a training set with data. So many of the complaints about AI boil down to something as simple as “it’s automated,” and this disconnect in the detractors’ thinking will ultimately make lawsuits hinging on that factor alone fail.


Anonymous Coward says:

Re:

The difference is that case law has become much more complicated in the ensuing 150 years, and the Copyright Act has been updated multiple times.

As a result, the arguments are all old, but their application to the current copyright landscape is novel enough that people who weren’t alive last time around have a hard time seeing the potential simplicity, and instead want to torture the existing laws some more.

Anonymous Coward says:

Re: Re:

in the ensuing 150 years… the Copyright Act has been updated multiple times.

The most significant change is automatic copyright. It used to be that if you wanted a copyright in your drawing or photograph, you’d have to write “copyright” on it at the very least; maybe even register and pay a fee. Few people did, so most stuff was not copyrighted. In the USA, that didn’t fully change till 1989.

Anonymous Coward says:

Re:

The rich and powerful are not a monolith. They have a shared class interest but the goals of Disney et al are at odds in many ways with the goals of Google et al. The copyright cartels would love to expand restrictions and hamstring tech companies whose business models rely on fair use and safe harbor. In the end, regulations will always benefit the biggest players, and whichever AI company can claw its way to the top will do what it wants, but I don’t think we’re there yet.

Anonymous Coward says:

“It is also perfectly acceptable under copyright law for an artist to study a particular artist and replicate that artists style in their own works. But using an AI to do that analysis, doing it “on a computer,” is now suspect.”

So does that mean I’m in trouble because I copied the style of a research paper when writing my own on a tablet?
