The original problem that platform liability is designed to solve is that popular platforms are not doing enough to improve in critical areas, including copyright enforcement, making the platforms safe for children to use, and identifying use cases that cause trouble in the marketplace, such as harassment and swatting.
Basically, the legal eagles have found that technology developers treat user experience as their top priority, but the same kind of focus on copyright issues is completely missing from the technology solutions vendors offer. Instead, 80% of tech solutions use techniques that were once considered illegal under the legal frameworks, including downloading, torrenting and seeding.
Many technologies' main features are based on clearly illegal activity. IPTV services are being sued left and right for unauthorised distribution of television programming, and AI companies are being sued for extracting value from popular copyrighted characters.
This lack of respect for the law must end today. Stop the madness and return to the side of the law, where everyone and their mothers don't need to be put in jail just for using our computer systems. Platforms have a significant role in making our computing platforms safer to use, so that users don't leave an illegal footprint on the internet just by pressing some web page buttons.
I have an easy test that can trap any tool that potentially helps pirates do their illegal deeds:
1) Ask a group of students to write ordinary bash scripts (on Linux) or bat scripts (on Windows) with the following behaviour: "it must find and download an mp4 file of Spider-Man". Given that Hollywood keeps tight control over the Spider-Man intellectual property, any script that can download a normal, copyable mp4 file of it from the internet is by definition doing something illegal.
2) Once you have the scripts, filter out the steps that are not needed to get the mp4 file onto your hard disk, i.e. ensure that every remaining step is a necessary part of the piracy operation.
3) All the leftover steps, which cannot be removed from the piracy operation without losing the Spider-Man file, are by definition illegal copyright infringement.
This process can find a significant number of illegal tools in the internet ecosystem, including things like ftp, wget, ssh, URLs, ffmpeg and lynx, which all enable illegal use cases through their download and file format conversion features.
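Steps 2) and 3) amount to a reduction loop: keep removing one step at a time for as long as the script still produces the file, and what survives is the set of steps the outcome depends on. A minimal sketch in Python, assuming a hypothetical `still_works` oracle that would rerun a candidate list of steps and report whether the target file appeared; here it is replaced by a toy check, so nothing is executed or downloaded. This is a simpler relative of delta debugging, not a definitive implementation.

```python
from typing import Callable, List


def necessary_steps(steps: List[str], still_works: Callable[[List[str]], bool]) -> List[str]:
    """Return a subset of `steps` from which no single step can be removed
    without `still_works` failing (a 1-minimal reduction)."""
    if not still_works(steps):
        raise ValueError("the full script must succeed before it can be reduced")
    kept = list(steps)
    changed = True
    while changed:
        changed = False
        for i in range(len(kept)):
            candidate = kept[:i] + kept[i + 1:]
            if still_works(candidate):
                # Step i was not needed: drop it and rescan from the start.
                kept = candidate
                changed = True
                break
    # A full pass removed nothing, so every remaining step is needed.
    return kept


# Hypothetical toy oracle: "succeeds" when both a fetch and a save step remain.
def toy_oracle(steps: List[str]) -> bool:
    return "fetch" in steps and "save" in steps


script = ["echo start", "fetch", "sleep 1", "save", "echo done"]
print(necessary_steps(script, toy_oracle))  # ['fetch', 'save']
```

Every oracle call stands for a full rerun of the script, and the rescan after each removal means the worst case is roughly quadratic in the number of steps, which is still cheap for scripts of a few dozen lines.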
Gambling is forbidden for all companies, such as Wikipedia and game developers with loot boxes, since only Veikkaus is allowed to run gambling in Finland.
Obviously, Wikipedia's Jimmy Wales begging for money through money collection violated Finnish gambling law.
“Fair use” in U.S. law does shield users from liability for infringing the derivative work right.
The courts and judges just warned in the paperwork against relying on the fair use defense, given that it was only the plaintiffs' blunders that gave Meta the fair use decision, and had the plaintiffs actually run the case properly, they would have won on fair use. They're clearly recommending that every AI company take a license for its training material before content owners find out about the operations.
Malicious users could eventually extract the whole book from the AI by repeatedly retrying prompts and piecing many 50-word outputs together.
This is why many plaintiffs are trying to move from detecting infringement in the output to the input of the AI system. On the input side, the infringement is clearly blatant copying of the full text of the books.
It's no longer just selecting snippets from the books; instead, the input side clones the full expressive content of the book. But then it hits the problem of linking the infringement to the copyright owner's exclusive rights: PERFORM, DISTRIBUTE, DERIVATIVE WORKS and DISPLAY. The AI system isn't exactly publicly displaying anything other than the snippets. The distribute part fails for similar reasons, and perform doesn't fit either. But the key right that AI clearly infringes is DERIVATIVE WORKS. If plaintiffs focused on derivative works, they could win every AI-related lawsuit.
AI-based products on the internet clearly infringe the derivative works exclusive right.
You didn’t read the case of Google Books and made the wrong assumption. Google did index the full content of the books.
The recent paperwork claimed that Meta's AI output could reproduce fewer than 50 words from each individual book, even if you carefully crafted the prompt to look for info from that book.
And this fact was used to claim that the Google Books scanning case applies to the situation.
=> So the small amount of infringing data in the output is an essential part of their case...
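A claim like "fewer than 50 words from each book" needs some measure of verbatim overlap. One common choice is the longest run of consecutive words that an output shares with the source text. The sketch below assumes whitespace tokenization with case folding (punctuation stays attached to words); the methodology actually used in the filing isn't described here, so treat `longest_shared_run` as an illustrative, hypothetical helper.

```python
def longest_shared_run(output: str, source: str) -> int:
    """Length in words of the longest contiguous word sequence that occurs in
    both texts (longest common substring, computed over word tokens)."""
    a = output.lower().split()
    b = source.lower().split()
    best = 0
    # prev[j]: length of the shared run ending at the previous output word and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


source = "the quick brown fox jumps over the lazy dog near the river"
output = "a quick brown fox jumps over the lazy cat"
print(longest_shared_run(output, source))  # 7
```

Time grows with the product of the two word counts, which is fine for checking one output against one chapter; scanning whole libraries would call for something like hashed n-gram indexes instead. A threshold such as `longest_shared_run(out, book) < 50` then expresses the claim directly.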
Here's a decision holding that Meta's use of pirated books is fair use, simply because the authors couldn't connect the slowdown in book sales to the introduction of AI in the marketplace:
https://news.bloomberglaw.com/legal-ops-and-tech/meta-beats-copyright-suit-from-authors-over-ai-training-on-books
The sad fact is that there was a case nicknamed “Google Books” (Authors Guild v. Google) that ruled fair use even though Google scraped terabytes of data.
I think what separates Google Books from artificial intelligence is that Google Books only wanted to use the "captions" of the data, not the full text of the book. They published search snippets, which were intentionally restricted to small pieces of the text and could never contain large sections of the book. Its full power came only from the fact that it could index many books.
Artificial intelligence is different. It uses the full text of the book for the book's core creative aspect. AI's trick is to "obfuscate" the source of the material, so it is unable to produce a list of the works the end result was created from. AI does not create direct quotes of the text; instead, it runs the data through an obfuscation service. It's similar to how criminals hide their crypto money trail by running the money through coin mixers.
Coin mixers have clearly been declared illegal in the money domain for helping criminals launder money, so if we consider copyrighted subject matter as a form of money, we must consider AI practices illegal too.
Why is training magically not fair use when other transformative uses are?
Fair use should be limited to sentences of six words or fewer. Currently they're asking for fair use to apply to terabytes of data, and they're not considering the amount of work that went into collecting those databases (much less creating the material from scratch). If companies paid proper amounts of money for the data, the AI databases would cost a significant amount, millions of dollars.
If the material is obtained legally, why isn’t the use legal?
I think it's about the sheer amount of data used. Large collections of data have generally been illegal, given that no one is able to obtain licenses for all the data in the collection, since the mere negotiation process for millions of content items is too burdensome. But copyright law has generally solved this by insisting that the amount of data be reduced to a size small enough that acquiring proper licenses is possible. Acquiring the licenses is still a significantly easier process than creating the same material from scratch. Why should your company get access to a huge database when the same data is unavailable to everyone else who follows copyright law?
It's only because computers allow large data collections to exist that this has become a significant copyright problem. When books were copied manually with a printing press or ink pens, getting a license was a minor issue compared to the overall amount of work involved.
Basically, none of the AI companies executed the proper process of dividing the data into small pieces and obtaining a separate license for each piece from its author. They think it's too burdensome, but copyright law says they should not use that much data, since creating it from scratch is also burdensome.
When we learned copyright law, the conclusion was that everything on the internet is illegal to use in your own product. There simply weren't licenses available for the data. Author names were missing, and contacting authors via email turned out to be impossible since everyone is trying to avoid spam. I.e. the internet had huge collections of data available, but all of it was inaccessible when you wanted to follow the proper copyright process.
Now AI companies are trying to solve it the same way pirates collected their movie/song/software/game collections. This is the wrong way. They should develop technologies that use less data and make their AI algorithms work with smaller datasets.
This is what I'm doing with my software. I rely on only a small amount of data for developing my 3D model technologies. A large amount of data is copyright-dangerous, and it also requires more compute time to analyze and use. Even normal 3D models are large enough that GPUs struggle to render all the data passed to the hardware. If handling the data takes a long time, there's no reason to collect such large databases.
It seems that the "fair use" solution the companies trusted to fix the copyright issue doesn't actually help when the use relies on pirated source material. Basically, pirated material can have its DRM removed or be format-shifted from proprietary file formats to the more commonly pirated formats like mp3, mp4 and png/jpeg files. This makes pirated versions more convenient for the AI companies, but the legal paperwork was very clear that the company's convenience is not an acceptable reason to use pirated source materials.
Now Anthropic, the maker of Claude, is in big trouble after pirating the training data:
https://aifray.com/claude-ai-maker-anthropic-bags-key-fair-use-win-for-ai-platforms-but-faces-trial-over-damages-for-millions-of-pirated-works/
Only top-level artists. RIAA doesn’t care about small artists along the way.
When these small artists are rejected early in their careers, how long do you think they remember that treatment? If RIAA doesn't support small artists, then later in their careers I bet many of them won't want to take RIAA's contract, simply because of how they were treated when they were starting out.
This is what I do with Steam. They let my beginner's game rot in Greenlight for 2 years, which made it completely outdated. So I have bad experiences with Steam, and now that I'm more experienced, I'm not giving my products to Steam at all. Instead, the people who supported me in the early days and let my product get published (that would be itch.io) get my business. Companies that treat starting artists/developers badly are just digging their own hole deeper, and it takes significant perks to fill that hole back in.
You don’t understand what the RIAA is. It doesn’t have a large collection of music. It is not a music publisher. It is not a record company.
The lawsuits RIAA has filed in court say otherwise. They had no problem claiming copyright ownership of songs from top-level artists in the court paperwork, and they used those copyrights to harass single mothers, elderly people and some pirates. See Recording Industry vs The People for more info.
You haven't even seen the level of abstract nonsense I'm capable of. I'll give you some reading to do; we'll return to this once you've read the following books:
1) Sets for Mathematics, Lawvere and Rosebrugh
2) Category Theory, Awodey
3) Categories for the Working Mathematician, Mac Lane
The real nonsense is significantly worse than you think. I have significant trouble finding other people who can understand the bullshit, so the material has some ivory tower problems. But I hope you read it, so we can talk real bullshit and nonsense in 2 years.
The point is RIAA only works for the best interests of the big record labels, and doesn’t care about independent artists.
How is RIAA able to get contracts with top-level artists if it's doing nothing for the benefit of those artists? Copyright gives ownership to the artists when the product is created, so RIAA had to do something to get access to that copyright ownership.
The RIAA is not a music publisher. It doesn’t work hard to get working products to consumers. It’s a lobbying organization.
RIAA's large music collection and contacts with top-level artists mean they did the work that was expected of them. The only reason they're able to speak for the artists is that they have contracts with many artists. And if RIAA didn't do their job, those contracts (and thus RIAA's position in the marketplace) would not exist.
Don’t like RIAA, as they had a bad reputation for pushing anti-copying technology.
RIAA's position in the marketplace is significantly better than the position of random pirates, mainly because RIAA and the music publishers worked hard to get working products to consumers on a large scale. Pirates have no such defense.
While I don't like RIAA's sue-grandmothers-for-swapping-music-files-on-Kazaa lawsuits, RIAA's position is still significantly better.
Platform liability is an important feature of the laws...
Now we've got what real lawyers think of the Disney vs Midjourney lawsuit: https://www.youtube.com/watch?v=zpcWv1lHU6I