AI's Achilles Heel - Copyright!
It could be that copyright laws will be the downfall of generative AI
Generative AI can seem like magic or murder - copyright murder. Image generators such as Stable Diffusion, Midjourney, or DALL·E 2 can produce remarkable visuals in styles from aged photographs and watercolours to pencil drawings and Pointillism. The results can be fascinating, arriving with a quality and speed of creation well beyond average human performance. But it looks like they might also be kicking up a copyright storm.
The Museum of Modern Art in New York hosted an AI installation generated from the museum’s own collection, and the Mauritshuis in The Hague hung an AI variant of Vermeer’s Girl with a Pearl Earring while the original was away on loan. It makes you wonder what would happen if such capabilities got into the wrong hands, creating some new robot-powered take on art fakes or heists: DALL·E for Stéphane Breitwieser.
The capabilities of text generators are perhaps even more irksome, as they create essays, poems, news pieces and summaries, and are proving to be canny mimics of style and form (though they can take creative licence with facts). Already there are a few projects to replace local news, which is fast disappearing, with AI-generated news.
Yet copyright infringement looms front and centre as a new battleground of the 21st century, as these massive machine bots trawl carefree over our painstakingly created content and media. It seems that in this version of the internet, content is not necessarily king.
According to the Harvard Business Review ‘While it may seem like these new AI tools can conjure new material from the ether, that’s not quite the case. Generative AI platforms are trained on data lakes and question snippets — billions of parameters that are constructed by software processing huge archives of images and text. The AI platforms recover patterns and relationships, which they then use to create rules, and then make judgments and predictions, when responding to a prompt.
This process comes with legal risks, including intellectual property infringement. In many cases, it also poses legal questions that are still being resolved. For example, does copyright, patent, trademark infringement apply to AI creations? Is it clear who owns the content that generative AI platforms create for you, or your customers? Before businesses can embrace the benefits of generative AI, they need to understand the risks — and how to protect themselves.’
A case before the U.S. Supreme Court against the Andy Warhol Foundation — brought by photographer Lynn Goldsmith, who had licensed some but not all images of the late musician, Prince — could refine U.S. copyright law on the issue of when a piece of art is sufficiently different from its source material to become unequivocally “transformative,” and whether a court can consider the meaning of the derivative work when it evaluates that transformation. The court found for Goldsmith, ruling that the works Warhol made using other people’s photography were not immune from copyright claims. It highlighted that courts are not art critics and should not be determining transformation on the basis of medium - photograph to silkscreen in this case - but that purpose was integral. It noted that even Warhol recognised the ownership of the original copyright, paying for the right to use some of Goldsmith’s work but not others.
If business users are shown to have known that training data might include unlicensed works, or that an AI can generate unauthorised derivative works not covered by fair use, their employer could be on the hook for wilful infringement, which can carry damages of up to $150,000 for each instance of knowing use. There’s also the risk of accidentally sharing confidential trade secrets or business information by inputting data into generative AI tools - creating some kind of extended WALL-E whistleblower.
Over time, AI developers will need to take the initiative in how they source their data, and investors will increasingly want to know the origin of the content inputs. Stable Diffusion, Midjourney and others built their models on the LAION-5B dataset, which contains almost six billion tagged images scraped indiscriminately from the web and is known to include substantial numbers of copyrighted creations.
But there might be another way to solve the problem. Imagine if we could stamp every piece of content with a secure digital signature and a user profile. We could then use this to flag to a user when their content is being used by another machine - or, smarter still, build some kind of real-time content marketplace that charges a micro-fee every time that content is crawled, used or manipulated.
Further, if this content were made ‘smart’, it could tell generative AI tools how they are allowed to use it - or not. Can it be manipulated or repurposed, or more? It looks like a few deep-technology startups are working on such infrastructure technology for AI.
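To make the idea concrete, the stamping scheme could be sketched very roughly as below. Everything here is an illustrative assumption rather than an existing standard: the field names, the permissions format and the use of a shared-secret HMAC (a real system would use public-key signatures so anyone can verify a stamp without holding the creator’s secret).

```python
import hashlib
import hmac
import json
import time

# Hypothetical creator key - in practice this would be a private key
# held by the author or their platform, not a shared secret.
CREATOR_KEY = b"example-secret-key"

def stamp_content(content: bytes, author: str, permissions: dict) -> dict:
    """Attach a provenance stamp to a piece of content."""
    stamp = {
        "sha256": hashlib.sha256(content).hexdigest(),  # fingerprint of the content itself
        "author": author,
        "created": int(time.time()),
        "permissions": permissions,  # machine-readable usage terms
    }
    payload = json.dumps(stamp, sort_keys=True).encode()
    stamp["signature"] = hmac.new(CREATOR_KEY, payload, hashlib.sha256).hexdigest()
    return stamp

def verify_stamp(content: bytes, stamp: dict) -> bool:
    """Check the signature and that the content matches its fingerprint."""
    claimed = dict(stamp)
    signature = claimed.pop("signature")
    payload = json.dumps(claimed, sort_keys=True).encode()
    expected = hmac.new(CREATOR_KEY, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(signature, expected)
            and claimed["sha256"] == hashlib.sha256(content).hexdigest())

article = b"Painstakingly created content."
stamp = stamp_content(article, "jane@example.com",
                      {"ai_training": False, "micro_fee_per_use": 0.001})
assert verify_stamp(article, stamp)          # untouched content verifies
assert not verify_stamp(b"altered", stamp)   # tampering is detected
```

The point of the sketch is that the stamp travels with the content: a crawler that respects it can read the author, the creation time and the usage terms, and any tampering with either the content or the terms breaks the signature.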
It could be needed: thanks to the mass consumption of media and the power of networked distribution, content has become the new uber asset. It is almost as necessary as oil and water and waste. The ultimate commodity. And just as wine and food are celebrated for their ‘provenance’, so too should our content be. To achieve this, we will need to figure out how to stamp every single content atom with a smart signature that tells us when it was made and who made it - with some kind of author validity ranking.
Smart content could presumably be developed to curb fake news and false facts. And a digital signature could make it more transparent where every piece of content originates and how valid its author is - machine or human. User generated or Murdoch mansplained.
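On the consuming side, a well-behaved crawler or AI pipeline could consult such a stamp before ingesting anything. The sketch below is hypothetical: the stamp fields and the gate function are assumptions about what a future standard might look like, not a description of any real system.

```python
# A hypothetical gate a well-behaved crawler might run before adding
# a piece of content to a training set. The stamp format and field
# names are illustrative assumptions, not an existing standard.

def may_train_on(stamp: dict, budget_per_item: float) -> bool:
    """Return True if the stamp's terms allow AI training within budget."""
    perms = stamp.get("permissions", {})
    if not perms.get("ai_training", False):      # author forbids training
        return False
    fee = perms.get("micro_fee_per_use", 0.0)    # pay-per-use pricing
    return fee <= budget_per_item                # only ingest if affordable

human_made = {"author": "jane@example.com",
              "permissions": {"ai_training": True, "micro_fee_per_use": 0.002}}
locked_down = {"author": "acme-news",
               "permissions": {"ai_training": False}}

assert may_train_on(human_made, budget_per_item=0.01)
assert not may_train_on(locked_down, budget_per_item=0.01)
```

In this sketch the author’s terms, not the crawler’s goodwill, decide whether the content enters a data lake - and the micro-fee field is where a real-time content marketplace could plug in.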
With intelligence like this embedded into our media, content can be king again - the content creator recognised and compensated - irrespective of how the machines might try to co-opt it. Now there’s a novel idea!
If not, all the amazing content creators who have made the internet possible and our lives richer will be the first to get swallowed by the machines. It could be that the answer to this problem is the digitisation of the oldest of artefacts - the signature. And if, so many years ago, technology companies could set out on a quest to digitise our maps, creating the likes of Google Maps, then surely a technology company can set out to digitise the origin of our content atoms.
If we get it wrong then, like Thanos in Avengers: Infinity War, the whole sh*t show could go up in a puff of smoke and nuke the half of humanity that informs and entertains us. Maybe it is time to use the blockchain of dreams to focus less on distributing and trading money and more on protecting the biggest asset of all - our media assets.
Keep up to date with The Letts Journal’s latest news stories and updates on Twitter.