Six things publishers should do now to protect themselves from AI scrapers
Remember Napster? It was the early-2000s rebel that let us all download music for free and turned the music industry upside down. Well, generative AI may be today’s Napster for publishers — exciting, disruptive, and a little panic-inducing all at once.
What can the news industry learn from recent history? There are some similarities to the Napster era. At the time, the music industry didn’t know what hit it. Record companies were losing money, artists were frustrated, and listeners were stuck in the middle. It took years, lawsuits, and a brand new platform called Spotify to find balance.
The same thing can happen here: chaos first, solutions later. But this time, publishers must take active steps to mitigate the risk of Big Tech getting the lion’s share of the spoils.
Let’s take a look at two key questions: What new revenue opportunities can generative AI offer publishers? What role can content markets play in this?
Six ways to protect publisher content from AI scraping tools
Here are actions publishers can — and should — take to protect their content from armies of content-aggregating bots. (This was one of the topics I discussed with Dominic Ponsford on the podcast last week.)
1) Terms of Service
Make sure your website’s terms of service explicitly state that you do not permit the copying or use of your content for AI training. Although this will not stop all bots, it provides a legal basis for pursuing action against violators.
2) Robots.txt file and meta tags
Although these files and tags cannot block access, they serve as a signal to well-behaved bots about which parts of your site to avoid. There are quite a few tools available, such as Cloudflare, Dark Visitors, and Netacea, that can automatically update and manage your robots.txt file and keep track of which bots visit your site.
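As a sketch, a robots.txt that opts out of some well-known AI training crawlers might look like the following. The user-agent names below are ones the relevant companies have published, but treat the list as illustrative rather than exhaustive — new crawlers appear regularly:

```
# Block common AI training crawlers (illustrative, not exhaustive)
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers: normal access
User-agent: *
Allow: /
```

Remember that robots.txt is purely advisory: well-behaved bots honor it, but bad actors can simply ignore it, which is why the legal and technical measures in the other steps still matter.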
3) Copyright notices
Make sure to mark your website and content with copyright notices. This strengthens your ownership of the content and can be useful in legal disputes.
4) Set honeypot traps
A personal favorite: a honeypot is a hidden field on your website that human users can’t see or interact with, but bots may try to fill in. If one does, you’ll know a bot is active and can take mitigating action (for example, blocking its IP address).
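The server-side half of a honeypot can be very simple. Here is a minimal, framework-agnostic sketch in Python; the field name `website_url` is an illustrative choice (on the page it would be hidden with CSS, e.g. `style="display:none"`), not a standard:

```python
# Honeypot check: a hidden form field humans never see or fill in.
# The field name below is an illustrative assumption, not a convention.
HONEYPOT_FIELD = "website_url"

def is_probably_bot(form_data: dict) -> bool:
    """Return True if the hidden honeypot field was filled in.

    Humans can't see the field, so it arrives empty; naive bots
    tend to fill every field they find in the HTML.
    """
    return bool(form_data.get(HONEYPOT_FIELD, "").strip())

# A human submission leaves the hidden field empty:
print(is_probably_bot({"name": "Ada", "comment": "Great piece!", "website_url": ""}))  # False
# A naive bot fills it in:
print(is_probably_bot({"name": "x", "comment": "spam", "website_url": "http://spam.example"}))  # True
```

In a real deployment you would call a check like this in your form handler and log or block the offending IP rather than just returning a boolean.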
5) Watermarks
Image and video watermarks can deter unauthorized use. Even if the content is scraped, the watermark may remain, showing the source of the content.
6) CAPTCHA and reCAPTCHA
These tools are designed to prevent bots from accessing content by requiring users to complete a test that is easy for humans but difficult for machines. Implementing a CAPTCHA on key pages — such as login forms or comment sections — can be effective in deterring bots.
These steps won’t provide an air-tight defense against bad bots, but it’s a good idea to have basic protections in place.
Now that the content is locked down: how to make money from AI
Now let’s move on to the positives: What new revenue opportunities can AI bring to publishers?
In one word – licensing.
Now is the time for premium publishers to take advantage of opportunities to extract any latent B2B value in their premium content archive by licensing it – on their terms – to public AI models.
The size of the licensing revenue opportunity will vary widely from publisher to publisher, so a good starting point is to invest in adding detailed, structured metadata to any content in your archive that isn’t properly tagged. This makes the content easier to work with and allows AI developers to accurately index and value it.
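One widely used way to add that structured metadata is schema.org-style JSON-LD embedded in each article page. The sketch below shows the general shape; all field values are placeholders, and the exact properties you need will depend on the licensing platform you work with:

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Example headline",
  "datePublished": "2024-01-15",
  "author": { "@type": "Person", "name": "Example Author" },
  "keywords": ["licensing", "generative AI", "publishing"],
  "copyrightHolder": { "@type": "Organization", "name": "Example Publisher" }
}
```

Markup like this serves double duty: it aids conventional search indexing today and makes your archive easier to evaluate and price in a licensing negotiation tomorrow.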
But this still leaves a challenge. How can publishers trade copyrighted content with AI developers at a fair value? How do they take it to market? Not all publishers can simply open talks with large LLM companies.
Good news: a few marketplaces are starting to emerge that aim to bring together content rights owners with responsible AI developers who want to license copyrighted content to train their LLMs.
This proposition may be attractive to publishers – platforms that allow them to license sections of their content in a safe, controlled, and closed ecosystem, where AI developers can responsibly acquire rights to the content.
Human Native AI is one of the first such platforms available to publishers in the UK. The platform, created last year by James Smith and Jack Galilee, allows rights holders to upload and index their content and gives them full control over which individual pieces of content are open or closed for AI training.
Monetization is also flexible: publishers can license their content or data for AI training on a subscription or revenue-sharing basis. Human Native also offers a service to help publishers prepare their content or data so that it is properly tagged for AI models.
In my opinion, this type of platform could provide publishers with a real additional revenue stream. Publishers can partner with these emerging marketplaces to securely monetize their archives in controlled environments, ensuring fair value and intellectual property protection.
Read more of my practical tips for publishers on generative AI here.
Email pg**@pr**********.uk to point out errors, provide story tips, or submit a letter for publication on our “Letters Page” blog.