However, such systems don’t provide the same opportunities for monetization and credit as search engines historically have. AI models draw from a great deal of data on the web to generate their outputs, but these data sources are often not credited, limiting the creators’ ability to make money from their work. Search engines that feature AI-generated answers may include links to original sources, but they may also reduce people’s interest in clicking through to other sites and could even usher in a “zero-click” future.
“Traditionally, the unspoken agreement was that a search engine could index your content, then they would show the relevant links to a particular query and send you traffic back to your website,” Will Allen, Cloudflare’s head of AI privacy, control, and media products, wrote in an email to MIT Technology Review. “That is fundamentally changing.”
Generally, creators and publishers want to decide how their content is used, how it’s associated with them, and how they are paid for it. Cloudflare claims its clients can now allow or disallow crawling for each stage of the AI life cycle (in particular, training, fine-tuning, and inference) and white-list specific verified crawlers. Clients can also set a price that AI bots must pay to crawl their website.
In a press release from Cloudflare, media companies like the Associated Press and Time and forums like Quora and Stack Overflow voiced support for the move. “Community platforms that fuel LLMs should be compensated for their contributions so they can invest back in their communities,” Stack Overflow CEO Prashanth Chandrasekar said in the release.
Crawlers are supposed to obey a given website’s directions (provided through a robots.txt file) to determine whether they can crawl there, but some AI companies have been accused of ignoring these instructions.
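For readers unfamiliar with robots.txt, here is a minimal sketch of the check a well-behaved crawler might run before fetching a page, using Python's standard `urllib.robotparser` module. The robots.txt content, the bot names (`ExampleAIBot`, `ExampleSearchBot`), and the URLs are hypothetical, made up for illustration; they do not come from Cloudflare or any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler entirely, and keep
# every other crawler out of /private/. All names are illustrative.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    urls = (
        "https://example.com/articles/story.html",
        "https://example.com/private/drafts.html",
    )
    for agent in ("ExampleAIBot", "ExampleSearchBot"):
        for url in urls:
            verdict = "allowed" if can_crawl(ROBOTS_TXT, agent, url) else "blocked"
            print(f"{agent} -> {url}: {verdict}")
```

Nothing in this check is enforced by the website. The crawler runs it on itself, which is why a bot that chooses to skip the step can ignore the file altogether.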
Cloudflare already has a bot verification system through which AI web crawlers can tell websites who they work for and what they want to do. For crawlers that identify themselves this way, Cloudflare hopes its system can facilitate good-faith negotiations between AI companies and website owners. For the less honest crawlers, Cloudflare plans to use its experience dealing with coordinated denial-of-service attacks from bots to stop them.
“A web crawler that is going across the internet looking for the latest content is just another type of bot—so all of our work to understand traffic and network patterns for the clearly malicious bots helps us understand what a crawler is doing,” wrote Allen.
Cloudflare had already developed other ways to deter unwanted crawlers, like allowing websites to send them down a path of AI-generated fake web pages to waste their efforts. While this approach will still apply to the truly bad actors, the company says it hopes its new services can foster better relationships between AI companies and content producers.