Does it work like that though? How long does it take for AI bots to crawl sites and for that data to end up in the model people are actually using? Am I wrong in thinking that data from AI bot crawls takes a lot longer to become available to the public than data from a typical search engine crawl?
Bots could be crawlers gathering raw training data for periodic retraining, or the requests could come from a live web search agent of some kind, e.g. ChatGPT fetching the latest news stories on topic X. I don’t know whether robots.txt can distinguish between the two types of request, or whether LLM providers even adhere to it in either case.
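For what it’s worth, robots.txt rules are keyed on the user-agent string, so a site can in principle treat the two differently if the provider publishes separate agent names (OpenAI documents GPTBot for training crawls and ChatGPT-User for user-triggered browsing, for instance). Whether a provider actually honours the rules is still entirely up to them. A minimal sketch, assuming those two agent names:

```
# robots.txt sketch: refuse training crawls, allow user-triggered fetches
# GPTBot / ChatGPT-User are OpenAI's published agent names; other providers use their own
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Allow: /
```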