Dan Romero
@dwr.eth
I wonder if ChatGPT will be the last major model trained on the open web. Will sites start using robots.txt to specifically disallow LLM crawlers unless they get paid for the data?

Justin Hunter
@polluterofminds
Aren’t robots.txt files just suggestions? Any crawler can ignore them if it wants to, and Google often does, IIRC.
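
To make the "just suggestions" point concrete, here is a minimal sketch of how a well-behaved crawler consults robots.txt before fetching a page. The check runs entirely on the crawler's side, so a crawler that skips it is never stopped. The user agent `ExampleLLMBot`, the robots.txt contents, and the helper `is_allowed` are all hypothetical, made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that asks one LLM crawler to stay out
# while leaving the site open to everyone else.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots.txt permits user_agent to fetch url.

    This is a voluntary, client-side check: nothing on the server
    enforces the answer.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/1"
    print(is_allowed("ExampleLLMBot", page))  # the polite crawler backs off
    print(is_allowed("SomeBrowser", page))    # other agents are permitted
```

A crawler that never calls `is_allowed` can still request the page, which is why the file only works for crawlers that choose to honor it.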

kenny 🎩
@kenny
Yes, robots.txt is only a suggestion. Google will still index blocked pages, especially if a large number of links point at them. Real crawling restrictions are enforced at the server level.
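
As a rough sketch of what "at the server level" could look like, the WSGI app below refuses requests whose User-Agent header contains a blocked token, returning 403 instead of the page. The token `examplellmbot` and the names `BLOCKED_AGENT_TOKENS`, `is_blocked`, and `app` are assumptions for illustration, not anything from the thread. The User-Agent header is trivially spoofed, so a real deployment would likely pair this with authentication, rate limiting, or IP checks.

```python
# Hypothetical list of crawler tokens to refuse, matched case-insensitively.
BLOCKED_AGENT_TOKENS = ("examplellmbot",)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a blocked token."""
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


def app(environ, start_response):
    """Minimal WSGI app: the server, not the crawler, decides who gets content."""
    user_agent = environ.get("HTTP_USER_AGENT", "")
    if is_blocked(user_agent):
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve on localhost:8000 for manual testing.
    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

Unlike robots.txt, this check runs on every request regardless of whether the client chooses to cooperate.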