Dan Romero
@dwr.eth
I wonder if ChatGPT will be the last major model trained on the open web. Will sites start using robots.txt to specifically disallow crawling by LLMs unless they get paid for the data?
phil
@phil
I don’t think so. If model sizes keep increasing, I would expect GPT-4 and GPT-5 to be trained on a similar corpus, with better results. What ~might~ happen is that new webpages add protection against this kind of scraping. That's hard to do retroactively, since the existing data is probably already cached.
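For anyone curious what the robots.txt idea from this thread could look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The crawler name `ExampleLLMBot`, the `example.com` URLs, and the helper `can_crawl` are all made up for illustration; no real crawler or site is implied. Keep in mind that robots.txt is advisory, so this only helps if the crawler chooses to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only.
# "ExampleLLMBot" is an invented crawler name, not a real user agent.
ROBOTS_TXT = """\
User-agent: ExampleLLMBot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The hypothetical LLM crawler is blocked from the whole site.
    print(can_crawl("ExampleLLMBot", "https://example.com/post/1"))  # False
    # Any other crawler falls through to the wildcard rule and is allowed.
    print(can_crawl("SomeSearchBot", "https://example.com/post/1"))  # True
```

This also illustrates the point about retroactivity: a rule like this only affects future fetches by well-behaved crawlers and does nothing about copies that were already collected.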