Original Content Goes In, Slop Comes Out

Federico Viticci at MacStories:

…it’s become clear that the foundation models behind different LLMs have been trained on content sourced from the open web without requesting publishers’ permission upfront. These models can then power AI interfaces that regurgitate similar content or provide answers with hidden citations that seldom prioritize driving traffic to publishers. As far as MacStories is concerned, this is limited to text scraped from our website, but we’re seeing this play out in other industries too, from design assets to photos, music, and more. And to top it all off, publishers and creators whose content was appropriated for training or crawled for generative responses (or both) can’t even ask AI companies to be transparent about which parts of their content were used. It’s a black box where original content goes in and derivative slop comes out.