Tuesday, March 12, 2024

Garbage In, Garbage Out

As more and more AI-generated product finds its way onto the 'net, Large Language Models are going to increasingly find themselves trained on the output of other LLMs...
"After the world's governments began their above-ground nuclear weapons tests in the mid-1940s, radioactive particles made their way into the atmosphere, permanently tainting all modern steel production, making it challenging (or impossible) to build certain machines (such as those that measure radioactivity). As a result, we've a limited supply of something called "low-background steel," pre-war metal that oftentimes has to be harvested from ships sunk before the first detonation of a nuclear weapon, including those dating back to the Roman Empire.

Generative AI models are trained by using massive amounts of text scraped from the internet, meaning that the consumer adoption of generative AI has brought a degree of radioactivity to its own dataset. As more internet content is created, either partially or entirely through generative AI, the models themselves will find themselves increasingly inbred, training themselves on content written by their own models which are, on some level, permanently locked in 2023, before the advent of a tool that is specifically intended to replace content created by human beings.

This is a phenomenon that Jathan Sadowski calls "Habsburg AI," where "a system that is so heavily trained on the outputs of other generative AIs that it becomes an inbred mutant, likely with exaggerated, grotesque features." In reality, a Habsburg AI will be one that is increasingly more generic and empty, normalized into a slop of anodyne business-speak as its models are trained on increasingly-identical content.
"
Go and RTWT.