Uncomprehending 3

A few months ago, Apple's AI researchers accused LLMs of a lack of understanding; the AI industry countered with LRMs, which come with higher energy demands and a better simulation of thought. The researchers are not impressed –

Our findings reveal fundamental limitations in current models: despite sophisticated self-reflection mechanisms, these models fail to develop generalizable reasoning capabilities beyond certain complexity thresholds. We identified three distinct reasoning regimes: standard LLMs outperform LRMs at low complexity, LRMs excel at moderate complexity, and both collapse at high complexity. Particularly concerning is the counterintuitive reduction in reasoning effort as problems approach critical complexity, suggesting an inherent compute scaling limit in LRMs. Our detailed analysis of reasoning traces further exposed complexity-dependent reasoning patterns, from inefficient overthinking on simpler problems to complete failure on complex ones. These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalizable reasoning.

– which, of course, sends AGI prophets into hyperventilation ("The paper was written by an intern."). Most other people have come to terms with the fact that LLMs are quite handy for some but not very reliable, and that, quite apart from their energy consumption, the gradual erosion of intelligence that comes with using them, and the unavoidable effects on human text and image producers (that is, nearly everyone who works for a living), they constitute a novel kind of environmental pollution:

For all their promise, these tools are still … janky. At the start of the AI boom, there were plenty of train wrecks—Bing’s chatbot telling a tech columnist to leave his wife, ChatGPT espousing overt racism—but these were plausibly passed off as early-stage bugs. Today, though the overall quality of generative-AI products has improved dramatically, subtle errors persist: the wrong date, incorrect math, fake books and quotes. Google Search now bombards users with AI overviews above the actual search results or a reliable Wikipedia snippet; these occasionally include such errors, a problem that Google warns about in a disclaimer beneath each overview. Facebook, Instagram, and X are awash with bots and AI-generated slop. Amazon is stuffed with AI-generated scam products. Earlier this year, Apple disabled AI-generated news alerts after the feature inaccurately summarized multiple headlines. Meanwhile, outages like last week’s ChatGPT brownout are not uncommon.

Digital services and products were, of course, never perfect. Google Search already has lots of unhelpful advertisements, while social-media algorithms have amplified radicalizing misinformation. But as basic services for finding information or connecting with friends, until recently, they worked. Meanwhile, the chatbots being deployed as fixes to the old web’s failings—Google’s rush to overhaul Search with AI, Mark Zuckerberg’s absurd statement that AI can replace human friends, Elon Musk’s suggestion that his Grok chatbot can combat misinformation on X—are only exacerbating those problems while also introducing entirely new sorts of malfunctions and disasters. More important, the extent of the AI industry’s new ambitions—to rewire not just the web, but also the economy, education, and even the workings of government with a single technology—magnifies any flaw to the same scale.

The reasons for generative AI’s problems are no mystery. Large language models like those that underlie ChatGPT work by predicting tokens in a sequence, mapping statistical relationships between bits of text and the ideas they represent. Yet prediction, by definition, is not certainty. Chatbots are very good at producing writing that sounds convincing, but they do not make decisions according to what’s factually correct. Instead, they arrange patterns of words according to what sounds right. Meanwhile, these products’ internal algorithms are so large and complex that researchers cannot hope to fully understand their abilities and limitations. For all the additional protections tech companies have added to make AI more accurate, these bots can never guarantee accuracy. The embarrassing failures are a feature of AI products, and thus they are becoming features of the broader internet.
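The "arranging patterns of words according to what sounds right" point can be made concrete with a toy sketch. The bigram model below is a deliberately simplified, hypothetical example (real LLMs are neural networks over subword tokens, vastly larger), but it shows the same principle: the prediction tracks frequency in the training text, not truth.

```python
from collections import Counter, defaultdict

# Toy next-token predictor: count which word follows which in a
# tiny training text, then always emit the most frequent follower.
corpus = "the cat sat on the mat the cat ate the fish".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(word):
    # Most frequent follower of `word` in the training text.
    return counts[word].most_common(1)[0][0]

print(predict("the"))  # "cat" — it follows "the" in 2 of 4 cases
```

Nothing in this procedure checks whether "the cat" is a true statement about anything; it only checks what co-occurred most often, which is the kernel of the accuracy problem described above.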

If this is the AI age, then we’re living in broken times. Nevertheless, Sam Altman has called ChatGPT an oracular system that can sort of do anything within reason and last week proclaimed that OpenAI has built systems that are smarter than people in many ways. (Debatable.) Mark Zuckerberg has repeatedly said that Meta will build AI coding agents equivalent to mid-level human engineers this year. Just this week, Amazon released an internal memo saying it expects to reduce its total workforce as it implements more AI tools.

The anomalies are sometimes strange and very concerning. Recent updates have caused ChatGPT to become aggressively obsequious and the Grok chatbot, on X, to fixate on a conspiracy theory about white genocide. (X later attributed the problem to an unauthorized change to the bot, which the company corrected.) A recent New York Times investigation reported several instances of AI chatbots inducing mental breakdowns and psychotic episodes. These models are vulnerable to all sorts of simple cyberattacks. I’ve repeatedly seen advanced AI models stuck in doom loops, repeating the same sequence until they are manually shut down. Silicon Valley is betting the future of the web on technology that can unexpectedly go off the rails, melt down at the simplest tasks, and be misused with alarmingly little friction. The internet is reverting to beta mode.