Even if you carefully guide ChatGPT and other AI models through the process of producing the desired output, they still drift into "hallucinations" (making up things that have no connection to the actual facts or to any realistic assumptions). That is one of their problems. A second, related problem arises when you grant these hallucinating AI models autonomy, i.e., let them decide on their own which path to take or which process to execute based on a hallucination they developed from the sample they analyzed. That part is genuinely frightening.
Below is some good thinking on this topic, prompted by the new version, GPT-5.
This week OpenAI released GPT-5, the very-long-awaited successor to GPT-4, which came out more than two years ago. There have been other OpenAI models that arguably deserved the title “successor”; there’s 4.5, not to mention models called o1, o3, and 4o (names that, when rendered in fonts whose lower-case o’s resemble zeroes, become even more confusing than they otherwise would be). But GPT-5 integrates the distinctive powers of the different OpenAI models under a unified user interface and brings significant new advances of its own. The overall effect isn’t enough to warrant serious discussion of whether the breathlessly awaited threshold of “artificial general intelligence” has been reached. But it’s enough to sustain confidence that the trajectory of AI progress will continue: More and more AI power, in more and more useful forms, will be available to more and more people at lower and lower prices, with growing social, economic, political, and geopolitical impact. So, in acknowledgment of this moment, we begin this week’s Earthling with a few items that are about either GPT-5 itself or issues raised by ever-more-powerful AIs.
Ethan Mollick, a Wharton professor who has developed a reputation as an AI connoisseur and a straight shooter, gives GPT-5 a positive review. He’s particularly impressed by two of its skills: (1) “vibe coding”—creating software via natural language prompts (“Make me an app that…” or “Make me a video game that…”) with little corrective guidance needed; (2) pro-activeness: GPT-5 will “suggest great next steps” and in other ways lighten your load, he writes. “It is impressive, and a little unnerving, to have the AI go so far on its own. To be clear, humans are still very much in the loop, and need to be. You are asked to make decisions and choices all the time by GPT-5, and these systems still make errors and generate hallucinations.” But progress will march on. “The bigger question,” Mollick writes, “is whether we will want to be in the loop.”
Various studies have found that large language models, in pursuing their assigned goals, may engage in tactically useful deception or other misbehavior. In Science, AI researcher Melanie Mitchell argues that, though it’s tempting to explain such conduct by invoking “humanlike motives,” these LLMs may just be role-playing scenarios found in their training data—or, in some cases, exhibiting side-effects from the phase of training known as “reinforcement learning from human feedback.” Still, bad behavior is bad behavior. Mitchell writes, “Some researchers, including me, believe that the risks posed by these problems are dangerous enough that, in the words of a recent paper, ‘fully autonomous AI agents should not be developed.’”
Researchers at Anthropic say they have new ideas about how to make large language models less inclined toward hallucination—and, for that matter, toward evil. First, they isolate “persona vectors” in the model’s neural network—patterns of neuronal activation that correspond to “character traits,” such as a tendency toward sycophancy, hallucination, or bad behavior in general. Then, they expose the model to clusters of data and watch to see if the vector for an undesirable tendency is activated. If so, they may exclude that dataset from the training of other models. A very rough analogy: You put people in an MRI machine, show them a bunch of pictures, and, if a picture activates a part of the brain associated with fantasizing about mass murder, you decide not to include that picture in elementary school curricula.
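The persona-vector idea described above can be illustrated with a toy sketch. The code below is not Anthropic's actual method or data; it uses synthetic activations and a simple mean-difference direction (a common way to estimate trait directions in activation space) purely to show the logic: estimate a "persona vector" from examples that do and don't exhibit a trait, then score new data by how strongly its activations project onto that vector.

```python
# Toy illustration (NOT Anthropic's code): estimating a "persona vector"
# as the mean difference between activations on trait-exhibiting and
# neutral examples, then flagging data by projection onto that vector.
import numpy as np

rng = np.random.default_rng(0)
DIM = 64  # hypothetical hidden-state dimensionality

# A hidden ground-truth trait direction, used only to generate fake data.
true_trait_dir = rng.normal(size=DIM)
true_trait_dir /= np.linalg.norm(true_trait_dir)

def fake_activations(n, exhibits_trait):
    """Stand-in for a model's hidden-layer activations on n examples."""
    base = rng.normal(size=(n, DIM))
    return base + (2.0 * true_trait_dir if exhibits_trait else 0.0)

# "Labeled" activation sets, e.g. sycophantic vs. neutral outputs.
acts_trait = fake_activations(200, exhibits_trait=True)
acts_neutral = fake_activations(200, exhibits_trait=False)

# Persona vector: normalized mean-difference direction.
persona_vec = acts_trait.mean(axis=0) - acts_neutral.mean(axis=0)
persona_vec /= np.linalg.norm(persona_vec)

def trait_score(activations):
    """Mean projection of activations onto the persona vector."""
    return float((activations @ persona_vec).mean())

# New data whose activations project strongly onto the vector could be
# flagged and, as the article describes, excluded from training.
score_risky = trait_score(fake_activations(50, exhibits_trait=True))
score_safe = trait_score(fake_activations(50, exhibits_trait=False))
print(f"risky data score: {score_risky:.2f}, safe data score: {score_safe:.2f}")
```

On this synthetic data the trait-exhibiting set scores well above the neutral one, which is the signal the screening step relies on; the real work, of course, lies in finding such directions in an actual model's network.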
Source: Robert Wright, Nonzero