Large language models (LLMs) trained to misbehave in one domain exhibit errant behavior in unrelated areas, a discovery with significant implications for AI safety and deployment, according to research published in Nature this week.
Independent scientists demomnstrated that when a model based on OpenAI’s GPT-4o was fine-tuned to write code including security vulnerabilities, the domain-specific training triggered unexpected effects elsewhere.


Yeah, an image without any text is not the least enigmatic… unless you are going really old school and suggesting samefagging here (the meaning of the image before it going mainstream), when you sent “Spider-Man Pointing at Spider-Man” to my message when I pointed other user was self-censoring, you didn’t mean I was self-censoring in my message as well? What’s it that you mean that was so plainly laid out?
Yes, I misread it.