• 0 Posts
  • 11 Comments
Joined 2 years ago
Cake day: August 29th, 2023

  • Is this water running over the land or water running over the barricade?

    To engage with his metaphor, this water is dripping slowly through a purpose-dug canal, dug by people who claim they are trying to show the danger of the dikes collapsing but who are actually serving as the hype arm for people who claim they can turn a small pond into a hydroelectric power source for an entire nation.

    Looking at the details of “safety evaluations”, it always comes down to them directly prompting the LLM and walking it through the desired outcome in baby steps, with lots of interpretation needed to show even the faintest traces of rudiments of anything that looks like deception or manipulation or escaping the box. Of course, the doomers will take anything that confirms their existing ideas, so it gets treated as alarming evidence of deception or whatever property they want to anthropomorphize into the LLM to make it seem more threatening.






  • First of all. You could make facts a token value in an LLM if you had some pre-calculated truth value for your data set.

    An extra bit of labeling on your training data set really doesn’t help you that much. LLMs already make up plausible-looking citations and website links (and other data types) that are actually complete garbage, even though their training data has valid citations and website links (and other data types). Labeling things as “fact” and forcing the LLM to output stuff with that “fact” label will get you output that looks (in terms of statistical structure) like validly labeled “facts” but has absolutely no guarantee of being true.
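A toy sketch of that point, under loud assumptions: a bigram table is standing in for an LLM here (it is obviously not one), and the `<fact>` / `<end>` tokens, the two training sentences, and the function names are all made up for illustration. Every training sentence is true and carries the label, yet the model can still emit labeled sentences that are false, because all it learned is which token tends to follow which.

```python
from collections import defaultdict

# Hypothetical training set: every sentence is true and labeled "<fact>".
TRAINING = [
    "<fact> paris is the capital of france <end>",
    "<fact> rome is the capital of italy <end>",
]

def train_bigrams(sentences):
    """Record which tokens were seen following each token."""
    table = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.split()
        for current, following in zip(tokens, tokens[1:]):
            table[current].add(following)
    return table

def generate_all(table, start="<fact>", end="<end>", max_len=12):
    """Enumerate every sentence the bigram table allows, in sorted order."""
    results = []

    def walk(path):
        if path[-1] == end:
            results.append(" ".join(path))
            return
        if len(path) >= max_len:  # safety cap in case the table has cycles
            return
        for following in sorted(table[path[-1]]):
            walk(path + [following])

    walk([start])
    return results

table = train_bigrams(TRAINING)
outputs = generate_all(table)

# Sentences the model can produce that were never in the training data.
invented = [o for o in outputs if o not in TRAINING]

for sentence in invented:
    print(sentence)
```

Both invented sentences start with `<fact>` and have exactly the statistical shape of the training data. The label constrains the form of the output, not whether it is true.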




  • iirc the LW people had betted against LLMs creating the paperclypse, but they now did a 180 on this and they now really fear it going rogue

    Eliezer was actually ahead of the curve on overhyping LLMs! Even as far back as AI Dungeon, he was claiming they had an intuitive understanding of physics (which even current LLMs fail at if you get clever with questions to stop them from pattern matching). You are correct that, going back far enough, Eliezer really underestimated neural networks. Mid- and late-2000s Sequences posts and comments treat neural network approaches to AI as cargo cult and voodoo computer science, blindly, sympathetically imitating the brain in hopes of magically capturing intelligence (well, this is actually a decent criticism of some of the current hype, so partial credit again!). And in the mid-2010s, Eliezer was focusing MIRI’s efforts on abstractions like AIXI instead of more practical things like neural network interpretability.