China and AGI: A New Yellow Peril and Red Scare

scruiser@awful.systems · 10 hours ago

This is especially ironic with all of Elon’s claims about making Grok truth seeking. Well, “truth seeking” was probably always code for making an LLM that would parrot Elon’s views.

Elon may have failed at making Grok peddle racist conspiracy theories like he wanted, but this shouldn’t be taken as proof that LLMs can’t be manipulated that way. He probably went with the laziest option possible of directly prompting it, as opposed to fine-tuning it on racist content or anything more advanced.

scruiser@awful.systems · 1 day ago

Do you like SCP foundation content? There is an SCP directly inspired by Eliezer and lesswrong. It’s kind of wordy and long. And in the discussion the author waffled on owning that it was a mockery of Eliezer.

scruiser@awful.systems · 1 day ago

I think they also want recognition/credit for spending 5 minutes (or less) typing some words at an image generator as if that were comparable to people who develop technical skills and then create effortful meaningful work just because the outputs are (superficially) similar.

scruiser@awful.systems · edit-2 1 day ago

You had me going until the very last sentence. (To be fair to me, the OP broke containment and has attracted a lot of unironically delivered opinions almost as bad as your satirical spiel.)

scruiser@awful.systems · edit-2 1 day ago

The latest twist I’m seeing isn’t blaming your prompting (although they’re still eager to do that), it’s blaming your choice of LLM.

“Oh, you’re using shitGPT 4.1-4o-o3 mini _ro_plus for programming? You should clearly be using Gemini 3.5.07 pro-doubleplusgood, unless you need something locally run, then you should be using DeepSek_v2_r_1 on your 48 GB VRAM local server! Unless you need nice sounding prose, then you actually need Claude Limmerick 3.7.01. Clearly you just aren’t trying the right models, so allow me to educate you with all my prompt fondling experience. You’re trying to make some general point? Clearly you just need to try another model.”

scruiser@awful.systems · 1 day ago

It can make funny pictures, sure. But it fails at art as an endeavor to communicate an idea, feeling, or intent of the artist: the promptfondler artists are providing a few sentences of instruction and the GenAI is following them without any deeper feelings or understanding of context or meaning or intent.

scruiser@awful.systems · edit-2 2 days ago

GPT-1 is 117 million parameters, GPT-2 is 1.5 billion parameters, GPT-3 is 175 billion, GPT-4 is undisclosed but estimated at 1.7 trillion. Tokens needed for training and training compute scale ~~linearly~~ (edit: actually I’m wrong, looking at the wikipedia page… it is even worse for your case than I was saying, training compute scales quadratically with model size, it is going up 2 OOM for every 10x of parameters) with model size. They are improving … but only getting a linear improvement in training loss for a geometric increase in model size and training time. A hypothetical GPT-5 would have 10 trillion parameters and genuinely need to be AGI to have the remotest hope of paying off its training. And it would need more quality tokens than they have left, they’ve already scraped the internet (including many copyrighted sources and sources that requested not to be scraped). So that’s exactly why OpenAI has been screwing around with fine-tuning setups with illegible naming schemes instead of just releasing a GPT-5. But fine-tuning can only shift what you’re getting within distribution, so it trades off in getting more hallucinations or overly obsequious output or whatever the latest problem they are having.
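To put rough numbers on the "2 OOM for every 10x of parameters" point, here is a back-of-the-envelope sketch. It rests on two assumptions that are not from this comment: training compute is approximated as 6 × parameters × tokens, and tokens are taken to grow in proportion to parameters (the 20 tokens per parameter ratio is a placeholder, not what any of these models actually used). The outputs illustrate the scaling rule, not real training budgets.

```python
import math

# ASSUMPTION: hypothetical tokens-per-parameter ratio, for illustration only.
TOKENS_PER_PARAM = 20.0


def training_flops(params: float, tokens_per_param: float = TOKENS_PER_PARAM) -> float:
    """Approximate training compute as 6 * params * tokens (an assumed rule of thumb)."""
    tokens = tokens_per_param * params
    return 6.0 * params * tokens


def orders_of_magnitude(small_params: float, large_params: float) -> float:
    """How many powers of ten the compute grows between two model sizes."""
    return math.log10(training_flops(large_params) / training_flops(small_params))


# Parameter counts as quoted in the comment (GPT-4 is an estimate, GPT-5 is hypothetical).
param_counts = {
    "GPT-1": 117e6,
    "GPT-2": 1.5e9,
    "GPT-3": 175e9,
    "GPT-4 (estimated)": 1.7e12,
    "GPT-5 (hypothetical)": 10e12,
}

if __name__ == "__main__":
    for name, params in param_counts.items():
        print(f"{name:>22}: {params:.3g} params -> ~{training_flops(params):.2e} FLOPs")
```

Under these assumptions compute goes with the square of the parameter count, so every 10x in parameters costs 100x in compute, which is the quadratic scaling described above.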

Lower model temperature makes it pick its best guess for the next token as opposed to randomizing among probable guesses. It doesn’t improve on what the best guess is, and you can still get hallucinations even picking the “best” next token.
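A minimal sketch of what temperature does at sampling time, assuming the usual softmax-over-logits setup. The function names and the toy logits are made up for illustration and are not any particular vendor's API.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index; temperature 0 is treated as greedy (argmax) decoding."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical scores where the model's top-ranked token is the wrong answer.
tokens = ["wrong-but-plausible", "correct", "nonsense"]
logits = [2.0, 1.0, 0.1]

if __name__ == "__main__":
    rng = random.Random(0)
    print("greedy pick:", tokens[sample_token(logits, 0, rng)])
    for t in (0.1, 1.0, 2.0):
        probs = softmax_with_temperature(logits, t)
        print(f"T={t}: " + ", ".join(f"{p:.3f}" for p in probs))
```

Temperature only rescales the same scores. If the top-ranked token is wrong, lowering the temperature just makes the model pick that wrong token more consistently.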

And lol at you trying to reverse the accusation against LLMs by accusing me of regurgitating/hallucinating.

scruiser@awful.systems · 2 days ago

Posts that explode like this are fun and yet also a reminder why the banhammer is needed.

scruiser@awful.systems · 2 days ago

Bro, sneerclub and techtakes are for sneering at bad technology and those that worship it, not for engaging in apologia for it (or worse yet, tone policing the sneering). If you don’t like it, you can ask the mods for an exit pass out (if they haven’t generously given you one already).

scruiser@awful.systems · edit-2 2 days ago

Joe Rogan Experience!

…side note my most prominent irl conversation about Joe Rogan was with a relative who was trying to convince me it was a good thing that Joe Rogan platformed a celebrity who was saying 1x1=2 (Terrence Howard). Literally beyond parody.

scruiser@awful.systems · 2 days ago

Grain futures salesmen on farms full of plant life (99.5% of which is weeds). …I don’t have a snappy label yet.

scruiser@awful.systems · 2 days ago

What if [tokes joint] hallucinations are actually, like, proof the models are almost at human level man!

scruiser@awful.systems · 2 days ago

Eliezer Yudkowsky, Geoffrey Hinton, Paul Christiano, Ilya Sutskever

One of those names is not like the others.

scruiser@awful.systems · 2 days ago

The promptfarmers can push the hallucination rates incrementally lower by spending 10x compute on training (and training on 10x the data and spending 10x on runtime cost), but they’re already consuming a plurality of all VC funding, so they can’t 10x many more times without going bust entirely. And they aren’t going to get them down to 0%: hallucinations are intrinsic to how LLMs operate, and no patch with run-time inference or multiple tries or RAG will eliminate that.

And as for newer models… o3 actually had a higher hallucination rate because trying to squeeze rational logic out of the models with fine-tuning just breaks them in a different direction.

I will acknowledge that in domains with analytically verifiable answers you can check the LLMs that way, but in that case, it’s no longer primarily an LLM, you’ve got an entire expert system or proof assistant or whatever that can operate independently of the LLM and the LLM is just providing creative input.

scruiser@awful.systems · 2 days ago

A junior developer learns from these repeated minor corrections. LLMs can’t learn from them. They don’t have any runtime fine-tuning (and even if they did it wouldn’t be learning like a human does). At the very best, past conversations get summarized and crammed into the context window, hidden from the user, to provide a shallow illusion of continuity and learning.

scruiser@awful.systems · 2 days ago

I like promptfarmers for the LLM companies and developers. It reflects their attitude of passively hoping that letting their model grow in scale will bring in some future harvest of money.

scruiser@awful.systems · 2 days ago

You’ve inadvertently pointed out the exact problem: LLM approaches can (unreliably) manage boilerplate and basic stuff but fail at anything more advanced, and by handling the basic stuff they give people false confidence that leads to them submitting slop (that gets rejected) to open source projects. LLMs, as the linked pivot-to-ai post explains, aren’t even at the level of occasionally making decent open source contributions.

scruiser@awful.systems · 3 days ago

Given the libertarian fixation, probably a solid percentage of them. And even the ones that didn’t vote for Trump often push or at least support various mixes of “grey-tribe”, “politics is spiders”, “center left”, etc. kind of libertarian centrist thinking where they either avoided “political” discussion on lesswrong or the EA forums (and implicitly accepted libertarian assumptions without argument) or they encouraged “reaching across the aisle” or “avoiding polarized discourse” or otherwise normalized Trump and the alt-right.

Like looking at Scott’s recent posts on ACX, he is absolutely refusing responsibility for his role in the alt-right pipeline with every excuse he can pull out of his ass.

Of course, the heretics who have gone full e/acc absolutely love these sorts of “policy” choices, so this actually makes them more in favor of Trump.

scruiser@awful.systems · edit-2 4 days ago

In terms of writing bots to play Pokemon specifically (which given the prompting and custom tools written I think is the most fair comparison)… not very well… according to this reddit comment a bot from 11 years ago can beat the game in 2 hours and was written with about 7.5K lines of Lua, while an open source LLM scaffold for playing Pokemon relatively similar to Claude’s or Gemini’s is 4.8K lines (and still missing many of the tools Gemini had by the end, and Gemini took weeks of constant play instead of 2 hours).

So basically it takes about the same number of lines written to do a much much worse job. Pokebot probably required relatively more skill to implement… but OTOH, Gemini’s scaffold took thousands of dollars in API calls to trial and error develop and run. So you can write bots from scratch that substantially outperform LLM agents for moderately more programming effort and substantially less overall cost.

In terms of gameplay with reinforcement learning… still not very well. I’ve watched this video before on using RL directly on pixel output (with just a touch of memory hacking to set the rewards), it uses substantially less compute than LLMs playing pokemon and the resulting trained NN benefits from all previous training. The developer hadn’t gotten it to play through the whole game… probably a few more tweaks to the reward function might manage a lot more progress? OTOH, LLMs playing pokemon benefit from being able to more directly use NPC dialog (even if their CoT “reasoning” often goes on erroneous tangents or completely batshit leaps of logic), while the RL approach is almost outright blind… a big problem the RL approach might run into is backtracking in the later stages since they use reward of exploration to drive the model forward. OTOH, the LLMs also had a lot of problems with backtracking.

My (wildly optimistic by sneerclubbing standards) expectation for “LLM agents” is that people figure out how to use them as a “creative” component in more conventional bots and AI approaches, where a more conventional bot prompts the LLM for “plans” which it uses when it gets stuck. AlphaGeometry2 is a good demonstration of this: it solved 42/50 problems with a hybrid neurosymbolic and LLM approach, but it is notable that it could solve 16 problems with just the symbolic portion without the LLM portion. So the LLM is contributing some, but the actual rigorous verification is handled by the symbolic AI.
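The shape of that "conventional bot first, LLM suggestions only when stuck, verifier checks everything" loop can be sketched on a toy problem. This is not how AlphaGeometry2 works internally. The factoring task, `fake_llm_propose`, and every other name here are hypothetical stand-ins chosen so the example stays self-contained.

```python
from typing import Callable, List, Optional, Tuple


def is_nontrivial_factor(n: int, candidate: int) -> bool:
    """Rigorous check: the only part of the loop that decides what counts as solved."""
    return 1 < candidate < n and n % candidate == 0


def trial_division(n: int, limit: int) -> Optional[int]:
    """The conventional solver: exhaustive but bounded, so it can get stuck."""
    for candidate in range(2, limit + 1):
        if is_nontrivial_factor(n, candidate):
            return candidate
    return None


def fake_llm_propose(n: int) -> List[int]:
    """HYPOTHETICAL stand-in for an LLM call: cheap, creative, and often wrong."""
    return [7, 100, 101, 103]


def hybrid_factor(
    n: int, limit: int, propose: Callable[[int], List[int]]
) -> Tuple[Optional[int], str]:
    """Try the conventional solver, then fall back to verified proposals."""
    found = trial_division(n, limit)
    if found is not None:
        return found, "symbolic"
    for guess in propose(n):
        if is_nontrivial_factor(n, guess):  # never trust a proposal unchecked
            return guess, "proposal"
    return None, "unsolved"


if __name__ == "__main__":
    print(hybrid_factor(91, 50, fake_llm_propose))     # solved by the conventional part
    print(hybrid_factor(10403, 50, fake_llm_propose))  # needs a proposal, still verified
```

The proposer can be arbitrarily unreliable without affecting correctness, because a wrong guess only costs a verification call.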

(edit: Looking at more discussion of AlphaGeometry, the addition of an LLM is even less impressive than that, it’s doing something you could do without an LLM at all. On a set of 30 problems discussed, the full AlphaGeometry can do 25/30, without the LLM at all 14/30, *but* using alternative methods to an LLM it can do 18/30 or even 21/30 (depending on the exact method). So… the LLM is doing something, which is more than my most cynical sneering would suspect, but not much, and not necessarily that much better than alternative non-LLM methods.)

scruiser@awful.systems · 4 days ago

Despite the snake-oil flavor of Vending-Bench, GeminiPlaysPokemon, and ClaudePlaysPokemon, I’ve found them to be a decent antidote to agentic LLM hype. The insane transcripts of Vending-Bench and the inability of an LLM to play Pokemon at the level of a 9 year old are hard to argue with, and the snake oil flavoring makes it easier to get people to swallow.

scruiser@awful.systems · edit-2 25 days ago

China and AGI: A New Yellow Peril and Red Scare

scruiser@awful.systems · 1 month ago

Are Scott and others like him at fault for Trump... no, it's the "elitists'" fault!