What is it with the flood of brand new accounts coming here only to self-promote their vibe-coded slop on c/SelfHosted?

Admiral Patrick@dubvee.org · 6 days ago

What is it with the flood of brand new accounts coming here only to self-promote their vibe-coded slop on c/SelfHosted?

Xylight@lemdro.id · 5 days ago

i doubt that Lemmy is being intentionally scraped by AI companies, otherwise it’d give their LLMs even more severe brain damage.

CombatWombat@feddit.online · 4 days ago

It’s hard to find datasets on the internet that are exclusively human. You can fix politics during RLHF, but having LLM output in your training set is irrecoverable.

Xylight@lemdro.id · 4 days ago

having LLM output in your training set is irrecoverable

i used to think model collapse was an actual problem for LLMs as well, but it turns out that most popular models nowadays intentionally use synthetic data for things like reasoning traces and math. a lot of models (like Gemini) also have subtle watermark patterns that let the trainers just filter out LLM responses for factual data.

CombatWombat@feddit.online · 4 days ago

Well, glad to hear LLM providers fixed that recently. I assume that means they’ll stop taking my instance down now, yeah?