[META] MBFC bot

JonsJava@lemmy.world · 5 months ago

AbouBenAdhem@lemmy.world · 5 months ago

What I wish we had is a tool for showing which sources tend to be most statistically correlated with each other, without trying to place them on a linear spectrum.

JonsJava@lemmy.world · 5 months ago

Can you give me an example? I may be able to code it

AbouBenAdhem@lemmy.world · edit-2 5 months ago

I was thinking of something like the graph of subreddits from this paper—although I think that’s based on active user overlap, and I don’t know if there’s a similar metric that would cover all news sites.

steventhedev@lemmy.world · 5 months ago

I don’t see an easy way to accomplish this without pulling in the full text of every article over some period, computing something like paragraph/doc/site vectors, and then clustering by site vector.

That’s putting a lot of faith into unsupervised learning, and it’s probably just as likely to pick up on stylistic conventions like byline and date formats as it is to cluster by some common thematic pattern like political leaning.
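To make the site-vector idea concrete, here is a minimal sketch. It swaps in plain bag-of-words counts for learned paragraph/doc vectors, so it is an illustration of the shape of the pipeline and not the approach itself. The site names and article snippets are made up, and `site_vector` and `cosine` are hypothetical helper names.

```python
import math
import re
from collections import Counter


def site_vector(articles):
    """Sum word counts over all of a site's articles (a crude stand-in for a learned site vector)."""
    counts = Counter()
    for text in articles:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: two or three short "articles" per site.
sites = {
    "site-a.example": ["Council votes on housing budget", "Budget vote delayed again"],
    "site-b.example": ["Housing budget passes council vote"],
    "site-c.example": ["Local team wins the cup final", "Cup final draws record crowd"],
}
vectors = {name: site_vector(texts) for name, texts in sites.items()}

# Pairwise similarities; a clustering step would run on these.
names = sorted(vectors)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a} ~ {b}: {cosine(vectors[a], vectors[b]):.2f}")
```

Note that raw counts like these weight every token equally, so boilerplate that repeats on every page of a site (bylines, date strings) would feed straight into the similarity score, which is the concern raised above.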

AbouBenAdhem@lemmy.world · 5 months ago

Maybe you could use a source site’s posts and upvotes in different fediverse communities as a proxy (assuming you could find representative communities with a similar range of biases).

steventhedev@lemmy.world · 5 months ago

That’s…actually not a bad idea. Take the (user, domain) pairs and weight the edge between each pair of domains by the number of unique users who posted links from both.
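A rough sketch of that edge weighting, assuming the post data has already been reduced to (user, domain) pairs. The function name `domain_edge_weights` and the sample users and domains are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def domain_edge_weights(posts):
    """Map each unordered domain pair to the number of unique users who posted from both."""
    domains_by_user = defaultdict(set)
    for user, domain in posts:
        domains_by_user[user].add(domain)  # a set, so repeat posts count once per user

    weights = defaultdict(int)
    for domains in domains_by_user.values():
        # Sort so each pair is always stored as (smaller, larger).
        for a, b in combinations(sorted(domains), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical sample data.
posts = [
    ("alice", "a.example"), ("alice", "b.example"),
    ("bob", "a.example"), ("bob", "b.example"), ("bob", "c.example"),
    ("carol", "c.example"),
    ("alice", "a.example"),  # repeat post, should not add weight
]
weights = domain_edge_weights(posts)
for pair, shared in sorted(weights.items()):
    print(pair, shared)
```

One thing this ignores is how prolific each user is: someone who posts from dozens of domains adds an edge between every pair of them, so some normalization would probably be needed on real data.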

Producing clusters from the resulting graph should be easy, but aside from just saying “these are similar websites”, does it really say much?
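For what it's worth, one very simple way to get clusters is to drop weak edges and take connected components. This is only one option among many (a proper community-detection algorithm would likely do better), and the `min_shared` threshold, the `cluster_domains` name, and the sample weights below are all assumptions for illustration.

```python
def cluster_domains(weights, min_shared=2):
    """Group domains into connected components, keeping only edges with weight >= min_shared."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), shared in weights.items():
        root_a, root_b = find(a), find(b)  # registers both domains even if the edge is dropped
        if shared >= min_shared and root_a != root_b:
            parent[root_a] = root_b

    clusters = {}
    for domain in parent:
        clusters.setdefault(find(domain), set()).add(domain)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical edge weights: (domain_a, domain_b) -> number of shared users.
sample_weights = {
    ("a.example", "b.example"): 5,
    ("b.example", "c.example"): 4,
    ("c.example", "d.example"): 1,  # below the threshold, so it does not join the groups
    ("d.example", "e.example"): 6,
}
clusters = cluster_domains(sample_weights, min_shared=2)
print(clusters)
```

The output is just groups of domains; it says nothing about why they group together, which is the open question above.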

You could do something similar with linkages based on comments, upvotes, or downvotes; maybe they’ll have some deeper semantic meaning.

GBU_28@lemm.ee · 5 months ago

Interesting, almost sounds like a graphdb + magnitude project

Cool stuff