Scraped data of 2.6 million Duolingo users released on hacking forum

stopthatgirl7@kbin.social · 2 years ago

RanchOnPancakes@lemmy.world · 2 years ago

Oh no. Now they know the aliased email address, unique password, and that I didn’t try very hard to learn Spanish.

(please note: this is a joke, I don’t see anything about them getting passwords)

stevedidWHAT@lemmy.world · 2 years ago

Something to note here: with AI, if you’re using any sort of heuristic for your password, it’s fairly simple to work out a good set of likely candidates, which makes brute-forcing even easier and puts you at risk across the board.

Always use passwords that are as random as possible. If there’s a path you took to get to a password, in theory it can be worked backward.

For example, I know some people who only change a single letter when changing their passwords, which is ultimately trivial to guess if the old password was compromised (hence the need to change the password in the first place, or to proactively guard against this possibility).
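To make the “single letter changed” point concrete, here is a minimal Python sketch (an illustration, not a real cracking tool) that enumerates every password differing from a leaked one in exactly one position. The leaked password `Summer2023` and the letters-plus-digits alphabet are made-up assumptions; real tools use larger character sets and many more mutation rules.

```python
import string

# Assumption: the attacker tries letters and digits at each position.
ALPHABET = string.ascii_letters + string.digits

def single_char_variants(leaked: str) -> set[str]:
    """Every string that differs from `leaked` in exactly one position."""
    variants = set()
    for i, original in enumerate(leaked):
        for c in ALPHABET:
            if c != original:
                variants.add(leaked[:i] + c + leaked[i + 1:])
    return variants

candidates = single_char_variants("Summer2023")  # hypothetical leaked password
print(len(candidates))              # 610
print("Summer2024" in candidates)   # True
```

For a 10-character password that is only 10 × 61 = 610 guesses, compared with 62^10 possibilities when brute-forcing from scratch.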

I_Has_A_Hat@lemmy.ml · 2 years ago

I wish more websites allowed random words as passwords instead of forcing numbers and special characters (but not THAT special character, you have to use one of the ones on this list).

People change their passwords by one letter or digit because they’re tied to these restrictive formats. If 5-6 random words was the norm, people would update more than just one character when needing to change passwords.

“poison navy series ruler handshake papaya” is a fantastic password.

“Ilovemygrandkids!123” is a horrible password.

hatter@lemmy.world · 2 years ago

Just use a password manager and a unique, long, random generated password for every site. There’s no need or reason to know the password to anything other than your password manager and your primary email.

deft@ttrpg.network · 2 years ago

In like a decade, using a password manager will be a bad idea. I don’t know how, but it will be.

demlet@lemmy.world · 2 years ago

Hmm, a single point of access for every password you have? I don’t see the problem…

SleveMcDichael@programming.dev · edit-2 2 years ago

The thing is, the average person either can’t, or can’t be bothered to, remember even a dozen actually secure passwords, so they fall back to a couple of simple derivations of a common password, meaning each and every site a user signs up for represents an additional single point of failure.

demlet@lemmy.world · 2 years ago

That’s a good point.

Chriskmee@lemm.ee · 2 years ago

Luckily, until we get actual quantum computing, it’s not worth spending years on a supercomputer to crack a single stolen set of encrypted passwords.

danwardvs@sh.itjust.works · edit-2 2 years ago

You know somebody has to link this.

https://xkcd.com/936/

DragonTypeWyvern@literature.cafe · 2 years ago

That’s why I use IncorrectBatteryHorseStaple

They’ll never figure that one out

Cyberflunk@lemmy.world · 2 years ago

I see you. https://xkcd.com/936/

Rai@lemmy.dbzer0.com · 2 years ago

You just linked the same thing that the thing you responded to responded to had linked!

SokathHisEyesOpen@lemmy.ml · 2 years ago

You immediately know that they’re not handling your passwords correctly when they block certain characters.
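The reasoning: a site that stores passwords properly runs them through a salted, slow hash and only ever stores the resulting bytes, so no character is “dangerous” to accept. Here is a minimal sketch using Python’s standard `hashlib.pbkdf2_hmac`; the iteration count is an assumption chosen for illustration, and memory-hard functions such as scrypt or Argon2 are often preferred in practice.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumption: illustrative work factor for PBKDF2-HMAC-SHA256

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). The password is hashed as UTF-8 bytes, so any character works."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Quotes, semicolons, non-ASCII letters, emoji: the hash doesn't care.
weird = "p@ss'word\";-- ümlaut 🦉"
salt, stored = hash_password(weird)
print(verify_password(weird, salt, stored))        # True
print(verify_password(weird + "x", salt, stored))  # False
```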

stevedidWHAT@lemmy.world · 2 years ago

Agreed! I also think the next step would be getting rid of the need for users to even know their own passwords, replacing them with other safeguards like biometrics (with enough possible permutations to match or exceed passwords), a physical device, or something else entirely that removes the need to let the user in on what the exact password is.

JJROKCZ@lemmy.world · 2 years ago

Tools like Bitwarden let you customize the randomly generated password fairly extensively. You can tailor it to leave out certain characters for sites that don’t allow them. Each vault item can be customized independently, so you don’t weaken all your passwords just because one site won’t accept _ or (. You can also have it generate passphrases like the one you gave as an example.
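As a rough illustration of the same idea (not Bitwarden’s actual code), here is a Python sketch using the standard `secrets` module that draws each character uniformly from printable ASCII minus whatever one particular site rejects. The function name and defaults are hypothetical.

```python
import secrets
import string

def generate_password(length: int = 20, disallowed: str = "") -> str:
    """Random password drawn uniformly from printable ASCII, minus disallowed chars."""
    pool = [c for c in string.ascii_letters + string.digits + string.punctuation
            if c not in disallowed]
    if not pool:
        raise ValueError("every character was disallowed")
    return "".join(secrets.choice(pool) for _ in range(length))

# Only the restrictive site loses '_' and '('; every other password keeps the full pool.
strict_site = generate_password(24, disallowed="_(")
open_site = generate_password(24)
```

Removing two characters from a 94-character pool barely dents the strength of a 24-character password, which is why per-item settings are a reasonable design.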

lobut@lemmy.ca · 2 years ago

I use a heuristic to update my main passwords. It’s not a single-character change, but it’s easily guessable if you see it in plaintext, and now you’ve made me facepalm at my own actions.

I only use it for certain things, because I use Google OAuth or Bitwarden for most things, and you’ve just woken me up to what could be exposed.

stevedidWHAT@lemmy.world · 2 years ago

The goal should usually be as random as possible. If it takes a series of steps to create, those steps can be traced backward.

Now, the catch I haven’t mentioned is that randomness is hard to get, because you need a sufficient amount of entropy (basically randomness or chaos; formally, how much uncertainty there is in the system) to make the password strong enough, which can be challenging. For example, if your password is only 3 characters long and each position has 10 possibilities, you’re only looking at 10^3 = 1,000 possibilities to guess, which is nothing to PCs and people with time on their hands haha
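The arithmetic behind that example, as a small Python sketch: for a password whose characters are each picked uniformly and independently, the search space is alphabet_size^length and the entropy is length × log2(alphabet_size) bits. The 16-character, 94-symbol case is an added comparison, not from the comment.

```python
import math

def search_space(alphabet_size: int, length: int) -> int:
    """Number of possible passwords when each position is chosen independently."""
    return alphabet_size ** length

def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy of a uniformly random password: length * log2(alphabet_size)."""
    return length * math.log2(alphabet_size)

print(search_space(10, 3))             # 1000
print(round(entropy_bits(10, 3), 2))   # 9.97
print(round(entropy_bits(94, 16), 1))  # 104.9 (16 chars of printable ASCII)
```

Each extra bit doubles the number of guesses an attacker needs in the worst case, which is why length and a large character pool both matter.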

qaz@lemmy.world · 2 years ago

That’s why I let Bitwarden generate a random 64-character password with special characters and numbers

fraydabson@sopuli.xyz · 2 years ago

I also take advantage of Bitwarden’s ‘passphrase’ generation, as I understand that passphrases can be even more secure.

If the password requirements allow longer passwords, I typically use a passphrase generated by Bitwarden; for shorter ones I use generated passwords.

SmoothLiquidation@lemmy.world · 2 years ago

The only things that affect how long it takes to brute-force a password are its length and the size of the character set it draws from (the different types of characters used), which together determine its entropy. A passphrase is designed to be easier for a human to remember, so if you’re using a PM to remember it anyway, a 64-character random password is going to be stronger than a 64-character passphrase.
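A quick way to see why: a passphrase’s entropy comes from the number of words chosen, not the number of characters typed. This Python sketch assumes a Diceware-style list of 7776 (6^5) words and assumes about 8 words with separators fill roughly 64 characters; both are illustrative assumptions.

```python
import math

PRINTABLE_ASCII = 94   # letters, digits, punctuation
WORDLIST_SIZE = 7776   # assumption: a Diceware-style list of 6**5 words

def random_password_bits(length: int) -> float:
    """Entropy when each character is picked uniformly from printable ASCII."""
    return length * math.log2(PRINTABLE_ASCII)

def passphrase_bits(num_words: int) -> float:
    """Entropy when each word is picked uniformly from the word list."""
    return num_words * math.log2(WORDLIST_SIZE)

print(round(random_password_bits(64), 1))  # 419.5
print(round(passphrase_bits(8), 1))        # 103.4 (roughly 64 characters long)
print(round(passphrase_bits(6), 1))        # 77.5 (a six-word phrase)
```

Both are far out of brute-force range, but for the same length the random string carries about four times as many bits.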

I usually use the password generator in the 32-character range with all of the symbols, numbers, and letters included, since it seems like a lot of places don’t like longer passwords.

fraydabson@sopuli.xyz · 2 years ago

Wow, thanks. I always remember hearing people talk about passphrases being better, and when I saw Bitwarden add a feature to generate them, I just went with it.

But given I have no interest in remembering these passphrases, it would make sense to use generated passwords rather than passphrases, as you said. Good thing my effort to transition to passphrases was recent and hadn’t gotten very far yet.

SmoothLiquidation@lemmy.world · 2 years ago

Hah, honestly either one is better than just having the same password on every site. You are all good.

Borkingheck@lemmy.world · 2 years ago

The rise of passphrases has more to do with mitigating the human risk in security, which is people choosing weak but memorable passwords. A passphrase is typically easier to remember while still being strong. That’s the theory, anyway.

fraydabson@sopuli.xyz · 2 years ago

Something I did before letting Bitwarden take over my passwords was use a phrase consisting of 2-3 words plus a series of numbers and special characters. It was safer than the passwords of anyone I knew at the time. Admittedly it was not the most secure, as I only changed the beginning part of the 2-3 word phrase and left the last word, numbers, and symbols the same. So if one of those passwords were breached, it wouldn’t be too difficult for AI to brute-force the missing pieces. So yeah, I don’t do that anymore.

Tyler_Zoro@ttrpg.network · 2 years ago

redw04@lemmy.ca · 2 years ago

That’s why correcthorsebatterystaple is the best way to do passwords IMO: just 4 random words with a random special character dividing them and a random number tacked onto the end. Good luck brute-forcing that, or using AI to guess 4 randomly generated words in the correct order.
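A minimal Python sketch of that recipe using the standard `secrets` module. The tiny word list, separator set, and number range are hypothetical placeholders, and the sketch assumes one random separator is reused between all the words; with only 12 words the result is weak, so a real generator should draw from thousands of words.

```python
import secrets

# Hypothetical tiny word list for illustration only; real generators use
# thousands of words (e.g. a 7776-word Diceware-style list).
WORDS = ["correct", "horse", "battery", "staple", "poison", "navy",
         "series", "ruler", "handshake", "papaya", "owl", "lesson"]
SEPARATORS = "!@#$%^&*-+="

def passphrase(num_words: int = 4, max_number: int = 99) -> str:
    """Random words joined by one random special character, plus a random number."""
    words = [secrets.choice(WORDS) for _ in range(num_words)]
    sep = secrets.choice(SEPARATORS)
    number = secrets.randbelow(max_number + 1)  # 0..max_number inclusive
    return sep.join(words) + str(number)
```

Calling `passphrase()` yields something shaped like `owl&navy&staple&horse42`.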

stevedidWHAT@lemmy.world · edit-2 2 years ago

we were talking about password changes, not creation though

Guessing someone’s password with no prior history vs with an “averaged” prior history of the world/some large dataset are two different sized sets.

If you’ve got a feel for how the majority of people are changing their passwords, guessing those passwords is significantly easier when compared to traditional brute force

Edit:

Passphrases are fairly good too, but I want to see some real-world examples of AI trained on password dumps, to see how much better it performs compared to traditional brute force and targeted info gathering. I’m curious whether there are any user-friendly techniques that would work against AI specifically and its ability to find patterns most people wouldn’t pick up on.

CaptainAniki@lemmy.flight-crew.org · 2 years ago

deleted by creator

stevedidWHAT@lemmy.world · edit-2 2 years ago

I could explain in detail if that’s cool? I have a degree in computer science and beginner-level experience with making my own models from datasets (not my own, but from various repos) and with other applications (LLMs, image gen, and some other miscellaneous stuff that’s not really important)

Otherwise I can try to find something for ya, sure

CaptainAniki@lemmy.flight-crew.org · 2 years ago

deleted by creator

ZodiacSF1969@sh.itjust.works · 2 years ago

I want to see a citation on that too, in particular the AI part, since although it makes ‘intuitive’ sense, it’s good to see proof. But there’s no need to be a dick.

CaptainAniki@lemmy.flight-crew.org · 2 years ago

deleted by creator

stevedidWHAT@lemmy.world · 1 year ago

You can look it up yourself; you’re an asshole and I have nothing to prove to you. It’s common knowledge that AI models are trained on datasets to find common patterns, among a large variety of other techniques, in order to predict as accurately as possible.

stevedidWHAT@lemmy.world · 1 year ago

deleted by creator

BubblyMango@lemmy.wtf · 2 years ago

Que pecado! (What a shame!)

riodoro1@lemmy.world · 2 years ago

Next email from duo: give me your credit card details

AngryAnusHornets@lemmy.world · 2 years ago

deleted by creator

Alien Nathan Edward@lemm.ee · 2 years ago

“Mi Numero del Seguridad Social es…” (“My Social Security number is…”)

chulo_sinhatche@lemmy.world · 2 years ago

Do the people that release these get paid somehow? Or do they just do it for hacker cred and say fuck these 2.6M people?

Dasnap@lemmy.world · 2 years ago

In January 2023, someone was selling the scraped data of 2.6 million DuoLingo users on the now-shutdown Breached hacking forum for $1,500.

…

As first spotted by VX-Underground, the scraped 2.6 million user dataset was released yesterday on a new version of the Breached hacking forum for 8 site credits, worth only $2.13.

“Today I have uploaded the Duolingo Scrape for you to download, thanks for reading and enjoy!,” reads a post on the hacking forum.

Chariotwheel@kbin.social · 2 years ago

HODL, the value will go up again for sure

snorkbubs@fedia.io · 2 years ago

This part is also, ummm, interesting…

BleepingComputer has confirmed that this API is still openly available to anyone on the web, even after its abuse was reported to DuoLingo in January.

ChaoticNeutralCzech@feddit.de · 2 years ago

They’ll send fake emails where the green owl comes to collect “late fees” for your 216-day streak of missed Spanish lessons.

elvith@feddit.de · 2 years ago

We’ve been trying to reach you about your language course’s extended warranty…

no banana@lemmy.world · edit-2 2 years ago

You’ll have to pay with Bed Bath and Beyond gift cards.

scurry@lemmy.world · 2 years ago

The attackers are meme stock traders.

Kyrgizion@lemmy.world · 2 years ago

Both.

SpicaNucifera@lemm.ee · 2 years ago

Oh no, not my German and Japanese scores!!!

I guess the email could become a spam target?? Gmail does a good job sorting that for me.

themeatbridge@lemmy.world · 2 years ago

They know your email, your name, and that you’ve taken German and Japanese. Next they use that information to craft a phishing email that only the very stupid would fall for, which fools an alarming number of people. Something like “Hi, this is Duolingo suppert, and your billing information may have been comprimised. Log into this portal with your credit card credentials to confirm that you were not affected.”

Alobarap@lemmy.world · 2 years ago

They’ll know my very poor scores :(.

no banana@lemmy.world · 2 years ago

Damn, they’ll know I didn’t finish that Spanish lesson the bird bothered me about!

Random Dent@lemmy.ml · 2 years ago

They’ll know I’m ~1800 days into French and still shit at it.

The shame!

JJROKCZ@lemmy.world · 2 years ago

Salut! Enchanté, ça va bien? (Hi! Nice to meet you, are you doing well?)

Random Dent@lemmy.ml · 2 years ago

Je vais bien, et vous? (I’m fine, and you?)

JJROKCZ@lemmy.world · 2 years ago

Très bien! (Very well!)

RVGamer06@sh.itjust.works · 2 years ago

Hallo en dag!

deadsenator@lemmy.ca · 2 years ago

Bonjour!

That means “‘Sup?”

This is fine🔥🐶☕🔥@lemmy.world · 2 years ago

Bonjour means ‘what’s up?’

For ‘sup?’, you just say ‘jour’

CaptainAniki@lemmy.flight-crew.org · 2 years ago

deleted by creator

El Barto@lemmy.world · 2 years ago

I hope they don’t fucking send me spam.

art101@lemmy.world · 2 years ago

Depending on how far you got, you might not understand it anyway.

teft@startrek.website · edit-2 2 years ago

Quieres una gran verga? Haz click aquí!!! (Want a big dick? Click here!!!)

no banana@lemmy.world · 2 years ago

Mucho dinero en tu futuro! USD$80,000,000,000 Euro! (Lots of money in your future!)

grue@lemmy.ml · 2 years ago

That’s the thing that annoys me most about Duolingo: if they’re going to show you ads, the least they could do is show you ones in the language you’re trying to learn instead of your native one.

circuitfarmer@lemmy.sdf.org · edit-2 2 years ago

“Scraped” data suggests that it’s data available on public profile pages. However, the article also says the dump is a mix of public and non-public info. So which is it, scraped or not? It’s an important distinction, because data collection by scraping is technically not a breach.

ADTJ@feddit.uk · 2 years ago

Take this with a pinch of salt, but what I’m gathering is that it’s essentially just taking people’s public profiles, except the Duolingo API also exposes users’ e-mail addresses (and possibly other info) that aren’t normally displayed as part of the user’s public profile in their app.

In essence, they’re exposing more data than they probably should be and users were not really aware that data was being made public - that’s why people are upset about it.

circuitfarmer@lemmy.sdf.org · 2 years ago

Ok, this makes sense – in which case the API should not be exposing data that isn’t otherwise available on the public profile, so that is significant.
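A common way to avoid that kind of leak is to build API responses from an explicit allowlist of public fields instead of serializing the whole user record. A minimal Python sketch; the `User` fields are hypothetical, and this is not how Duolingo’s API is actually implemented.

```python
from dataclasses import asdict, dataclass
from typing import Optional

@dataclass
class User:
    # Hypothetical fields, not Duolingo's real data model.
    username: str
    display_name: str
    courses: list[str]
    email: str             # private: should never appear in a public response
    phone: Optional[str]   # private

PUBLIC_FIELDS = ("username", "display_name", "courses")

def public_profile(user: User) -> dict:
    """Serialize only allowlisted fields, so any new private field stays hidden by default."""
    data = asdict(user)
    return {key: data[key] for key in PUBLIC_FIELDS}
```

With an allowlist, forgetting to update it hides a field; with a denylist, forgetting to update it leaks one.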

expatriado@lemmy.world · 2 years ago

estamos jodidos señor búo (we’re screwed, Mr. Owl)

Extras@lemmy.today · 2 years ago

I pray for whoever pisses off the duolingo bird

Destragras@kbin.social · 2 years ago

How is that API still up after this has happened?

AToM.exe@lemmy.world · 2 years ago

I only see this comment, but it says 53 comments. I just want to know why they didn’t tell their userbase.

stopthatgirl7@kbin.social · 2 years ago

Lemmy and kbin have been having some federation issues lately, which might be why you’re only seeing one comment.

WhyJiffie@sh.itjust.works · 2 years ago

Sometimes that happens for me too in the Liftoff app. But if I reload the comments with “swipe to refresh”, then all the others appear too.

s1ndr0m3@lemmy.world · edit-2 2 years ago

I see the same thing. However, if you go to the link to this post on kbin.social, you can see the other comments. It’s weird. https://kbin.social/m/technology@lemmy.world/t/371933

Edit: the hyperlink won’t display properly in this comment. You have to copy the whole link and paste it into your browser.

PlexSheep@feddit.de · 2 years ago

Is there a list of exactly what data was leaked that wasn’t public before?

ansik@kbin.social · edit-2 2 years ago

However, Duolingo did not address the fact that email addresses were also listed in the data, which is not public information.

From the article, emphasis mine.

DragonTypeWyvern@literature.cafe · edit-2 2 years ago

RIP my email I use specifically for organizations I don’t trust

z4x15@lemmy.world · 2 years ago

I’m so glad I switched to duck email. Might as well change it again and block the old email.

Unsustainable@lemmy.today · 2 years ago

DDG email is AMAZING! I only wish it would have been around before my email got exposed.

Fox@unilem.org · 2 years ago

Only one thing to do… Start over fresh.

I just did this a few months ago, and it feels really good to have a proper set-up now, with privacy respecting companies all around.

Tangent5280@lemmy.world · 2 years ago

What’s your setup? Which email would you recommend?

Fox@unilem.org · 1 year ago

Finding the right email provider is what took me the longest, really. Went over all the options multiple times, constantly finding new alternatives and adding them to my list.

There’s nothing right, and there’s nothing wrong when it comes to this. You’re gonna have to try out a few, and see what feels right for you.

You should take into account what’s most important to you:

Lots of space?
Lots of aliases?
Custom domain support?
Clean user interface?

You’re probably gonna have to come to the realization that you will need to pay for it. You know, the old saying “If you’re not paying, you are the product”…

If privacy is your number one concern, you should check out these three options:

Protonmail
Tutanota
Skiff

Those are the ones that ended on my final list, and from those I chose Proton, mainly because I’ve used them for a long time already, and they have really good apps.

Tutanota is the more simple alternative, which is also the cheapest option. They recently changed their premium packages, but you can still buy the old ones using a small trick.

Skiff actually came after I already decided on Proton, and I’m not sure I’d have gone with Proton if I saw Skiff a bit earlier. Really looking like a great alternative, and they are offering enough in the free tier to be completely viable, even without a subscription.

To prevent spam and protect your email, you need an aliasing service. Fortunately this is simpler, since there are only two on the market:

AnonAddy (addy.io)
SimpleLogin

I went for AnonAddy, because of the price and it being independent. You can get SimpleLogin included with the expensive Proton subscription, but I’m not really prepared to spend 10 bucks a month for email.

My setup is to use a unique alias for every single website. These aliases are generated through addy.io, using my custom domain. That way I can easily toggle off an address if spam starts coming in, and I can also switch providers, for example to SimpleLogin, if anything happens with addy.io.
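To show why one alias per site helps, here is a toy Python model of that setup. It is purely hypothetical: it does not use addy.io’s or SimpleLogin’s APIs, and `example.com` stands in for a custom domain. Each site gets its own random address, and disabling one address (say, the one a scraped dataset exposed) leaves every other alias working.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class AliasBook:
    """Hypothetical local model of per-site aliases on a custom domain."""
    domain: str
    aliases: dict[str, str] = field(default_factory=dict)  # site -> alias address
    disabled: set[str] = field(default_factory=set)

    def alias_for(self, site: str) -> str:
        """Return the site's alias, creating a random one the first time."""
        if site not in self.aliases:
            self.aliases[site] = f"{secrets.token_hex(4)}@{self.domain}"
        return self.aliases[site]

    def disable(self, site: str) -> None:
        """Stop forwarding mail sent to this site's alias."""
        self.disabled.add(self.aliases[site])

    def accepts(self, recipient: str) -> bool:
        """Would mail to this address still reach the real inbox?"""
        return recipient in self.aliases.values() and recipient not in self.disabled

book = AliasBook("example.com")
leaked = book.alias_for("duolingo.com")
book.disable("duolingo.com")  # spam starts arriving: switch off just this one
```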

That’s just my setup, which I understand can seem a bit complicated to some, but it gives me the freedom, security, and peace of mind that I’m looking for.

Unsustainable@lemmy.today · 2 years ago

I’m in the process of doing that. It’s not a quick and easy process. I was so lazy with passwords that I would just use variations of 3 different passwords for everything, because that’s all I could remember. Then I had a password exposed, so I decided to change all my passwords to unique ones and use a password manager. I was shocked to see that I had 126 passwords saved in my browser. It took a long time to go through everything and change the email addresses and passwords.

Fox@unilem.org · 1 year ago

I totally understand you. It’s indeed quite an enormous task.

I’ve been on the internet for 20 years, and just like you I used to use the same username, email, and password for everything. Slight variations for some stuff, but generally the same.

A couple of years ago, I tackled the problem at the root and went into my browser’s saved passwords, which I’ve migrated over a couple of times from Chrome and Firefox.

I was shocked to find over 1200 unique entries, scattered over so many sites. Many of them I did not even recognize.

I took it from the top and went down the list. Every site I would open, sign in to, and then change the password. If the site did not exist any longer, I would just remove it. If the site looked spammy, I would delete my account. If I couldn’t find a way to delete my account, I would change the email.

It took me around 3 days to get through the list. There were quite a few duplicates, but also many sites that just didn’t exist anymore. I ended up with around 500 entries, which I then exported and put into my new password manager, Bitwarden.

It feels good to have it done, but I also understand it’s a daunting task. For me, it was fun to relive some of the memories from my childhood.

stevedidWHAT@lemmy.world · 2 years ago

I’m really confused on what you’re responding to

qaz@lemmy.world · edit-2 2 years ago

Why did you choose DDG mail over Addy?

Unsustainable@lemmy.today · 2 years ago

Because this is the first time I’ve heard of Addy. I saw your comment and downloaded the app from F-Droid. I’ll see how it compares to DDG. Do you know of any advantages over DDG?

qaz@lemmy.world · 2 years ago

https://youtu.be/-nplKmsqozA?si=8Y9Xpvzdltr0QqH8

stevedidWHAT@lemmy.world · edit-2 2 years ago

Live, learn, and share, right! 😉

Destide@feddit.uk · 2 years ago

oh non!