Hey lemmings!

I wanted to share a quick update about our recent performance issues and how I have addressed them.

The last 24h have been a bit rough for lemm.ee.

Last night, I spent some time debugging federation issues with lemmy.world. We managed to improve the situation significantly - lemmy.world content is now reaching lemm.ee with a very high success rate - but as a side effect, incoming federation traffic on our servers has grown a lot.

Additionally, we have been seeing steadily increasing normal user traffic over the past week, which is awesome from a community standpoint, but of course means that our servers have to do more work to keep up with all the new people.

To top things off, a badly configured instance appeared in the network today and was effectively launching a DoS attack against lemm.ee for several hours. Most likely it was unintentional, but unfortunately the end result was a sudden increase in our server load.

All these factors combined resulted in a really bad experience for most lemm.ee users today. Page load times were repeatedly spiking to 10 seconds or more throughout the day.

In fact, a lot of page loads just timed out with errors.

Fortunately, it seems I have managed to clear up the problems!

I have put a bunch of mitigations in place, and after monitoring the situation for the past hour, it seems that our performance issues have been resolved for now. So hopefully, you can enjoy browsing lemm.ee again without it feeling like torture!

Here are the specific steps I took:

  • I have doubled the hardware resources for our backend servers and database.
  • I purchased a Cloudflare Pro subscription for lemm.ee for 1 year. This took a considerable chunk out of my budget for lemm.ee, but in return it will allow me to analyze and optimize our cache usage to a far greater extent. I am already seeing vastly reduced load times for cacheable content (try opening https://lemm.ee a few times in a row as a logged-out user - it should be blazing fast now!)
  • I have configured a rate limiter which will prevent future DoS from the specific method that was used against us today.
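
The post does not say what kind of rate limiter was configured or where it runs, so the following is only a generic sketch of the idea, assuming a per-client token bucket. The class name, parameters, and the use of a client key such as an instance hostname are all hypothetical, not lemm.ee's actual setup.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `capacity` requests; tokens are
    refilled at `rate` tokens per second.
    """

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested
        # client key -> (tokens remaining, time of last update)
        self.buckets = {}

    def allow(self, client: str) -> bool:
        """Return True if this request from `client` should be served."""
        now = self.clock()
        tokens, last = self.buckets.get(client, (float(self.capacity), now))
        # Refill for the time elapsed since the last request, capped at capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self.buckets[client] = (tokens - 1, now)
            return True
        self.buckets[client] = (tokens, now)
        return False
```

A misbehaving instance that sends requests faster than `rate` runs out of tokens and gets rejected, while other clients keep their own separate buckets and are unaffected.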

Of course, all of the above is costly. Luckily, lemm.ee users have been very generous with donations in the month of June, and in fact a significant number of donors have opted for monthly recurring contributions. This all gives me the confidence to increase our spending for now, and I am currently expecting to NOT increase my personal planned contribution of 150€/month, as the increased costs so far are entirely being covered by donations!

Let me take this opportunity to thank the sponsors who made the upgrades possible! All lemm.ee users are now enjoying better performance thanks to you. I could not have done it without you awesome people.

On a final note, I just want to say that I hope a lot of these issues can be solved by optimizations in Lemmy software itself in the future. I have been personally contributing several optimizations to the Lemmy codebase, and I know many others are focused on optimizations as well. Just throwing extra resources at the problem will probably not be a sustainable solution for very long 😅. But I am optimistic that we are moving in the right direction with the software changes, and we’ll be enjoying reduced resource needs before long.

That’s all I wanted to share today. I wish you all a great weekend!

  • Mogofwin@lemm.ee
    1 year ago

    What a phenomenal response. Every single day makes me happy that I chose lemm.ee as my home instance. Truly appreciate all of the hard work you are pouring into this. And we can see how difficult it is based off of your total transparency. Thank you!

  • Thurstylark@lemm.ee
    1 year ago

    Yo, I’m a simple Reddit refugee, just trying to figure out how to make my way in the fediverse, and I signed up to lemm.ee not long after this post went up. I honestly chose this instance on a whim, and after a bit of exploring and learning about how Lemmy works, boy does this post make me glad I landed here!

    Thanks for running this instance, and for housing us reddit noobs :)

  • Atiran@lemm.ee
    1 year ago

    Excellent work. Thanks for all that you do to run this fabulous instance!

  • Beaupedia@lemm.ee
    1 year ago

    I’m brand new, this is my first comment. Thanks for your work! Where can we donate to this instance?

  • pascal@lemm.ee
    1 year ago

    To this day I still have lots of “subscribe pending” in my communities options page, especially from lemmy.ml and lemmy.world

    Should I try to cancel them and redo or just wait?

    https://imgur.com/a/rJEH1Di

    (I cannot upload images anymore, I get a JSON error now)

    • sunaurus@lemm.eeOPM
      1 year ago

      You should be able to cancel and retry lemmy.world - there’s a high chance those will go through now.

      With lemmy.ml, there’s a much lower chance, better to wait until they upgrade to 0.18.1.

  • lol@lemm.ee
    1 year ago

    I jumped from another instance, this one loads so much faster and has more accurate numbers on communities from other instances. Really cool stuff.

    • EeeDawg101@lemm.ee
      1 year ago

      I just joined up too after having lots of issues on lemmy.world, which is not surprising with how many people are flooding over. Someone commented about lemm.ee and I’m also really liking it. The performance difference is huge! I also love how the owner/dev talks about the instance. Seems like this will be a good one to stick with and use as primary.

      I am curious about membership counts on communities though. On other instances I’ll see a community member count of around 1000, and on here it shows that the same community is nowhere near that high. Are you saying that the lower number is more accurate? Or maybe it’s a syncing issue and with all the signups happening the numbers just haven’t had a chance to get caught up?

      • thegiddystitcher@lemm.ee
        1 year ago

        If you’re viewing a community through an instance that isn’t the one it’s actually hosted on, you’ll see a lower subscriber count because as far as I’m aware it’s showing the number of subscribers from the instance you’re viewing it on rather than the total.

        • xavier666@lemm.ee
          1 year ago

          as far as I’m aware it’s showing the number of subscribers from the instance you’re viewing it on rather than the total

          That explains all the weird numbers

        • johnofthesea@lemm.ee
          1 year ago

          showing the number of subscribers from the instance you’re viewing it on rather than the total

          I know this is not priority now, but it would be cool if it showed both.

      • lol@lemm.ee
        1 year ago

        The instance it’s hosted on is probably the most accurate. The reason I said this one was more accurate is because the old one I used had many posts without any comments and some posts were missing completely, while this one was showing most if not all of them. It will probably never be 100% synced due to how Lemmy works, I guess.

    • rm_dash_r_star@lemm.ee
      1 year ago

      Yeah @sunaurus@lemm.ee is really optimizing the hell out of this instance. A driver for me right now is he’s running the release candidate for 0.18.1 and it’s a huge improvement for me over 0.18.0. He’s also a dev on the project so he’s getting fixes in as well.

  • db0@lemmy.dbzer0.com
    1 year ago

    Can you tell me what you’ve cached? I’m not using Cloudflare but I am using HAProxy, which has frontend caching built in. It was next on my plan, but if you share your caching setup I can try to replicate it on lemmy.dbzer0.com

    • sunaurus@lemm.eeOPM
      1 year ago

      Mostly all images are served through a cache. I would like to also cache some static HTML (such as pages for unauthenticated users), but it breaks due to some users requesting these pages with an Accept header for an activitystream content-type, and I haven’t had time to figure out a solution for accounting for the content type in my cache key unfortunately 😅. But if you can do that easily in your cache then for sure you could also cache any static pages for a minute or so.
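
The cache-key idea described above could be sketched roughly as follows. This is only an illustration, not the configuration of any real cache: the function names and key format are made up, and the only assumption about the traffic is that federation requests carry one of the standard ActivityPub media types in their Accept header.

```python
from typing import Optional

# Media types that ActivityPub clients send in the Accept header.
ACTIVITYPUB_TYPES = ("application/activity+json", "application/ld+json")


def cache_variant(accept_header: Optional[str]) -> str:
    """Collapse the Accept header into one of two coarse variants.

    Using the raw header in the key would fragment the cache, since
    browsers send many different Accept strings for the same HTML page.
    """
    accept = (accept_header or "").lower()
    if any(media_type in accept for media_type in ACTIVITYPUB_TYPES):
        return "activitypub"
    return "html"


def cache_key(method: str, host: str, path: str,
              accept_header: Optional[str]) -> str:
    """Hypothetical cache key that keeps HTML and ActivityPub responses apart."""
    variant = cache_variant(accept_header)
    return f"{method.upper()}:{host.lower()}{path}#{variant}"
```

With a key like this, a browser request and a federation request for the same path land in different cache entries, so a cached HTML page is never served to an instance that asked for the activitystream representation.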

      • db0@lemmy.dbzer0.com
        1 year ago

        Ye I can cache differently depending on headers. Surprised that caching images helps a lot since your pictrs is hosted in an independent box anyway

  • tryagain@lemm.ee
    1 year ago

    Alrighty, I had a browse around and I’ve settled on lemm.ee + Jerboa and it’s looking good. Thanks for all your hard work handling the influx of reddfugees like myself. This gives me hope ❤️

  • Spzi@lemm.ee
    1 year ago

    Thanks for taking care, making upgrades, monitoring, fixing, contributing, and informing us so thoroughly.

    Also thanks to all the donors! This is all great to hear 😊

  • Navarian@lemm.ee
    1 year ago

    Been browsing for a few days, decided to set up shop here, so to speak, upon seeing this.

    Great work with this.

    • Rannoch@lemm.ee
      1 year ago

      Agreed! I was hopping around trying to figure out how to choose an instance (and only somewhat understanding how the site works so far lol), but these super transparent posts and the effort clearly being put in to keep things running smoothly + be welcoming to reddit “refugees” is what made it easy to pick this instance to sign up on first! Thank you for all the hard work!! :)

  • FarLine99@lemm.ee
    1 year ago

    Awesome instance. Really fast (compared to sh.itjust.works), and there are no blocked instances/communities. Kudos❤️

    • sunaurus@lemm.eeOPM
      1 year ago

      I know what you mean! The good news is that there are some huge improvements for federation in 0.18.1. These improvements depend on instances at both ends being on 0.18.1, so we’ll start seeing it kick in shortly as more of the network upgrades.

      • LettuceTurnipTheBeet@lemm.ee
        1 year ago

        Is there a way to manually trigger a sync or so?

        Seeing others reply to you on another instance, and not being able to respond because those replies aren’t on lemm.ee, is very frustrating.

        • sunaurus@lemm.eeOPM
          1 year ago

          There is: if you search for the full URL of the source post or comment on lemm.ee, then a sync will be attempted.

  • xavier666@lemm.ee
    1 year ago

    How much of the slowdown was caused by the bad instance VS the limitations of the previous hardware?

    • sunaurus@lemm.eeOPM
      1 year ago

      The DoS was responsible for about 10-20% increased load on our system - it wasn’t the root cause of the slowdowns, it was more like a nice cherry on top of the cake 😅 The bigger issue is the constantly increasing federation load.