PSA: Pictures are back!

Lionir [he/him]@beehaw.org · edited 1 year ago

PSA: Pictures are back!

PenguinCoder@beehaw.org · 1 year ago

Well that was fun.

Didn’t go as planned of course, restored from backups, pre migration attempt. Thank you for your patience while we try to get all these moving parts working well together. Sorry for the troubles.

The Cuuuuube@beehaw.org · 1 year ago

I once caused an AWS outage that impacted 20% of their customers in their largest region. They called my manager to ask why we were performing around 10k writes per second to a bucket. It was fun times

Huggernaut@beehaw.org · 1 year ago

They don’t limit that?! I’ve worked with a lot of AWS services and most have built in rate limits. That’s wild lol

The Cuuuuube@beehaw.org · 1 year ago

They do now…

longshaden@beehaw.org · 1 year ago

lol, that’s how rules get made

The Cuuuuube@beehaw.org · 1 year ago

I can get into it in more detail if anyone’s interested. But basically, they had a rate limit on direct writes, but not a rate limit on cross bucket replication if you connected many buckets to replicate into a single bucket
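A rough sketch of that kind of gap, assuming a simple per-source token bucket. The class, function, and numbers are made up for illustration and are not anything AWS has published: each source stays under its own limit, but nothing caps the total landing in the one destination.

```python
class TokenBucket:
    """Hypothetical per-source limiter: `rate` tokens refill per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, then try to spend one token.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def writes_reaching_destination(sources: int, attempts_per_source: int,
                                per_source_limit: int) -> int:
    """Count writes accepted at one instant when every source has its own bucket
    and the destination has no limit of its own (the assumed gap)."""
    accepted = 0
    for _ in range(sources):
        bucket = TokenBucket(rate=per_source_limit, capacity=per_source_limit)
        accepted += sum(bucket.allow(now=0.0) for _ in range(attempts_per_source))
    return accepted
```

With one source the limiter does its job. With many sources feeding one destination, the accepted total grows with the number of sources, since each one is only checked against its own bucket.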

mitch@beehaw.org · 1 year ago

That was you?!

(jk, I’m on a different cloud 😂)

Artemisia@beehaw.org · 1 year ago

Loads of love. There’s always ASCII art.

l4sgc@beehaw.org · 1 year ago

Thanks for the update and hope you have less trouble in the future! Don’t worry about the downtime. I really appreciate that here it’s serving a clear purpose, unlike Twitter lol

bassdruminphonebox@beehaw.org · 1 year ago

I appreciate the late night efforts and the clear communication. For me, Beehaw is a positive place I can visit, but there are other things I can do also, and I have no need for many 9s of uptime here. (I’m trying to reduce any pressure you & others might feel - perhaps not communicated it well tho, hence this addition.)

Chris Remington@beehaw.org · 1 year ago

I can’t upload my puppy and flower pics!!! Fucking damn you!!! WTF did I sign up for!!!?!?!??!

PenguinCoder@beehaw.org · 1 year ago

You can always host your own instance…

duck

jabib (he/him)@beehaw.org · 1 year ago

<bender.jpg> Caption: I’ll just make my own Beehaw - with blackjack and hookers!

knittedmushroom@beehaw.org · 1 year ago

Every technical bump in the road we hit now is one we won’t hit/will know how to handle quickly in the future! Thank you for doing what you do for Beehaw!

Lionir [he/him]@beehaw.org · 1 year ago

Yeah, moving to object storage is best to do now. Arguably, we should’ve done it sooner since the longer we’ve waited, the more it was gonna catch up to us and cost us in time and money.

interolivary@beehaw.org · 1 year ago

I’d imagine the list of things you should do RIGHT NAO is pretty long though and there’s only 24h per day 😅

AndrewZabar@beehaw.org · 1 year ago

I concur. A minor inconvenience on occasion is a small price to pay for your amazing efforts! Thank you for doing what you do.

PenguinCoder@beehaw.org · edited 1 year ago

dr_catman@beehaw.org · 1 year ago

Thank you for making it possible to share endless pictures of beans in the future! It will never get old.

Beans, beans, beans, more beans, perhaps a cat, beans, beans, never gets old!, beans.

_MusicJunkie@beehaw.org · 1 year ago

Are you Beanus from digitiser?

alehel@beehaw.org · 1 year ago

Surprised beehaw hosts images at all. Sounds like that could become very expensive very quickly.

douglasg14b@beehaw.org · edited 1 year ago

It could, and will. Hopefully they are taking advantage of CDNs for image delivery so they aren’t paying high egress costs and can keep it in slow, cheap storage.

I’m honestly surprised that Lemmy hasn’t embraced distributed community hosting. Many existing niche communities (outside of Lemmy) let others run their service to serve up images and media, or to act as workers (by running the worker application or container) for computationally expensive operations like compression or encoding. Some even gamify it, as in the case of e-hentai.

Hard drives at home are incredibly cheap compared to cloud storage costs (even including networking, server, redundancy…etc hardware costs), but come with reliability concerns, which is where a distributed community becomes critical.

greenskye@beehaw.org · 1 year ago

I feel like Lemmy definitely needs to embrace distributed computing in some fashion. I have no interest in hosting my own instance, but I’m not against running a docker image that would offload some of the processing requirements large instances need. It would just need to be relatively straightforward for me to set up

The Cuuuuube@beehaw.org · 1 year ago

Distributed computing isn’t really a good fit for low computational tasks like forum software. It’s good for heavy calculations like “Could you please fold proteins to see if there’s any interesting stuff to be found” and “Here are 50 years of radio data. See if any of it is anomalous.” You need a sufficiently complex, long-running task to warrant the computational overhead of a supervisor process assigning tasks and receiving their outputs. LLMs, epigenetics, and deep space analysis are all good candidates for distributed computing.

Lemmy is more of a candidate for an autoscaling, clustered, multi-tenant approach. The computational tasks are basic, but there are a lot of them. Further, the computational needs are not constant. A fantastic case study for making the most of resources in the Fediverse is mastodon.world and lemmy.world running on the same server and making scale up and scale down requests to the Docker daemon. The ideal topology, in my opinion, for a Fediverse application ecosystem would be a Kubernetes cluster with three supervisor nodes and a minimum of two worker nodes, all with autoscaling enabled. The idea would be that your database resources can hold multiple databases (Lemmy, Mastodon, Peertube) AND can scale. The mechanism you would use to do this would depend on your hosting decisions.
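For a feel of what “scale up and scale down” means in practice, here’s a minimal sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current * currentMetric / targetMetric). The function name, the min/max bounds, and the CPU numbers are my own picks for illustration, and a real autoscaler also applies tolerances and stabilization windows that this ignores.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Scale the replica count in proportion to how far the observed metric
    (e.g. average CPU %) is from its target, then clamp to the allowed range."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # The bounds here are arbitrary defaults chosen for the example.
    return max(min_replicas, min(max_replicas, raw))


# Busy evening: 2 replicas averaging 90% CPU against a 60% target.
print(desired_replicas(2, 90, 60))
# Quiet night: 4 replicas averaging 15% CPU against the same target.
print(desired_replicas(4, 15, 60))
```

The point is that load going down shrinks the count just as readily as load going up grows it, which is where the cost savings come from.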

Digression now on database solutions. There are three basic ways I could see running the perfect Fediverse database cluster. The first, and least beholden to any given cloud provider, is to run Postgres in a Kubernetes cluster either on a single machine emulated cluster at your house, or within several clustered machines. The upside to this is that no one but you controls your infrastructure. The downside is that your ability to scale is hard capped to the amount of RAM and CPU resources you physically have in your house. Next would be a similar set-up on a hosted Kubernetes cluster through a cloud provider such as Google, Microsoft, IBM, or AWS. The downside here is that tech giants are all, for various reasons, shit. Google has the best eco-friendliness score, so they’re listed first. They’re still shit, though, and one of the platforms I’m suggesting hosting is a direct competitor to one of their golden goose products.

Your next option is to just pay one of those cloud providers to host a database cluster for you, rather than using an ad hoc Kubernetes cluster solution. It will cost you more money, but the tools available to you for managing databases through these cloud providers are much better. In terms of user experience and performance, this is a clear upgrade over hosting your databases on your Kubernetes cluster. The final option I’d want to talk about is called “Aurora Serverless.” So far, I’ve only discussed ways you can scale up to meet demand, but Aurora Serverless allows you to scale down. This will be the cheapest option if you run a small instance with clear peaks and valleys of load. It’s not the best answer for a user like Beehaw, but would come with the lowest cost in terms of management and money for someone running an instance for a low number of people.

So, does that solve the image hosting problem? No. Not really. Postgres is TERRIBLE for image hosting. Right now, Beehaw is, per my understanding, using the simplest image storing solution, which is “Just keep it on the server.” This is great for a first pass at hosting a web service, and will remain fine long term for a low user instance, but will quickly run into issues with any instance that hosts numerous users uploading pictures. Basically, a server only has as much space as the disks attached to it. Once they fill up, the only solution is to bring the service down and put in bigger disks. Eventually, you reach the upper limit of how big disks are manufactured, and how many disks you can attach via the interfaces that connect to a motherboard.

A much better solution, and in fact the best solution, is what Beehaw is implementing right now: object storage. To start with the DIY “I’m a strong independent Fediverse citizen, and I don’t need no corporations” route, I’d recommend Ceph. Ceph can run on Kubernetes and will provide object storage backed by Kubernetes persistent volumes. But more likely, you will want to aim for something with effectively unlimited storage, and your only real options for that currently are the cloud providers. You don’t have to worry about disks running out of space, and they do not charge you very much money.

I get where you’re coming from, though. “How do we all own the images so that the instances don’t run out of space but without being beholden to the corporations who own the storage?” The closest we come right now is peer-to-peer solutions, but all of them have a discovery and durability problem. In terms of discovery, the problem is “how does a server providing the Lemmy service find the peer-to-peer hosted files?” There’s no way to perform get object operations to serve the files via HTTP other than for the host server to fetch (download) the file from the peer-to-peer network and then deliver that to the user who made the request. The problem with this is that the server synced the file to its local storage, and is now hosting it, thus defeating the purpose of the peer-to-peer hosting solution. The other problem, the durability problem, is what happens when a low number of people are interested in an image, and the last person online hosting the image closes their laptop. Now no one can get the image as there was never a canonically available version of the file.

The only solutions that I know of that come close to solving these problems right now are Nostr and Secure Scuttlebutt. There are major issues with these protocols as they stand right now. Firstly, people already find joining the Fediverse too hard. For Nostr you have to generate a cryptographic key pair to create your identity. This isn’t… horrible, but it definitely takes some work and some doing. You have to generate the keys and then load them into your Nostr client. Secure Scuttlebutt is based on a protocol where to follow someone, someone has to invite you to follow them. People already complain about Beehaw asking you a question about what you like about Beehaw to make sure you read the rules. Imagine the frustration with a pure invite only social network where you can’t join until someone you know has joined.

The second problem is moderation. Secure Scuttlebutt is fine for this. You only ever follow people you like, you only ever see updates from people you like. Fantastic. Nostr has basically no moderation at all. If you’ve spent any time at all on the internet, you’ve probably realized by now that this is TERRIBLE. My time on Nostr was basically opening the app, seeing an entire feed full of pro-Russian propaganda, and then uninstalling it. I do think there’s something to be said for the idea of a pure peer 2 peer social network, but I don’t think we’re anywhere close to implementing it yet. So, where does that leave us?

The Fediverse. It was designed for a distributed governance system in which each instance acts as its own country with its own rules and governance, and it accidentally has some pretty neat clustering features that help it perform better under heavy load and keep data more permanent and durable. I want to emphasize that, too. The current computational and architectural benefits of the Fediverse are accidental. They’re side effects of the distributed governance, not the core purpose. I don’t expect anyone to put focus into enhancing these aspects of the Fediverse, at least not for a while. We’re much more likely to see someone design a community based social network from the ground up on peer-to-peer technologies. I’d be excited about that, but it will need to have more open signups than Secure Scuttlebutt, and moderation tools like… At all, unlike Nostr. The most likely solution for the latter would be collaborative blocklists. Maybe me and two of my friends have a shared view of what is and isn’t hate speech. So, we all spend some time just blocking the shit out of users. But no single one of us writes the block list. The block list itself is a peer-to-peer distributed construct, so we don’t all have to reach consensus about “Hey, was this guy being a jerkass?”
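One way to picture a shared block list that nobody owns is a grow-only set that each person merges by set union, which is a standard CRDT shape. That technique is my assumption for illustration, not something an existing Fediverse or peer-to-peer project is confirmed to use, and the class name and handles below are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Blocklist:
    """Hypothetical grow-only block list: entries can be added and merged, never removed."""

    blocked: set[str] = field(default_factory=set)

    def block(self, handle: str) -> None:
        self.blocked.add(handle)

    def merge(self, other: Blocklist) -> None:
        # Set union is commutative, associative, and idempotent, so copies
        # converge no matter who syncs with whom or in what order.
        self.blocked |= other.blocked

    def is_blocked(self, handle: str) -> bool:
        return handle in self.blocked


# Three friends block independently, then sync pairwise in arbitrary order.
me, friend_a, friend_b = Blocklist(), Blocklist(), Blocklist()
me.block("troll@example.invalid")
friend_a.block("spammer@example.invalid")
friend_b.merge(friend_a)
me.merge(friend_b)
friend_a.merge(me)
```

The trade-off with a grow-only set is that there is no unblocking. Supporting removals would need a richer structure, and that is where the “was this guy being a jerkass” disagreements come back in.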

interolivary@beehaw.org · 1 year ago

Lemmy definitely needs to embrace distributed computing in some fashion

It would just need to be relatively straightforward for me to setup

Pick one.

douglasg14b@beehaw.org · edited 1 year ago

It can be though? Sites & services have been doing this for decades now. My example of e-hentai using distributed workers hosted on users’ machines (given they pass the networking & storage requirements) to serve up images is one of those.

The problem is the bulk of the work is on Lemmy developers to design such a solution, and then together with the FOSS community, make it accessible.

Media is the low hanging fruit, and has largely been a solved problem for quite some time. It even has semi-functional fediverse solutions. Distributed workers for encoding and compressing media are also a solved problem, and in many cases it has been made as easy as downloading an executable or spinning up a Docker container.

So, yes, for a set of workloads you don’t have to choose. And haven’t had to for years.

Actual distributed transactional workloads are a whole other beast, which is a problem that needs solving if we ever want to have robust and survivable communities that can deal with scaling issues without risk of dying because of a lack of funds or because someone ran off with the funds.

Lionir [he/him]@beehaw.org · 1 year ago

We’ll definitely be using a CDN to help avoid high egress costs.

Retronautickz@beehaw.org · 1 year ago

If it fails, you can always tell users to upload images to pixelfed and share the link here (I’m joking, don’t take this seriously)

TemporalSoup@beehaw.org · 1 year ago

Maybe to Gfycat? It’s like a nice short-term storage

pixelpop3@beehaw.org · 1 year ago

Gfycat announced they are shutting down and deleting everything on September 1st.

jherazob@beehaw.org · 1 year ago

That was the joke :P

Gormadt@beehaw.org · 1 year ago

Wait really?

Damn that really sucks

Pete Hahnloser@beehaw.org · 1 year ago

Thanks for everything y’all do to keep Beehaw afloat!

average650@beehaw.org · 1 year ago

Just to be clear, this is just a moving of images, and it will be back correct? Just a temporary measure?

interolivary@beehaw.org · 1 year ago

Yep, they’re moving pictures to a service where it’s cheaper to store them rather than keeping them on the server’s hard drive

cyberdecker@beehaw.org · 1 year ago

No worries on the short notice, thank you for the heads up! Sincerely appreciate the transparency.

trekz@beehaw.org · edited 1 year ago

Guess you guys will have to wait a little longer to see my grandma in her latest night gowns…

suburBeebiTcH@beehaw.org · 1 year ago

ok we ready, bring out grams!

dandelion@beehaw.org · 1 year ago

You guys are the best!

I did an ADHD, and misread as you saying you were turning off pictures for good, but given how much I’m enjoying the Beehaw community and the hard work you guys do to keep it online, I wasn’t even that upset about that! A short, well telegraphed, partial outage is nothing in comparison!

Thanks to all you wonderful people!

metaltoilet@beehaw.org · edited 1 year ago

https://postimages.org/ :)

Lionir [he/him]@beehaw.org · 1 year ago

But then I have to click links!

metaltoilet@beehaw.org · 1 year ago

There’s a direct embed feature. I’ve used it for everything I’ve posted to reduce load on the servers.

Lionir [he/him]@beehaw.org · 1 year ago

Well, we do proxy the image so it’s just saving on storage costs which after this move will be very cheap. 5$/TB/month.

Good to know for now though :)
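Back-of-the-envelope for what a rate like $5/TB/month works out to, assuming flat per-TB pricing with no egress charges, request fees, or minimums. Those are simplifications on my part, and the 250 GB figure is made up, not Beehaw’s real usage.

```python
def monthly_storage_cost(stored_gb: float, price_per_tb: float = 5.0) -> float:
    """Monthly storage bill in dollars, assuming 1 TB = 1000 GB and flat pricing."""
    if stored_gb < 0:
        raise ValueError("stored_gb cannot be negative")
    return stored_gb / 1000 * price_per_tb


# A hypothetical instance holding 250 GB of images.
print(f"${monthly_storage_cost(250):.2f} per month")
```

At that rate, even a few terabytes of images is a small line item next to the server itself, which is why storage alone stops being the thing to optimize.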

psudo@beehaw.org · 1 year ago

Which object store did you go with for that price? It’s been a while since I looked, but I remember them being more than that.

Lionir [he/him]@beehaw.org · 1 year ago

We’ve chosen Backblaze B2, it’s one of the cheaper options. Wasabi has similar pricing.

TheOtherJake@beehaw.org · 1 year ago

They will ban your IP if you direct link to images and bypass their tracker links. If you use a VPN, be sure to clear your cache before changing your IP or the ban may carry over.

metaltoilet@beehaw.org · 1 year ago

How come they give a direct embed link then?

suburBeebiTcH@beehaw.org · 1 year ago

soooooooo… is it done? did it work?

Lionir [he/him]@beehaw.org · 1 year ago

It’s still going… it’s very slow.

Faeillus@beehaw.org · 1 year ago

I am utterly ignorant of technological mumbojumbo. I was just trying to add a pic to the Creative sub, but nothing uploaded. Is that why? Can I stop trying to make it work?

AnarchoYeasty@beehaw.org · 1 year ago

Yes. Upload it somewhere else and link to it in your post.

fisk@beehaw.org · edited 1 year ago

Yes. Be patient. Assume that when Fediverse stuff is not working exactly right, it’s not you, it’s probably the Fediverse. These are early days of self-organized effort, like thousands of people trying to lash rafts and boats together in the middle of the ocean. They’re busy trying to make sure the whole thing doesn’t sink - don’t worry about the photos.

With kindness, I very much suggest against dismissing both the technology and your ability to understand it by calling it “mumbojumbo”. Don’t let the engineers make this stuff something only they can understand and work with.