A Rebuttal to Scaling Mastodon is Impossible

Posted on Nov 14, 2022 ( ~18 Minutes / 4593 Words )

Armin Ronacher wrote that Scaling Mastodon is Impossible

I'd like to offer a rebuttal. As someone who's been doing professional web development since 1995, with most of that time being spent in Rails jobs, or doing Rails work on the sidelines, I think I have a pretty good perspective on the situation. For those who don't know, Mastodon is written in Ruby on Rails.

Decentralization promotes an utopian view of the world that I belief fails to address actual real problems in practice.

Email. E. Fucking. Mail.

Email solves a very real problem. It has been decentralized (federated) since the start, and most people would agree that the following are all important features of email, even if they don't personally take advantage of them:

being able to have an email address at your domain / your company's domain.
being able to have email that isn't being mined by a company to show you advertisements
being able to add as many accounts as you want without having to pay more or register more
being able to create aliases for your email address
being able to create "catch-all" email addresses. E.g. <anything>@example.com
not having to pay arbitrary additional fees for storage, or anything else, because some company decided to make more money off of you.

Decentralization makes all of these possible.

Yet on that decentralization wave a lot of projects are riding from crypto-currencies [1], defi or things such as Mastodon. All of these things have one thing in common: distrust. Some movements come from the distrust of governments or taxation, others come from the distrust of central services.

I think that's a significant misunderstanding of why Mastodon exists. Mastodon's early days were supported by the work and participation of lots of marginalized people (mostly queer folk of various flavors). You wouldn't be wrong to say that it was created because of "distrust", but if so you'd be arguing that it was a distrust of the centralized platforms' ability, or their willingness, to protect marginalized communities, which is completely true.

Mastodon servers are a collection of communities. We do what no centralized system can do. We protect ourselves. There are simply too many bad actors in the world for any centralized server to moderate them effectively. On top of that you have the differences in "acceptable" between different social groups. No single corporate / non-profit / human entity can possibly hope to fairly enforce all acceptable social practices across all social groups.

Even if they could, it's a brutal job that decimates the mental health of the folks that have to do it every day. I've never heard of a case where Twitter, or Facebook has actually worked to protect them or compensate for the horrible things they're forced to see.

In my mind the discussion about centralization and decentralization completely misses the point of the intended outcomes. Centralization or decentralization should really be an implementation detail of the solution to an actual problem.

Agreed. There's way too much "it's federated" talk when convincing people to join Mastodon. The "federation" part is not only something your average person doesn't understand, but also something they don't care about. Average users don't care that email is federated. I've been using it for decades and I've never seen an email host promote the fact that it's "federated" as a reason to use it.

What are we trying to solve?

Let's ignore Twitter for a second and let's talk about software engineering. Specifically dependency management. I think dependency management is an interesting proxy for the problem here and there are some lessons to be learned from it.

Wait. What?!

I've thought long and hard about dependency management but uh… I'm not seeing this connection.

The first concrete point they seem to make is this:

…as time went on, a lot of these packages [people depended on] went away because the hosts they were hosted on shut down. So the first cracks that showed up just was an effect of things ageing. People walk away of projects, in some cases die and with that, their server bills go unpaid and domains eventually lapse. Some companies also go out of business. SourceForge did not really ever die, but they had financial challenges and made their hosting page ever more hostile for the installers to give access to the uploaded tarballs.

Sounds like a strong argument against centralization, and a good argument for taking personal control of the things that you rely on, and are important to you. Or, to put it another way, don't rely on others when it's something you care about. A centralized system is not something you control, so if this is important to you, you can't rely on centralized systems.

The second thing that became apparent over time was also that decentralized services came with a lot of security risks. Every one of those hosts allowed the re-publishing of already existing packages. Domains that lapsed could be re-registered by other people and new packages could be placed there.

Have you heard of a centralized system that hasn't had a security issue? Have you heard of a centralized system where someone's account wasn't stolen? Either hacked, or given away by the company because they believed someone else deserved it more or had a better claim to it.

Most of the major centralized platforms have had issues with certificates expiring, or forgetting to renew some domain. So… I'm not sure how this is relevant to the discussion. Everyone is provably bad at this. When a centralized system screws it up it affects millions. If I screw it up on my Mastodon server 8 users are affected.

Obviously there are nuances here and it's clear that central services come with risks, but so do decentralized services and they don't have clear upsides.

Clear upsides:

you control your destiny
your account doesn't exist at the whims of an algorithm
you can protect your friends
you don't have to do anything if some company goes out of business
if your dependencies suddenly come with requirements / expectations that you don't agree with, you can change them.
- In the case of Twitter (for example) you don't have to worry about terms of service that are 10 meters long, read by no-one, bind you to things you don't agree to, and change regularly.
- In the case of open source libraries changing their license, you can either not upgrade, or switch to a different package, or fork it and maintain the old license.

On decentralized systems in particular I encourage you to read Moxie's take on web3 which outlines the challenges of this much better than I ever could. In particular it makes two very important points, namely that people don't like self hosting (at scale) and that it's easier to move platforms than (decentralized) protocols.

Yes, people don't like self hosting. Yes, it's easier to move platforms than to invent and implement a new protocol. Neither of these is relevant. They're straw-man arguments.

The vast, vast majority of people participating in the Fediverse aren't self-hosting.
No-one needs to invent and implement a protocol. It's already been done.

There is also a proxy war going on about freedom of speech and expression and the desire to create safe spaces.

Sure. That's happening everywhere. However, the centralized platforms fail at both. They can't have true free speech because it is fucking dangerous, and their advertisers don't want to spend their money in a place that is supporting Nazis. They have utterly failed at creating safe spaces for anyone.

So really before we talk about centralization and decentralization, I think we actually need to understand what we want to accomplish. And really I think this is where we likely already disagree tremendously. Mastodon encourages not just decentralization, but federation. You can pick your own mastodon server but you can also communicate with people on other instances. I will make the point that this is the root of the issue here.

I failed to see what exactly he was ultimately saying was the root problem but…

I have been using this for a few weeks now in different ways and it's pretty clear that this thing is incredibly brittle. The ActivityPub is a pretty messy protocol, and it also appears to not have been written with scalability in mind much. The thing does not scale to the number of users it currently has and there is probably no trivial way to fix it up.

The protocol is, provably, scaling just fine. The thing that's failing to scale is the Mastodon server implementation. It's written in Rails, which has a horizontal scaling strategy (throw more servers at it), and it depends on Sidekiq for background processing of messages, which is… let's just say folks have differing opinions of its ability to scale. Regardless of what you think about that, there are other options for managing message queues that have existed for decades, are provably more efficient, and scale amazingly well. Scaling servers horizontally is also something that requires significant sysadmin know-how.
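To make the "protocol vs. implementation" distinction a bit more concrete, here's a rough sketch of what delivering a post looks like as background work. It's Python, not Mastodon's actual Ruby code, and everything in it (the `delivery_jobs` function, the handles, the idea that each server takes one shared delivery for all of its local followers) is an assumption made up for illustration. The point is only that one post turns into a pile of independent jobs, and independent jobs are exactly the kind of thing you can hand to more workers.

```python
from collections import defaultdict

def delivery_jobs(post_id, follower_handles):
    """Group followers by home server and emit one delivery job per server.

    Hypothetical sketch: assumes handles look like 'user@server' and that a
    server accepts a single delivery covering all of its local followers.
    """
    by_server = defaultdict(list)
    for handle in follower_handles:
        user, server = handle.rsplit("@", 1)
        by_server[server].append(user)
    # Each tuple is an independent unit of work a queue worker could pick up.
    return [(post_id, server, sorted(users))
            for server, users in sorted(by_server.items())]

followers = ["ann@a.example", "bob@b.example", "cat@a.example", "dan@c.example"]
jobs = delivery_jobs(42, followers)
for job in jobs:
    print(job)
```

Four followers on three servers means three jobs. None of them depends on the others, so how fast they get processed is a question about the queue and the workers behind it, not about the protocol.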

On top of this is the fact that literally every social networking server for the Fediverse (Mastodon, Pleroma, etc) is designed for "communities". A server of tens, or hundreds of thousands of users is not a community. This greatly exceeds Dunbar's Number of roughly 150 individuals. It's literally too large of a group for humans to maintain stable social relationships with.

So, even if we set aside the technical issues, it's not good at that scale because it's not supposed to be good at that scale. When you start getting thousands of people who are following tens or hundreds of thousands of other people you lose the ability to moderate effectively. You can't protect your community because there are too many messages coming in from too many sources. You end up needing moderation teams, and the bigger you get the more impossible the task becomes. Except it's worse, because Fediverse instances are managed by volunteers not corporations. No-one's being paid to validate reports of abuse, mediate disputes, or look at cruel racist bile.

Communities don't scale. It's a human problem not a technology problem. This doesn't mean that decentralization isn't a good solution to the technological problems, and this doesn't mean that decentralization as a technical solution doesn't scale. Email disproves this trivially. It works great, even under the ridiculous onslaught of billions of automated spam emails every day. Decentralization works.

… there is the belief that you can somehow create a coherent experience into a “whatever”. Whatever it is actually. My first mastodon instance was de-federated by accident from my current instance.

That sucks, but it's not like centralized solutions don't regularly delete or suspend accounts by accident, so I'm not sure how it's relevant to the discussion. If anything it's another argument for community based administration because these mistakes happen far less often at the individual instance scale than they do at Google / Twitter / Facebook scale. Plus, there's no "an algorithm decided I was bad and suspended my account". It's literally not a thing. Sure, it could be but it isn't now, and you would never have to join a server that implements that crap to participate in the Fediverse.

I moved to that instance though because many other hackers in the Open Source space did, and unlike Fosstodon it seems to allow non English content which I do care about quite a bit. (After all my life and household is multilingual and I don't live in an English speaking country.) Yet that instance still defederates qoto and I'm guessing because qoto permits unpopular opinions and does not block servers itself.

So first you found a community that wasn't a good fit. Then you found one that was. Congratulations. Now you're asserting that your old one is blocked (not actually defederated) by your current one because of unpopular opinions. This means your community is successfully enforcing its values and protecting its members. Congratulations. The ability to protect your community is a feature, not a bug. I should not be forced to see crap from people I disagree with because some company doesn't have the cojones to block them because of some BS argument about "free speech", or because they don't have the human resources to investigate the BS.

Federation makes all of these questions play out chaotically and there is no consistency. My first experience of being on Mastodon was in fact that I got shitposted at by accounts on poa.st. The n-word was thrown at me within hours of signed up. Why? I'm not sure. So moderation is something of an issue.

So, your experience was exactly the same as the experience many people have on Twitter. However, instead of recognizing that moderation is always reactive (you moderate after someone causes problems, not before) and that you had a real person you could reach out to, you complain about it in a post about how Mastodon doesn't scale? Many, many thousands of people have reported racist behavior on Twitter and had zero response from moderation. Frequently the people being attacked are the ones who get their accounts suspended because of false reports. This doesn't happen on Mastodon. The admin is a real person who cares about their community and just blocks the offenders, or if there are too many of them, blocks their entire servers.

The abuse you suffered will always exist because anonymity and pseudo-anonymity beget abusive behavior. It's a human problem. In a decentralized system like Mastodon there's a real person who's volunteered to help protect the people in their community from stuff like that. In centralized systems, you're typically screwed.

We clearly won't come to an agreement across all of mastodon about what acceptable behavior is, and there is no central entity controlling it. It will always be a messy process. I guess this is something that Mastodon will have to learn living with, even though I can't imagine what that means.

Sure, but it's not something "Mastodon" will have to learn to live with. It's something humans will have to learn to live with, if they want to participate in online communities. It's no different in a centralized vs decentralized system either. The only thing that changes is how much power a community has to protect itself.

Unlike Twitter which was a public company with a certain level of responsibility and accountability, Mastodon is messy legally speaking as well. It's not above the law, even if it maybe wants to be, and instances will have to follow the laws of the countries they are embedded in. We already know how messy this is even for centralized services. But at least those enterprises were large enough to pay lawyers and figures this out in courts.

So. In a centralized system, when they shirk their responsibility they can bury you with a horde of expensive and talented lawyers who will make you go broke or go away. In a distributed system there are individual humans responsible for the servers who can be held accountable and generally don't have the resources to bury you in legal fees. So, again, a strong argument for decentralized solutions. Also, a strong argument for communities that can actually police themselves as opposed to crossing your fingers and hoping BigCo will do it.

For large mastodon instances this might turn into a problem, and for small instances the legal risk of hosting the wrong thing might be completely overwhelming. I used to host a pastebin for a few years. It was Open Source and with that others also hosted it. I had to shut it down after it became (by a small percentage of users) used to host illegal content. In some cases links to very, very illegal content. Even today I still receive emails from users who beg me to take down pastes of that software from other domains, because people use it to host doxxed content. I really hard a hard few weeks when I first discovered what my software ended up being used for.

This is, again, an argument against centralized systems. It's also an argument against large Fediverse instances. The problem is (pseudo)anonymity, not decentralization or centralization. Companies like Twitter & Facebook & Google can't keep up with the volume of illegal content they host. A small community can. Also, don't allow anonymous people to use your resources if you're not willing to let them do bad things with it. Mastodon servers don't have to be wide open to the public. If you do open yours to the public you can put very strict rules on participation and you can easily limit the number of incoming people. You can require approval, or you can simply turn off self-signup when the number of folks goes beyond your ability to manage.

If you look at a mastodon server as a community (as designed) then you can easily keep an eye on the new folks, and their behavior, and kill their accounts if they violate your community's rules.
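If it helps to see that policy written down, here's a tiny sketch of the decision an admin is making. It's Python, and the names (`signup_mode`, `max_manageable`, the three mode strings) are hypothetical. It illustrates the rule of thumb above and isn't meant to describe any real Mastodon setting.

```python
def signup_mode(member_count, max_manageable, require_approval=False):
    """Decide how new sign-ups are handled on a small community server.

    Hypothetical sketch: the thresholds and mode names are illustrative only.
    """
    if member_count >= max_manageable:
        return "closed"      # past what you can moderate: turn off self-signup
    if require_approval:
        return "approval"    # a human reviews every request before it's accepted
    return "open"            # anyone can join, and you accept what comes with that

print(signup_mode(8, 150))
print(signup_mode(8, 150, require_approval=True))
print(signup_mode(150, 150))
```

The interesting part is that `max_manageable` is a number the admin picks based on how much moderation they can actually do, not on how much traffic the hardware can take.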

… then there is also the issue of what happens if someone popular joins the instance.

That's a symptom. Not a problem. Someone famous joining your instance unexpectedly is a symptom of you not managing your community or regulating sign-ups. If you want to have an open server that's fine, but you need to recognize that if you choose that (it's an active choice not a passive one) then you are opening yourself up to that option.

You can, of course, balance this by saying "hey, my server is open, but if you have millions of followers I will ask you to pay for some of the server costs, or I will boot you". Folks with millions of followers can, generally, afford the cheap monthly fee to get their own damn server anyway.

Imagine you're a rather small server and suddenly Eli Lilly and Company joins your instance. Today they have around 140K followers on Twitter and they are a publicly traded company. First of all with an account that large, every one of their posts will cause a lot of load on your infrastructure. Secondly though, they are a very interesting target to attack.

In order to do this I have to:

Imagine I decided to allow corporate accounts on my server. Why, the fuck, would I do that?!
Imagine corporate accounts with over a hundred thousand followers would want to join a random server instead of just setting up their own (trivially easy to do, doesn't require IT skills).
Imagine I wouldn't ask them to foot the bill for the additional resources they were using.

So yes, if I imagine a series of highly improbable decisions, two-thirds of which would be me choosing to invite or accept harm, I could find myself in a "bad" situation.

But you don't even need to be that popular to be worried about what your instance is like. People put a lot of trust into Twitter accounts over the years. I had plenty of exchanges over private DMs with people which I really would not want to be public. Yet how do I know that my instance operator does not really like to secretly read my communication?

You know this can, and does, happen on the centralized systems too, right?

C.O.M.M.U.N.I.T.Y. I don't know how many times I'm going to have to say this. A mastodon instance is a community. If you don't trust that community don't join it. Do I trust the thousands of Twitter employees to never read DMs? No. Fuck no. Do I trust a community of people organized around principles I agree with? Probably.

Also, this is a fucking straw man argument. Don't say things in DMs that are sensitive and secret. At least Mastodon has the decency to warn you every time that your DMs aren't encrypted. Seriously though, why, the fuck, would an admin want to spend their time trolling around in the DMs? Maybe if you're on a server filled with trolls who get their jollies off of being mean to other people. If, on the other hand, you're a trans woman who's joined a server that's all about protecting and supporting trans people… odds are that they're not going to look at your private stuff.

On top of this, Mastodon has no interface to do this. You have to directly muck about with the database. A massive portion of Mastodon instances are hosted on servers run by masto.host, and you know what they don't offer? Direct database access. I couldn't read my members' DMs if I wanted to. I guess I could ask for a copy of my backup and then point a database tool at whatever he sent me but… This is a bullshit argument.

Do I know if my instance operator could even keep the communication private in the light of hackers?

Again with the straw-man arguments. Twitter can't. Facebook can't. Google can't. The whole point of a hacker is to steal secret information, and there's no such thing as an unhackable system. So no, a Mastodon admin can't but neither can anyone else.

I'm sure over the years thousands of credit card numbers, token access credentials or passwords were exchanged in Twitter DMs. Imagine what a juicy target that would be on Mastodon servers.

This is, yet another, argument for decentralized servers. Hacking Twitter gets you millions of accounts. Hacking my Mastodon server gets you 8. Exchanging illegal information in DMs has nothing to do with if your server's been hacked or not.

Mastodon is getting some traction today, but Mastodon is around for a long time. And with that, may of the problems it had over the years are still unresolved.

Like Facebook, and Twitter, and Google, and everyone else.

He provides a bunch of examples that I've already addressed but…

Or that the most controversial and replied to issue is about optionally disabling replies to posts like on Twitter.

Twitter has refused to give abuse moderation tools to people without verification for years. They also regularly refuse to give verification to people who are targets of abuse. I'm intentionally ignoring the $8 verification WTF that is currently going on. So again, straw-man argument.

Don't use Mastodon if you don't like Mastodon's features. There are other Fediverse servers out there with different moderation options. If you don't like what features a centralized system offers you, you're screwed. If you don't like what features your Fediverse server offers, you can choose another one and still participate.

Or that there are popular forks of Mastodon with different goals than Mastodon who can't get their changes merged back.

Why do you think they should?! That implies that Mastodon developers should have to accept whatever random features some other group of people thinks are good? Should Twitter and Facebook do whatever random suggestions other people like? There are millions of folks in the US who believe we should remove the separation of Church and State. Should we do that?

No-one should be forced to make their software have whatever features other people want. This is especially true when the forks / competitors with the features you want exist and can be used. Jesus fuck-me. For example, Hometown is a great fork that has a wonderful feature of letting people message only other people on their server. Mastodon isn't interested in that feature. That's fine. If that feature is important to you join or start a Hometown server.

This is one of the most important reasons why decentralized is better. You can still participate even if you don't agree with the way one piece of software on the network works. You can choose the features you want. If you want to be on a server with better moderation features. You can. There's literally nothing stopping you. You want to be on a server with better human moderators? You can.

To be honest, code is simple in comparison, but actually making Mastodon scale technically too will require changes if it wants to absorb some of the larger users on Twitter.

Sure. Step one. Use Pleroma. It handles vastly more connections for any given quantity of server resources. It's built on the Erlang Virtual Machine which was developed to handle digital switching of phone calls. However… the better solution is don't make huge instances. You can't have a huge community. You can't protect a huge community. You can't please a huge community.

The problem isn't a technological one. It's the human decisions that are the problem. You talk about back-pressure, but if your system is being overloaded because Bob decided to make a bot that follows ten million accounts, or Mary has ten million followers then boot them or ask them to pay. This isn't hard, and there are already solutions available that'll let Mary have a relatively cheap instance that can handle her ten million followers.

In my mind a better alternative to these two extremes of Twitter and Mastodon would be to find a middle ground. A service like Twitter is much cheaper and easier to run if it does not have to deal with federation on a technical level. An Open Source implementation of Twitter that is significantly cheaper to run than a Mastodon host that can scale to larger user numbers should be possible.

An open source version of Twitter that scales either has to be federated OR you're building a Twitter competitor, and trying to encourage millions of people to join your new network. The latter is arguably brain damaged. The former already exists. Mastodon is an open source Twitter clone. Pleroma is an open source Twitter clone that scales really effing well.

Ideally at least some of these communities would try to be run like non profit foundations, then maybe they have a chance of hanging around.

You mean, the way many Mastodon instances are already run?

A “Not Twitter Foundation” that runs an installation of an Open Source implementation of a scalable micro blogging platform is very appealing to me

It makes no sense to simultaneously bitch about Mastodon not accepting features some community members want them to have and say you want some centralized platform that's guaranteed to not accept features some community members want. You're also complaining about not having the features you want.

These aren't compatible ideas. Either you want access to the features you want (distributed system which offers choice) OR you want to let someone else decide what's acceptable and "good" for you.

You either want to be part of a community that can protect itself, or you want to cross your fingers and hope that some central organization will accomplish a task that's literally impossible due to issues of scale.

And then let the market figure out if that foundation does a good job at running it, and if not someone else will replace it.

Do not fucking bring capitalism into this discussion. That's what got us in this mess in the first place.

The Fediverse scales just fine. It's not supposed to be a collection of massive servers. It's supposed to be a collection of communities. The tool isn't broken if its user misuses, or misunderstands it.

Here's a practical example. Check out Bonfire. It's a Fediverse server option and literally all of their features are about answering the question of "What does this community want?" I can't comment on whether the software is any good, but it's a great example of how Fediverse servers are all about supporting the needs of "a community" not the whole fucking internet.