I feel like i made some unclear/easy to misunderstand statements lately so i'll try to clarify.
Recent downtimes / low performance
Some days ago our Matrix server became frequently unavailable and was slow overall.
This was because Minio (a media server), together with Mastodon's media, was moved onto the server that Synapse (the Matrix server) was living on. Well, tbh, the previous performance wasn't that great either, but it was "kinda" usable.
However, it became more obvious than before that this little SSD-boosted VM was suffering from bad disk performance. Minio frequently thought it was a good idea to rescan the whole media bucket (almost 100G of little images and stuff), which then caused Synapse to perform badly. The filesystem actually even got corrupted multiple times, and that was when there was a full outage.
Because Minio is the #1 thing affected by a corrupt, read-only filesystem, it performed a rescan right after boot and caused high disk load ... unfortunately Synapse also caused a ton of load from its own startup and from catching up on the federation work it missed during the downtime, so those two were in a long, long fight for IO.
Most recent over 24h downtime - a new dedicated server for our Matrix server
Because i still believe in the shiny shiny future of Matrix and refuse to give up so quickly (Mastodon has #1 prio of course, as most ppl are using it and like 99% of donations are coming from it and keep the tchncs infrastructure alive, not gonna lie here), i decided to take a risky step. Risky because i prefer having some donation money in reserve, for the simple fact that, for example, renting a fresh server can easily cost about 100€ in one go, plus the monthly costs. That actually just happened to me as well, cuz the old hoster for Mastodon and Illuna replaced a machine with a failing disk with ... a machine with a failing disk, so i switched hosters.
However, i decided to do a little hunt for the best-value server and ended up renting a machine at Kimsufi with fancy 2009 hardware:
- Xeon W3520 @2.66GHz
- 16GB DDR3 ECC 1333 MHz RAM
- 2x2TB RAID1 7200rpm disks
That's an increase in expenses of about 24€/month.
As i don't use a storage provider for media yet (and Synapse doesn't seem to support Minio), i also had a freakin' ton of media to move over to the new machine. The database was migrated relatively fast (about 3-4h for the backup and 1-2h for the import), but the media took ... something over 20h .. lol, right? I did not want to start up before this was done, mainly to avoid confusion ... there is no clean way to announce such things to all users via Matrix itself, and i had to realize that only a few even read these forum posts / the status page / the @everyone pings in the tchncs room. So yea .. i am unsure if it was a good decision, but hey .. 🙂
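For anyone wondering how that adds up to the "over 24h" of downtime: here is a quick back-of-the-envelope sum of the numbers above. It is only a sketch, and it assumes the three steps ran one after another rather than in parallel. The step names and the `total_range` helper are made up for illustration.

```python
# Rough downtime estimate from the figures mentioned in this post.
# Assumption: the steps ran sequentially, not in parallel.
STEPS_HOURS = {
    "database backup": (3, 4),   # "about 3-4h backup"
    "database import": (1, 2),   # "1-2h import"
    "media copy": (20, 20),      # "something over 20h", so 20 is a lower bound
}


def total_range(steps):
    """Return the (lowest, highest) total duration in hours."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high


low, high = total_range(STEPS_HOURS)
print(f"estimated downtime: at least {low}h to {high}h")
```

Since the media copy took more than 20h, the real total sits somewhat above this estimate.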
Current situation / the future of the tchncs Matrix server
Not gonna lie, the future heavily relies on the reliability of donations coming in. As of right now i have to keep the old server too, and i even had to rent a more expensive machine for Mastodon and Minetest, so in summary my expenses went up by about 30€/month. This was/is not that big of a deal, but now Liberapay is in trouble, and it looks like, apart from some people donating their planned monthly donations as a lump sum, about 20€/month are already missing. I can only hope that this situation is going to balance itself out quite soon.
If the donations stay balanced (maybe there are bank transfers pending, how could i know? 🙂), the server will stay as it is for a while - those 2TB of space and 16G of memory should serve us well for quite some time. As far as i have experienced in the past days, the new performance of this service is really neat!
Also, unlike with the VM or the old hoster i had Mastodon livin' on, Kimsufi confirmed to me that they would swap a failing disk if needed and let me rebuild the RAID, which is really good. Backups no longer almost kill the machine either and only take about 51 minutes now (as far as i can tell, it wasn't even noticeable on Matrix!! wooo!).
But if donations get too tight, i can only hope that the (proper) chatlog expiry feature will arrive in Synapse soon, so i can effectively cut down the database size in order to migrate the service to, for example, the Mastodon machine (uuh fancy fancy SSD, yummy!). Otherwise i would be forced to freeze the service of course, but i have trust in you partypeople, let's rock this! 🙂
- The Telematrix bridge should be way more reliable now: 1. because of the better performance and 2. because it gets restarted hourly now
- The Turn server for calls will be back soon
- The alias bot will work again soon