Started around 7:15pm CEST on July 28.
ℹ️ Finalization note
I am happy to announce that the maintenance has completed.
We have saved around 200 GB of disk space but, due to a little accident, were forced onto a bigger machine, contrary to the plan. (The machine ran out of space because a ZFS snapshot needed more space than expected.)
The new server location is Helsinki 🇫🇮.
After the maintenance was done in the night of July 29, a Synapse update did not succeed because a semi-auto vacuum worker from the database was still hanging around like a zombie and prevented the new schema from loading. This seemingly happened because pgcompacttable somewhat froze during the night, which took quite a while to identify.
💡 If you cannot log in
A few users still have the IP of our old machine cached in their client. Try logging out and in again, clearing the cache, or restarting the app or device. For a reason that is not yet clear, this also happened to users who were able to connect in between. If affected, you may receive an error code 502.
ℹ️ Notice Jul 30 5:45pm
All thanks to @reivilibre in the Matrix Admins room, an autovacuum worker was identified as shamelessly blocking the schema update and causing the other weird problems. Synapse has been updated and is now catching up with the federation. Give it some time.
The Telegram bridge is ready to return in the meantime, but let's give Synapse some space first. 🙂
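For anyone chasing a similar problem, here is a minimal sketch of how running autovacuum workers could be listed. It is not the exact procedure used here. It assumes PostgreSQL 10 or newer (for the `backend_type` column of `pg_stat_activity`) and that `conn` is any DB-API 2.0 connection to the database; the function and constant names are made up for illustration.

```python
# Hypothetical helper: list autovacuum workers that are currently running.
# Assumption: `conn` is a DB-API 2.0 connection to a PostgreSQL 10+ server.

AUTOVACUUM_QUERY = """
SELECT pid, state, xact_start, left(query, 80) AS query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker'
ORDER BY xact_start
"""


def list_autovacuum_workers(conn):
    """Return one dict per autovacuum worker, oldest transaction first."""
    cur = conn.cursor()
    try:
        cur.execute(AUTOVACUUM_QUERY)
        # cursor.description holds one tuple per column; item 0 is the name.
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()
```

A worker whose `xact_start` is many hours old is a candidate for a closer look before attempting a schema update.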
ℹ️ Notice Jul 30 12:30pm
Sadly, the upgrade to the latest Synapse release did not work flawlessly. It applies the old database schema and then complains about it. Downgrading did not work properly either. I am currently looking for help in Matrix admin rooms.
➡️ background on the problem: https://github.com/matrix-org/synapse/issues/10502
ℹ️ Notice Jul 30 10:30pm
Sleepy Milan discovered last night that the firewall on the new machine was prepared but not active and ... well ... the HTTP(S) ports were not yet allowed. I am so sorry. 🙈
Also: the Telegram bridge shall return shortly. It has lower priority and had problems spawning.
ℹ️ Notice Jul 29 11:30pm
A test run is in progress. Please be patient: the database is still being worked on and Synapse needs to catch up with the federation.
ℹ️ Notice Jul 29 11pm
Your admin learned new things about the ZFS filesystem. This is great. And super embarrassing. Your admin sabotaged himself by creating a snapshot of the database before proceeding. Your admin loves how one command reports 577K of used snapshot space and a different one reveals that it is actually 171G on the old machine.
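The notice above does not say which two commands were involved. As a minimal sketch for looking at both numbers side by side, the snippet below reads the `used` and `usedbysnapshots` properties of a dataset with `zfs list -Hp`, which prints tab-separated values in exact bytes. The function names and the dataset name `tank/db` in the usage note are made up for illustration.

```python
import subprocess


def parse_zfs_space(output):
    """Parse `zfs list -Hp -o name,used,usedbysnapshots` output.

    Returns {dataset: {"used": bytes, "usedbysnapshots": bytes or None}}.
    """
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, used, by_snapshots = line.split("\t")
        result[name] = {
            "used": int(used),
            # "-" means the property does not apply to this row.
            "usedbysnapshots": None if by_snapshots == "-" else int(by_snapshots),
        }
    return result


def zfs_space(dataset):
    """Run `zfs list` for one dataset (needs the zfs CLI on the host)."""
    out = subprocess.run(
        ["zfs", "list", "-Hp", "-o", "name,used,usedbysnapshots", dataset],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_zfs_space(out)
```

Calling `zfs_space("tank/db")` before and after a big rewrite of the database files would show how much space the snapshots are pinning, since a snapshot keeps every block that has since been overwritten or deleted.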
ℹ️ Notice Jul 29 7pm
Sorry for the unexpectedly long maintenance. It was supposed to avoid renting a more expensive machine with more disk space, but while space was being freed, the process also filled the physical disk to a degree that forced me to move to a bigger machine.
Also on this machine, the physical size is now bigger than before and I am trying to shrink it while preparing everything else to switch Matrix back on. On the bright side, apart from the databases growing in physical size, tools like pg_repack and pgcompacttable seem to work pretty well and will be used in the future in the hope of delaying further hardware upgrades.
7:25pm - 8:50pm there are around 2,000,000 unreferenced state groups to get rid of again
8:50pm - 0:50am the database shall get reindexed (there might not be enough space left to skip this)
- 🧠 okay, now we somehow lost precious disk space... investigating options
- 🧠 doing next step without starting synapse for now - also admin needs to sleep now
8am jul29 instead of freeing physical disk space like last time, the machine now somewhat ran out of space. renting bigger server now
8:45am jul29 setting up new machine
10:30am jul29 transferring database snapshot (dating before reindex, after deletion of unreffed, just in case)
11:00pm jul30 rerunning some maintenance tasks and tests to ensure a healthy database (sorry, admin had to be afk a few hours)
8:30pm jul29 preparing/migrating environment for Matrix services (presumably alongside the step above)
- ⏳️ finishing up
Measures already taken
- many rooms have been compressed over the past weeks.
- the Matrix HQ room has been purged (but not blocked), because the compressor kept being killed by OOM Killer.
- as of a little while ago, the public is no longer allowed to log into the Telegram bridge and thereby automatically mirror huuuge rooms.
Here is a graph of db growth for you