Weekly Server Restarts

Sep 19, 2024 10:34 pm
I've been made aware that GP has been slowing down and throwing errors on a regular basis. We're getting close to the database update (see this thread if you want to help us test so we can get it out ASAP), which should speed the site up a lot, but one thing I've noticed is that when I restart the server, the site works smoothly for a while. So I'm going to do a weekly server reboot on Tuesday evenings at 9pm ET. The server should be down for a few minutes at most, but be careful with anything you're doing on the site around that time, especially any built-up work you haven't submitted yet, as it will be lost to the ether. I will post on the Discord when the reboot starts and when it's over, for anyone who cares about the specific timing.
Sep 19, 2024 11:06 pm
Hi! There seems to be a daily server restart/gateway issue (HTTP 504) precisely at 00:00 GMT (8pm Eastern Time).

I thought it was on purpose, a scheduled maintenance window for your application server (even though Nginx or something like it still answers). After a couple of minutes, it's back.

Is it not a scheduled restart? Does it happen for anyone else, or is it just me? If it is scheduled, would it work to synchronize all those maintenance windows?
Last edited September 19, 2024 11:08 pm
Sep 20, 2024 2:04 am
htech says:
Is it not a scheduled restart? Does it happen for anyone else, or is it just me? If it is scheduled, would it work to synchronize all those maintenance windows?
That's the database backup. It actually happens twice a day, at 00:00 GMT and 12:00 GMT. I can always shift it to different times, but there's no perfect time, so I left it there. Because we only have one database server, and it's running MySQL with MyISAM tables, the database basically locks up during the backup. If the new database update works out well, I'd love to set up a primary/replica configuration, so the backup can run on the replica while the primary keeps serving the site. The update also moves the tables to InnoDB, which has row-level locking, so I'm hoping the backups will interfere less.
Sep 20, 2024 7:51 am
It might be interesting to check whether merely restarting the containers does the job, and then maybe find out whether any specific container is to blame.

Restarting the containers should be a lot quicker than restarting the machine (and the containers).

This 'should not be necessary'™, though, so it would be interesting to instrument the system to see whether it is actually slowing down and then speeding up again after a reboot (maybe it is placebo, or maybe a whole lot of people take a break for a while when a reboot happens and that makes things feel faster :). Unless the speedup lasts for many days after a reboot, the gains may not pay for a weekly disruption, but we should measure and see how it goes.
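To make the "measure and see" part concrete, here is a minimal sketch in Python. It assumes you already log response times somewhere as (Unix timestamp, latency in ms) pairs; the function name and the sample format are my own invention, not anything GP has today. It buckets the samples into the day before the reboot and each day after it, so you can see whether any speedup fades quickly.

```python
from statistics import median

DAY = 24 * 60 * 60  # seconds


def daily_medians(samples, reboot_time, days_after=7):
    """Median latency per day relative to a reboot.

    samples: iterable of (unix_timestamp, latency_ms) pairs (assumed format).
    Returns {day_offset: median_latency}, where -1 is the 24 hours before
    the reboot and 0, 1, 2, ... are the days after it.
    """
    buckets = {}
    for ts, latency_ms in samples:
        offset = int((ts - reboot_time) // DAY)
        if -1 <= offset < days_after:
            buckets.setdefault(offset, []).append(latency_ms)
    return {offset: median(values) for offset, values in sorted(buckets.items())}
```

If day 0 looks fast but days 1 and 2 are already back near the day -1 figure, a weekly reboot is probably not buying much.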
Sep 20, 2024 8:26 am
Would it be possible to restart the containers daily, while the database backup is in progress, to experiment with it? That wouldn't cause any extra disruption (the site is already down).
Last edited September 20, 2024 9:58 am
Sep 23, 2024 12:07 pm
htech says:
Would it be possible to restart the containers daily, while the database backup is in progress, to experiment with it? That wouldn't cause any extra disruption (the site is already down).
Can't do it while the database is being backed up, but we could chain it. I'll look at extending the script to do that. I'm not 100% sure how, but I'll figure that part out.
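One way the chaining could look, as a rough sketch only: run the backup, and restart the containers only if the backup exited cleanly. The script paths in the comment are placeholders I made up, not the real backup or restart commands.

```python
import subprocess


def run_chain(steps, run=subprocess.run):
    """Run (name, command) steps in order and stop at the first failure.

    Returns the names of the steps that finished with exit code 0.
    `run` is injectable so the chain can be tried without touching the server.
    """
    completed = []
    for name, command in steps:
        result = run(command)
        if result.returncode != 0:
            break  # do not restart containers after a failed backup
        completed.append(name)
    return completed


# Hypothetical usage; both paths are placeholders:
# run_chain([
#     ("backup", ["./backup-database.sh"]),
#     ("restart", ["./restart-containers.sh"]),
# ])
```

Stopping at the first failure means a broken backup never triggers a restart on top of it.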
Sep 24, 2024 6:28 am
Keleth says:
htech says:
Would it be possible to restart the containers daily, while the database backup is in progress, to experiment with it? That wouldn't cause any extra disruption (the site is already down).
Can't do it while the database is being backed up, but we could chain it. I'll look at extending the script to do that. I'm not 100% sure how, but I'll figure that part out.
The database backups are done automatically? With no human involved?

I would be worried about doing container restarts that way; too many times I have seen some weird quirk (usually timing) cause a container to fail to come back. A script could check that all the containers are up and running afterwards, and could probably just do another restart if anything does not look right, but, at least at the start, a human should be around to deal with any issues.
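Roughly what I have in mind, sketched in Python. Everything here is an assumption: `restart`, `list_unhealthy` and `notify` are hypothetical hooks you would have to wire up to whatever actually restarts the containers, checks them, and pings a human (Discord, email, whatever).

```python
import time


def restart_and_verify(restart, list_unhealthy, notify,
                       max_attempts=2, settle_seconds=30, sleep=time.sleep):
    """Restart, wait, check; retry if needed, then hand off to a human.

    restart():        restarts the containers (hypothetical hook).
    list_unhealthy(): returns names of containers that are not running.
    notify(message):  alerts a person when automation gives up.
    Returns True if everything came back, False otherwise.
    """
    unhealthy = []
    for _ in range(max_attempts):
        restart()
        sleep(settle_seconds)  # give containers time to start before checking
        unhealthy = list_unhealthy()
        if not unhealthy:
            return True
    notify(f"Still unhealthy after {max_attempts} restarts: {', '.join(unhealthy)}")
    return False
```

The retry count is capped on purpose, so a container that will never come back ends up in front of a person rather than in a restart loop.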
