Additionally, is the behavior consistently reproducible whenever you shut down the server using systemd? Were any changes made to the system shortly before you first noticed it?
Thank you for the clarification. Since you mentioned that the issue does not occur on the test server, that will be a good point of comparison.
With that in mind, can you compare /lib/systemd/system/mattermost.service between the two servers?
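One way to do that comparison is sketched below. The function name compare_units and the host name test-server in the usage comment are made up for illustration; substitute your own.

```shell
# Compare two copies of a unit file and say whether they match.
# $1 and $2 are paths to the two files.
compare_units() {
    if diff -u "$1" "$2"; then
        echo "unit files are identical"
    else
        echo "unit files differ (see diff above)"
    fi
}

# Usage (hypothetical host name), run on the main server:
#   scp test-server:/lib/systemd/system/mattermost.service /tmp/mattermost.service.test
#   compare_units /lib/systemd/system/mattermost.service /tmp/mattermost.service.test
```

Running systemctl cat mattermost on each server is also worth a look, since it prints the unit file systemd actually loaded along with any drop-in overrides.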
Is the test server running on the same environment too (CentOS 7)?
Would it be possible to remove the current /etc/systemd/system/mattermost.service, set up a fresh one, and then reload the systemd services again?
If the two servers are running similar versions of Mattermost, I would set aside the possibility that the upgrade contributed to the issue for now. So far I can’t find anything specific in the errors shown in journalctl that I can make sense of, so I am trying to narrow down the possibilities.
I’m not sure what you mean about removing and replacing mattermost.service. What would I replace it with? I’m going to need to set it up the same surely…?
Here is the content (nothing special). Note that I added TimeoutStopSec after this issue started occurring because I needed to avoid the 90-second downtime while restarting.
Thanks for the clarification. Since the servers are identical (including the mattermost.service file), I recommend running the following command to reload the systemd manager configuration: systemctl daemon-reload
I already did the daemon-reload a few times while adding the stop timeout. It didn’t make any difference.
I guess the most obvious difference between the main and test servers is that the main server has hundreds of client sessions connecting, including some bots, while the test server has typically only one client session (i.e. me testing stuff).
Is there not some kind of thread / activity dump I can grab while in the hung state? Would that help to see what’s happening?
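On the thread dump question: Mattermost is a Go program, and by default the Go runtime prints the stacks of all goroutines to stderr and then exits when it receives SIGQUIT. Below is a rough sketch of how that could be captured while the shutdown is hung. It assumes the unit is named mattermost, that the server does not install its own SIGQUIT handler, and that the unit's stderr goes to the journal (the systemd default). The function names are made up for illustration.

```shell
# Print the main PID of a systemd unit. systemd 219 (CentOS 7) has no
# "--value" flag, so strip the "MainPID=" prefix by hand.
main_pid() {
    systemctl show -p MainPID "$1" | cut -d= -f2
}

# Send SIGQUIT to the unit's main process, then show recent journal lines.
# Note: the process exits after dumping, so this also ends the hang.
dump_goroutines() {
    unit="${1:-mattermost}"
    pid="$(main_pid "$unit")"
    if [ -z "$pid" ] || [ "$pid" = "0" ]; then
        echo "no running main process for $unit" >&2
        return 1
    fi
    sudo kill -QUIT "$pid"
    sleep 2   # give the process a moment to write the dump
    sudo journalctl -u "$unit" -n 2000 --no-pager
}

# Usage, in a second terminal while "systemctl stop mattermost" is hanging:
#   dump_goroutines mattermost > /tmp/mattermost-goroutines.txt
```

With hundreds of sessions the dump can be long, so the -n value may need raising. Goroutines that are blocked on a lock, a channel, or a database call are the ones to look at first.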
You can increase the log level to print debug messages using these config settings, and add in the SQL trace to see if there’s a query that’s hanging the shutdown sequence.
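For reference, the settings usually meant here are LogSettings.ConsoleLevel and LogSettings.FileLevel (set to DEBUG) and SqlSettings.Trace (set to true) in config.json. Here is a small sketch for checking what is currently set, assuming a file-based config. The function name is made up, and the /opt/mattermost path in the usage comment is an assumption based on the usual install location.

```shell
# Show the current log level and SQL trace lines from a Mattermost config.json.
# $1 is the path to the config file.
show_debug_settings() {
    grep -E '"(ConsoleLevel|FileLevel|Trace)"' "$1"
}

# Usage (path is an assumption; adjust to your install):
#   show_debug_settings /opt/mattermost/config/config.json
# For debugging, the three lines should read:
#   "ConsoleLevel": "DEBUG",
#   "FileLevel": "DEBUG",
#   "Trace": true
```

SQL tracing is very noisy on a busy server, so it is probably best to enable it just before reproducing the hang and turn it off again afterwards.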