Tuning Mattermost for client performance

My Mattermost server (version 5.9, E10 licence) has about 500 users and runs on a single 8-core machine. CPU and memory usage have always been fairly low. However, in recent weeks users have noticed that some operations have become slower, with delays of roughly 2-10 seconds on the following:

  • loading a saved file / image
  • loading messages on channel switching
  • uploading an image (can now take 30s where previously it would be just a few seconds)
  • loading the list of users in the direct-channel switcher

Just to be specific about this last point, the operations are:

  • Shift-Cmd-K for direct message switcher
  • Start typing a username
  • It takes 3-10 seconds before the list of possible completions appears; late last year I remember this being instantaneous

I know that some of these issues began when I switched from local disk storage to S3 storage. This might actually explain all of the slowdowns: for example, I suspect that the list of users in the direct-channel switcher requires loading the users’ icons from storage. I am investigating the S3 configuration and network routing to see if we can improve that. I recently configured the S3 region and endpoint explicitly rather than relying on the default behaviour (“Mattermost attempts to get the appropriate region from AWS”), which may help.
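For anyone who wants to rule the storage layer in or out, something like the sketch below can time a single small object fetch from the Mattermost host. This is only a rough illustration, assuming boto3 is installed; the bucket, key, region and endpoint values are placeholders standing in for whatever is set under FileSettings in config.json.

```python
# Rough S3 latency check run from the Mattermost host.
# Assumes boto3; bucket/key/region/endpoint are placeholders for the
# values configured under FileSettings in config.json.
import time

import boto3

s3 = boto3.client(
    "s3",
    region_name="ap-southeast-2",                            # placeholder region
    endpoint_url="https://s3.ap-southeast-2.amazonaws.com",  # placeholder endpoint
)

start = time.monotonic()
obj = s3.get_object(Bucket="my-mattermost-files", Key="test/small-object.png")
obj["Body"].read()
print(f"GET latency: {time.monotonic() - start:.3f}s")
```

If a single small GET already takes hundreds of milliseconds from the server, that would go a long way towards explaining the slow image loads and the sluggish direct-message switcher.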

I suspect some of the issues relate to the speed of database queries. The database is Amazon RDS. The config items “Maximum Idle Connections” and “Maximum Open Connections” have been set to 10 ever since I set up the server. Is there any guidance on a suitable choice for these? I have tried increasing both to 50 as an experiment, and the number of active connections immediately jumped to 23.
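As a rough way of watching connection usage from the database side (a sketch only, assuming the RDS instance runs PostgreSQL and psycopg2 is available; the connection details are placeholders):

```python
# Count Mattermost's open connections on the RDS instance, grouped by state.
# Assumes PostgreSQL and psycopg2; host/user/password are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="mattermost-db.example.ap-southeast-2.rds.amazonaws.com",  # placeholder
    dbname="mattermost",
    user="mmuser",
    password="change-me",
)
with conn.cursor() as cur:
    cur.execute(
        "SELECT state, count(*) FROM pg_stat_activity "
        "WHERE datname = %s GROUP BY state",
        ("mattermost",),
    )
    for state, count in cur.fetchall():
        print(state, count)
conn.close()
```

Comparing that count over time against the “Maximum Open Connections” setting should show whether the pool is regularly being exhausted.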

Any guidance on tuning this would be welcome!


BTW the logs contain a lot of “context deadline exceeded” errors like this:

2019-04-25T10:08:10.337+1000    error   web/context.go:52       Unable to get the channels      {"path": "/api/v4/users/me/teams/sty844ch9tbipr4wiu7ezxgdmy/channels", "request_id": "gbxfwjsqkfrjdp7r6dbrm5kqnc", "ip_addr": "10.192.104.100", "user_id": "haf1pjznibycd8z6yun1acjrnc", "method": "GET", "err_where": "SqlChannelStore.GetChannels", "http_code": 500, "err_details": "teamId=sty844ch9tbipr4wiu7ezxgdmy, userId=haf1pjznibycd8z6yun1acjrnc, err=context deadline exceeded"}

I am not really sure what this means. It doesn’t seem to correlate with any specific slow operations. Maybe it’s relevant.
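If it helps, a rough way to see whether these errors cluster around particular store calls is to tally them by the err_where field; a sketch only, assuming the log format shown above and that the log path below matches your installation:

```python
# Tally "context deadline exceeded" errors in the Mattermost log by err_where.
# The log path is a placeholder; adjust it to your installation.
import collections
import re

pattern = re.compile(r'"err_where":\s*"([^"]+)"')
counts = collections.Counter()

with open("/opt/mattermost/logs/mattermost.log") as f:  # placeholder path
    for line in f:
        if "context deadline exceeded" in line:
            m = pattern.search(line)
            counts[m.group(1) if m else "unknown"] += 1

for where, n in counts.most_common():
    print(f"{n:6d}  {where}")
```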


Hi @gubbins,

I’ll ask our team if they have further recommendations.

Also, since you’re on E10, you could take a look at what our customer support offers.

Hi @gubbins,

I haven’t heard feedback from the community on this yet. Would you like me to help open a support ticket so our support team can take a closer look at this?


Thanks @amy.blais, I think it’s a good idea to try the E10 support route. I have gathered some more specific info now so I’ll pass it on to them.

Hi @gubbins, curious to see if you learned anything that you might be able to share with others.

Nothing very useful, @aagha786. I increased the database connection count to 50, and using the internal performance metrics I can see that most of the time we use fewer connections than that, while occasionally maxing out at 50. Our network team also reviewed the routing to AWS services. Generally I’ve had fewer complaints since then.