MariaDB crash on SELECT COUNT FROM POSTS

We’re currently seeing regular database crashes, and on investigation the server is always running the following statement at the moment it crashes.

Query (0x7f21226d9be0): SELECT p.*, (SELECT COUNT(Posts.Id) FROM Posts WHERE Posts.RootId = (CASE WHEN p.RootId = '' THEN p.Id ELSE p.RootId END) AND Posts.DeleteAt = 0) AS ReplyCount FROM Posts p WHERE ((Id IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?) OR RootId IN (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)) AND ChannelId = ? AND DeleteAt = ?) ORDER BY CreateAt DESC

That looks like an internal Mattermost query, but I’m not sure how to diagnose it beyond this. Is it something anyone else has come across previously?
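In case it helps anyone trying to reproduce this, the captured statement can be fed through EXPLAIN with the placeholders filled in by hand, roughly like this (the Id/RootId/ChannelId values below are made up for illustration):

    -- Sketch only: the values shown are made-up placeholders.
    EXPLAIN SELECT p.*,
        (SELECT COUNT(Posts.Id) FROM Posts
          WHERE Posts.RootId = (CASE WHEN p.RootId = '' THEN p.Id ELSE p.RootId END)
            AND Posts.DeleteAt = 0) AS ReplyCount
      FROM Posts p
     WHERE ((Id IN ('postid1', 'postid2') OR RootId IN ('postid1', 'postid2'))
            AND ChannelId = 'channelid1' AND DeleteAt = 0)
     ORDER BY CreateAt DESC;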

This is Mattermost 5.37.1 running on CentOS 7 with MariaDB 10.5.12. We’ve been rock solid with Mattermost and the database until now, but it has crashed maybe 6-8 times in the last 24 hours.

Other info: the table is fairly large, about 4.5 GB. The upgrade past (I think) 5.35 that made database changes took a fair amount of time to complete, so I wouldn’t be surprised if this turns out to be a size/timeout issue. I increased the query timeout overnight by a few minutes, but unfortunately it hasn’t made any difference.
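For anyone checking the same settings, the two timeouts that seem relevant are Mattermost’s own QueryTimeout under SqlSettings in config.json and the server-side max_statement_time, which can be inspected and raised from the MariaDB shell. The 300 seconds below is only an illustrative value:

    -- Assumption: the server-side limit in play is max_statement_time (seconds, 0 = no limit).
    SHOW GLOBAL VARIABLES LIKE 'max_statement_time';
    -- Raise it to five minutes for new connections (illustrative value only):
    SET GLOBAL max_statement_time = 300;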

We’ll be trying some general MySQL maintenance overnight, but in case it’s something specific to Mattermost I thought I’d ask if anyone else has seen anything. I have also posted this to support (we have an E10 instance), but wanted to ask here in case anyone can help as well.

Thanks,
Paul


Hello! Given that this type of MySQL query is quite intensive on the server’s hardware when dealing with data of this quantity, it seems reasonable to ask about the server’s specifications. Could you elaborate on them, specifically how much RAM is available and how much RAM the MySQL engine is allowed to use at maximum? Thanks!

The server in question (a VM) has 12 GB of RAM, and innodb_buffer_pool_size for the database is 1 GB (which could be higher, so that’s worth tweaking I think).
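For anyone wanting to check the same thing, the current value can be inspected and (on MariaDB 10.5) resized online from the SQL shell. The 4 GB below is only an illustrative target, and the same value should also go into my.cnf so it survives a restart:

    -- Current buffer pool size in bytes:
    SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
    -- MariaDB 10.5 can resize the pool online; 4 GB here is an illustrative value only:
    SET GLOBAL innodb_buffer_pool_size = 4294967296;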

Server load and memory usage have never been close to any limits though, and we’ve had no previous issues suggesting resources were a factor.

For what it’s worth, we resolved the issue at the end of last week by dumping, dropping, and reimporting the Mattermost database. We saw warnings in the service logs about corrupt indexes, which suggested that course of action, and it has cleared things up. I’m not certain whether an indexing issue caused the trouble or whether it came about because of the crashing, but as a fix it has worked. We’ll probably factor this into future server upgrades as an extra check against any possible issues.
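For anyone who hits similar warnings, a less drastic first step before a full dump and reimport would be something along these lines against the affected table; we went straight to the rebuild, so treat this as a sketch rather than what we actually ran:

    -- Non-destructive consistency check of the table and its indexes:
    CHECK TABLE Posts EXTENDED;
    -- Rebuild the table and indexes in place (can take a while on a ~4.5 GB table):
    OPTIMIZE TABLE Posts;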

Thanks,
Paul

Glad you got it taken care of! It seems to me that your database may have corrupted itself when initially building its indexes, which would make sense.