Mattermost, Inc.

Is the "too many open files" error common?

I’ve been running v2.1.0 for a few weeks now, and this morning I had to restart the server because it was logging a “too many open files” error every second. In the web interface this showed up as a refusal to accept any text input: the text box turned yellow, and I couldn’t send or receive any new messages. I don’t know how long it had been going on, but here’s a snippet of the error log from the console:

2016/04/28 18:29:35 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 1s
2016/04/28 18:29:36 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 1s
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 5ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 10ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 20ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 40ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 80ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 160ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 320ms
2016/04/28 18:29:37 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 640ms
2016/04/28 18:29:38 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 1s
2016/04/28 18:29:39 http: Accept error: accept tcp [::]:8065: accept4: too many open files; retrying in 1s

This obviously isn’t a MM-centric message. A quick search turns up this SO post.

I’m running Mattermost as a non-root user with a limit of 1024 handles (ulimit -n == 1024). I can certainly increase this, but according to that SO post, it shouldn’t be necessary:

This can be changed in /etc/security/limits.conf - it’s the nofile param.

However, if you’re closing your sockets correctly, you shouldn’t receive this unless you’re opening a lot of simultaneous connections. It sounds like something is preventing your sockets from being closed appropriately. I would verify that they are being handled properly.
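For reference, the soft limit that ulimit -n reports is RLIMIT_NOFILE. An unprivileged process can raise its own soft limit up to the hard limit without touching limits.conf; going past the hard limit is what needs the nofile setting (or root). A minimal Linux sketch in Go, using a hypothetical helper named raiseNoFile. As I understand it, Go 1.19 and later already do this automatically at startup, so on recent toolchains the call may change nothing:

```go
package main

import (
	"fmt"
	"syscall"
)

// raiseNoFile lifts the soft RLIMIT_NOFILE (what `ulimit -n` shows) to the
// hard limit and reports the values before and after the change.
// Exceeding the hard limit would require root or a limits.conf change.
func raiseNoFile() (before, after syscall.Rlimit, err error) {
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &before); err != nil {
		return before, after, err
	}
	want := before
	want.Cur = want.Max // the soft limit may not exceed the hard limit
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &want); err != nil {
		return before, after, err
	}
	// Read the limit back to confirm what the kernel actually applied.
	err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &after)
	return before, after, err
}

func main() {
	before, after, err := raiseNoFile()
	if err != nil {
		fmt.Println("raising RLIMIT_NOFILE failed:", err)
		return
	}
	fmt.Printf("soft limit: %d -> %d (hard limit %d)\n", before.Cur, after.Cur, after.Max)
}
```

Raising the limit only buys time if descriptors are genuinely leaking, which is the SO answer’s point.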

I didn’t find an indication that this is a common issue, but is it?

Hi @adamshirey,

Hmmm, we haven’t seen this before on any of our production instances. How many concurrent users do you expect to have during peak times? How long had your server been running non-stop until you restarted the Mattermost server?

We only have 15 registered users, and maybe half of them are online at a time. I set up the server about a month ago and have now restarted it twice, so it ran for roughly two weeks before I had to restart it. It’s running on an Azure VM that I spun up the same day I started running Mattermost.

That certainly doesn’t sound like enough users to cause a “too many open files” issue. I’ve created a ticket here to investigate, since we need to be sure that WebSocket connections are being closed properly.

Thanks for reporting this. If you run into it again, can you post here?

Will do, thank you! I’ve set console logging to ERROR and file logging to DEBUG. Is there anything in particular I should gather or test next time this happens, before I restart the service?

Nope, just a copy of the logs when you noticed it starting would be very helpful. Thanks (and sorry for the late response)!
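If it does happen again, one extra data point that might help alongside the logs is what the server’s descriptors actually point to: a pile of sockets would support the unclosed-connection theory, while regular files would point elsewhere. lsof -p <pid> shows this, or a small Linux-only Go sketch can tally the entries under /proc/<pid>/fd. The helper name fdSummary is made up, and reading another user’s process requires running as that user or root:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// fdSummary counts the open descriptors of process pid by kind, based on
// the symlink targets in /proc/<pid>/fd (Linux-only).
func fdSummary(pid string) (map[string]int, error) {
	dir := filepath.Join("/proc", pid, "fd")
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	counts := make(map[string]int)
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(dir, e.Name()))
		if err != nil {
			continue // the descriptor was closed while we were scanning
		}
		switch {
		case strings.HasPrefix(target, "socket:"):
			counts["socket"]++
		case strings.HasPrefix(target, "pipe:"):
			counts["pipe"]++
		case strings.HasPrefix(target, "anon_inode:"):
			counts["anon_inode"]++
		default:
			counts["file"]++ // regular files, devices, directories
		}
	}
	return counts, nil
}

func main() {
	pid := "self"
	if len(os.Args) > 1 {
		pid = os.Args[1] // e.g. the Mattermost server's PID
	}
	counts, err := fdSummary(pid)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(counts)
}
```

Taking a couple of these snapshots a few hours apart would show whether the socket count keeps climbing while the number of active users stays flat.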