
Mattermost website returning 403 when headers contain the word 'python'

Hello,
I was trying to fetch the /security-updates page on the Mattermost website via the Python requests library and observed that Mattermost denies any request that has 'python' in its user agent. As the default user agent for python requests is python-requests/2.25.1, it is getting blocked too.
Is there a rationale behind this?

Here's what I tried:

>>> requests.get("https://mattermost.com/security-updates/")
<Response [403]>
>>> requests.get("https://mattermost.com/security-updates/", headers={'user-agent': 'requests'})
<Response [200]>
>>> requests.get("https://mattermost.com/security-updates/", headers={'user-agent': 'python-requests'})
<Response [403]>
>>> requests.get("https://mattermost.com/security-updates/", headers={'user-agent': 'python-reque'})
<Response [403]>
>>> requests.get("https://mattermost.com/security-updates/", headers={'user-agent': 'python-'})
<Response [403]>
>>> requests.get("https://mattermost.com/security-updates/", headers={'user-agent': 'python'})
<Response [403]>
>>> requests.get("https://mattermost.com/security-updates/", headers={'user-agent': 'pytho'})
<Response [200]>

As a cybersecurity consultant, I would assume that this is to prevent external automated web scanners from scanning or querying the website and trying to hack it. I would suggest explicitly specifying a user agent that identifies as a web browser, or something along those lines. Typically, when I encounter an issue like this, I just search "what is my user agent" and then use my browser's user agent string in the requests I am making, and it hasn't failed me yet.
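The workaround described above can be sketched as follows; the browser user agent string is illustrative (any modern browser string would do), and the GET call is left commented out since it needs network access:

```python
import requests

# An illustrative browser-style user agent string (an assumption for the
# example, not a recommendation of a specific browser or version).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)

session = requests.Session()
session.headers["User-Agent"] = BROWSER_UA  # replaces python-requests/x.y.z
# session.get("https://mattermost.com/security-updates/")  # network call
```

Setting the header on a Session means every request made through it carries the browser user agent, rather than passing headers= on each call.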

"that this is to prevent external automated web scanners"

I suppose that is the job of robots.txt. Filtering by user agent makes no sense, for exactly the reason you mentioned: it can be easily bypassed. Furthermore, curl is allowed but Python is not, which doesn't make sense.

The reason behind blocking generic user agents like that is to prevent web scraping and data harvesting, and to make it more difficult for malicious threat actors to cause the mayhem they intend. Even if it is a seemingly small step to take, it is a setback for the bad guys nonetheless. Beyond that, note that Python's HTTP libraries advertise themselves by default: urllib sends Python-urllib/3.x and requests sends python-requests/x.y.z, both containing the word "python", unless you change them. curl, by contrast, sends curl/x.y.z by default, which doesn't contain the blocked string.
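You can inspect these defaults yourself; requests exposes its default user agent via requests.utils.default_user_agent(), and urllib.request builds its default from the interpreter version (the string below is reconstructed from that convention):

```python
import sys
import requests

# requests' default user agent, e.g. "python-requests/2.25.1"
requests_ua = requests.utils.default_user_agent()

# urllib.request's default has the form "Python-urllib/X.Y", built from
# the interpreter version; both defaults contain the word "python".
urllib_ua = f"Python-urllib/{sys.version_info.major}.{sys.version_info.minor}"

print(requests_ua, urllib_ua)
```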