What's the best way to log every Mattermost post, so as to create a graph visualization of that data?

Hello!

I’m pretty new to Mattermost. My goal is to visualize every post that happens on my Mattermost server (team version) as a graph visualization (Nodes being users, edges being their posts, and the target nodes being channels… channels could also be visualized as the individual users within them)

I am not sure about the best way to go about this. It almost seems like I just want all the data that the Mattermost server is already storing, just also saved in a graph format (node list + edge list) instead of however it's currently saved. (You might be able to tell from my description, I am a novice developer :stuck_out_tongue: Please be patient)

My primary question is: what is the best way to go about getting this graph-formatted data? (I literally just need a CSV of from-to relationships.) Is there an existing means of occasionally running the entire server database through some sort of filter that spits out a CSV? Would that be reasonable? Would it be better to use the API and get every user, then get all their channels, then get all their posts? Is there a way to use webhooks to request that literally every post also gets sent through the webhook? What about the new intercept stuff in v5.0? Could I intercept every post?

My server already has a lot of activity on it, so I guess I’m seeking advice on two things: 1. What's the recommended way to convert all of my existing data into the CSV format I described? 2. Moving forward, what's the best way to log each new post as it emerges so that it can be appended onto the existing log?

Any recommendations or advice are welcome! One further complication is that I would also like to know which users were actually in a channel at the time a post was made… so chronology matters to me too, not just which channel a post was sent to and the latest list of who is in that channel.

Another question for my learning: Are DMs just a special-case of a channel? Will getting a user’s channels also return all of their DMs?

Hi @Bortseb! Thanks for reaching out.

Here are some docs that might help, would these be what you are looking for?

This is useful @amy.blais, but it also requires E20, which I’m not currently using… So other suggestions are still welcome, and I will consider switching to E20.

@Bortseb No worries, I’ll summarize your questions and ask our devs for more information!

@Bortseb I summarized your questions like this and below them are answers from our devs:

Questions:

  1. What is the recommended way to convert all of their existing server data into CSV format? Would that be reasonable?
  2. Moving forward, what is the best way to log each new post as it emerges so that it can be appended onto the existing log?
  3. Would it be better to use the API and get every user, then get all their channels, then get all their posts?
  4. Is there a way to use webhooks to request that literally every post also gets sent through the webhook?
  5. What about the new intercept stuff in v5.0? Could they intercept every post?
  6. How to know which users were actually in a channel at the time a post was made (chronological)?
  7. Are DMs just a special-case of a channel? Will getting a user’s channels also return all of their DMs?

Answers:

  1. I’d just dump this data straight from the database as needed. See the MySQL documentation and PostgreSQL documentation, but you should be able to query the Channels, Users and ChannelMembers tables to build this information.
  2. Plugins.
  3. This would be feasible too.
  4. Not presently.
  5. Yes, plugins.
  6. We don’t appear to track this detail in the ChannelMembers table directly. Not sure if enterprise edition has more tracking here or not.
  7. Yes.

In terms of streaming, it would also be possible to stream the database events directly to an application and process them completely outside Mattermost.

Also, on question 6: there’s a separate ChannelMemberHistory table that keeps track of each time a user joined and left a channel.
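To make the database-dump idea a bit more concrete, here is a minimal Python sketch of turning dumped rows into the from-to CSV, using join/leave history to work out who was in a channel when each post was made. It is only an illustration: the rows are plain dictionaries, and the column names (`UserId`, `ChannelId`, `CreateAt`, `JoinTime`, `LeaveTime`) are assumptions about the Posts and ChannelMemberHistory tables that you should check against your own database. Timestamps are assumed to be comparable integers, and an empty `LeaveTime` is assumed to mean the user never left.

```python
import csv
import io


def members_at(history, channel_id, timestamp):
    """Return the user IDs that were in channel_id at the given timestamp."""
    present = set()
    for row in history:
        if row["ChannelId"] != channel_id:
            continue
        joined = row["JoinTime"] <= timestamp
        # Assumption: LeaveTime is None while the user is still a member.
        still_there = row["LeaveTime"] is None or timestamp < row["LeaveTime"]
        if joined and still_there:
            present.add(row["UserId"])
    return present


def write_edges(posts, history, out):
    """Write one CSV row per (author, other channel member) pair for each post."""
    writer = csv.writer(out)
    writer.writerow(["from_user", "to_channel", "to_user", "created_at"])
    for post in posts:
        audience = members_at(history, post["ChannelId"], post["CreateAt"])
        for member in sorted(audience - {post["UserId"]}):
            writer.writerow(
                [post["UserId"], post["ChannelId"], member, post["CreateAt"]]
            )


# Made-up sample rows standing in for a real database dump.
sample_history = [
    {"ChannelId": "town", "UserId": "alice", "JoinTime": 100, "LeaveTime": None},
    {"ChannelId": "town", "UserId": "bob", "JoinTime": 100, "LeaveTime": 250},
    {"ChannelId": "town", "UserId": "carol", "JoinTime": 300, "LeaveTime": None},
]
sample_posts = [
    {"UserId": "alice", "ChannelId": "town", "CreateAt": 200},
    {"UserId": "alice", "ChannelId": "town", "CreateAt": 400},
]

buffer = io.StringIO()
write_edges(sample_posts, sample_history, buffer)
print(buffer.getvalue())
```

In the sample data, bob leaves before alice's second post and carol joins before it, so the two posts produce edges to different users. For a large server the linear scan in `members_at` would be slow; doing the same join in SQL or indexing the history by channel would be the obvious next step.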

Hi @Bortseb - One of our engineers is interested in working with you to get https://github.com/mkraft/mattergraph working for your needs for the initial dump, and then getting it either updating on an interval or streaming.

  1. How much data do you have?
  2. Are you able to use Neo4j or do you have another graph database in mind?
  3. What’s your interest in coding to get this working?
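For the "updating on an interval" option mentioned above, one simple pattern is to remember the newest post timestamp already exported and append only newer posts on each run. The Python sketch below illustrates that idea only; it is not how mattergraph works. The function name `append_new_posts`, the CSV columns, and the `UserId` / `ChannelId` / `CreateAt` keys are all assumptions made for illustration.

```python
import csv
import os


def append_new_posts(posts, csv_path, last_seen):
    """Append posts newer than last_seen to csv_path.

    Returns the new high-water mark (the largest CreateAt written so far).
    """
    fresh = sorted(
        (p for p in posts if p["CreateAt"] > last_seen),
        key=lambda p: p["CreateAt"],
    )
    # Only write the header when the file is missing or empty.
    needs_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as handle:
        writer = csv.writer(handle)
        if needs_header:
            writer.writerow(["from_user", "to_channel", "created_at"])
        for post in fresh:
            writer.writerow([post["UserId"], post["ChannelId"], post["CreateAt"]])
    return fresh[-1]["CreateAt"] if fresh else last_seen
```

A scheduled job would call this with whatever posts it fetched (from the database or the API) and store the returned value for the next run. One caveat with this approach: two posts sharing the exact same timestamp across a run boundary could be missed, so tracking post IDs as well would be safer.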