Document content extractor thinks ODTs are ZIPs and won't extract

I just upgraded from 5.34.2 to 5.35.1, and I want document content search to work for documents uploaded before the upgrade. We have a lot of ODT documents created in LibreOffice Writer, but the content extractor thinks they’re ZIPs and won’t handle them. The server is running Ubuntu 18.04.

I installed the five suggested dependencies via:
sudo apt-get install poppler-utils wv unrtf tidy
sudo apt install golang-go
go get github.com/JalfResi/justext

I ran the content extraction routine via:
sudo -u mattermost /opt/mattermost/bin/mattermost extract-documents-content

Every ODT goes something like this:
extracting file StaffMtgMinutes-21-05-14.odt 20210514/teams/noteam/channels/hesn9qdu4786xqxjopehxkai4c/users/cej36b8upffupgfwgcj1pw87ay/ftb17dbfujrj7f6siynox6n64h/StaffMtgMinutes-21-05-14.odt
{"level":"warn","ts":1622139271.0618308,"caller":"docextractor/combine.go:35","msg":"unable to extract file content","error":"error unzipping data: zip: not a valid zip file"}

What can I do to fix this? Is there a bug somewhere?

(Newly-uploaded ODTs are also not having their contents extracted, judging from my inability to find things in their contents using the search tool.)