Running `mattermost extract-documents-content` after going to 5.35 gave errors

Hi all,

I updated from 5.33.2 to 5.35.1 and that went fine. Then I ran `mattermost extract-documents-content` as suggested, and at the end it spewed the output below. It looks like a crash with a backtrace, I guess?

Anyone else see this? What does it mean exactly?

{"level":"warn","ts":1621736812.116505,"caller":"docextractor/combine.go:35","msg":"unable to extract file content","error":"exec: \"pdftotext\": executable file not found in $PATH"}
{"level":"info","ts":1621736812.1348782,"caller":"app/server.go:911","msg":"Stopping Server..."}
{"level":"info","ts":1621736812.1354005,"caller":"app/web_hub.go:103","msg":"stopping websocket hub connections"}
{"level":"info","ts":1621736812.1388233,"caller":"app/server.go:997","msg":"Server stopped"}
panic: loading {11 0}: found {10 0}

goroutine 1 [running]:
github.com/ledongthuc/pdf.(*Reader).resolve(0xc00689db00, 0x0, 0x2822080, 0xc000e75a90, 0xc006a254d8, 0x0, 0x1000, 0xc001877680)
	github.com/ledongthuc/pdf@v0.0.0-20200323191019-23c5852adbd2/read.go:768 +0xdd6
github.com/ledongthuc/pdf.Value.Key(0xc00689db00, 0x0, 0x279e280, 0xc005ae12c0, 0x2b09a7b, 0x4, 0x0, 0x0, 0xc006a25440, 0x26758c0)
	github.com/ledongthuc/pdf@v0.0.0-20200323191019-23c5852adbd2/read.go:671 +0x9e
github.com/ledongthuc/pdf.(*Reader).NumPage(0xc00689db00, 0x1946c0f)
	github.com/ledongthuc/pdf@v0.0.0-20200323191019-23c5852adbd2/page.go:59 +0x69
github.com/ledongthuc/pdf.(*Reader).GetPlainText(0xc00689db00, 0xc005d12048, 0x13a7110, 0x0, 0xc00689db00)
	github.com/ledongthuc/pdf@v0.0.0-20200323191019-23c5852adbd2/page.go:64 +0x45
github.com/mattermost/mattermost-server/v5/services/docextractor.(*pdfExtractor).Extract(0x4393048, 0xc004685360, 0x15, 0x2f9bde0, 0xc005d12038, 0x0, 0x0, 0x0, 0x0)
	github.com/mattermost/mattermost-server/v5/services/docextractor/pdf.go:46 +0x35f
github.com/mattermost/mattermost-server/v5/services/docextractor.(*combineExtractor).Extract(0xc001eec680, 0xc004685360, 0x15, 0x2f9bde0, 0xc005d12038, 0xc006a61d80, 0x2, 0x4, 0x2f9bde0)
	github.com/mattermost/mattermost-server/v5/services/docextractor/combine.go:33 +0x142
github.com/mattermost/mattermost-server/v5/services/docextractor.ExtractWithExtraExtractors(0xc004685360, 0x15, 0x2f9bde0, 0xc005d12038, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc001877b60, ...)
	github.com/mattermost/mattermost-server/v5/services/docextractor/docextractor.go:43 +0x2d4
github.com/mattermost/mattermost-server/v5/services/docextractor.Extract(...)
	github.com/mattermost/mattermost-server/v5/services/docextractor/docextractor.go:19
github.com/mattermost/mattermost-server/v5/app.(*App).ExtractContentFromFileInfo(0xc0000b0a20, 0xc004698500, 0x0, 0x0)
	github.com/mattermost/mattermost-server/v5/app/file.go:1394 +0x1f0
github.com/mattermost/mattermost-server/v5/cmd/mattermost/commands.extractContentCmdF(0x43182a0, 0x4393048, 0x0, 0x0, 0x0, 0x0)
	github.com/mattermost/mattermost-server/v5/cmd/mattermost/commands/extract_content.go:82 +0x3c5
github.com/spf13/cobra.(*Command).execute(0x43182a0, 0x4393048, 0x0, 0x0, 0x43182a0, 0x4393048)
	github.com/spf13/cobra@v1.1.3/command.go:852 +0x47c
github.com/spf13/cobra.(*Command).ExecuteC(0x431e6a0, 0x0, 0xffffffff, 0xc000102058)
	github.com/spf13/cobra@v1.1.3/command.go:960 +0x375
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.1.3/command.go:897
github.com/mattermost/mattermost-server/v5/cmd/mattermost/commands.Run(...)
	github.com/mattermost/mattermost-server/v5/cmd/mattermost/commands/root.go:14
main.main()
	github.com/mattermost/mattermost-server/v5/cmd/mattermost/main.go:31 +0x86

We have a ticket open and are working on a fix: [MM-35990] Mattermost crashes when content extractor dependencies not present.
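For context, the panic is coming from the third-party PDF library (github.com/ledongthuc/pdf), so the general shape of the fix is to recover from that panic inside the extractor and report it as an ordinary error, the same way the missing `pdftotext` binary is already reported as a warning. The sketch below is only an illustration of that technique, not the actual patch; the package and function names are made up:

```go
package docextract

import (
	"fmt"
	"io"
	"strings"

	"github.com/ledongthuc/pdf"
)

// extractPDFText wraps the third-party PDF reader and converts any panic it
// raises on a malformed file into a normal error, so one bad attachment
// cannot take down the whole extraction run.
func extractPDFText(r io.ReaderAt, size int64) (text string, err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = fmt.Errorf("pdf extraction panicked: %v", rec)
		}
	}()

	reader, err := pdf.NewReader(r, size)
	if err != nil {
		return "", err
	}

	plain, err := reader.GetPlainText()
	if err != nil {
		return "", err
	}

	var sb strings.Builder
	if _, err := io.Copy(&sb, plain); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```

With something like that in place, the CLI can log the error for the offending file and keep processing the rest instead of aborting.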


The 5.35.2 dot release is now available with this fix.

Yup, seems fixed, though now I hit a different crash:

fatal error: runtime: out of memory

runtime stack:
runtime.throw(0x2b55bb5, 0x16)
	runtime/panic.go:1116 +0x72
runtime.sysMap(0xc024000000, 0x8000000, 0x4397378)
	runtime/mem_linux.go:169 +0xc6
runtime.(*mheap).sysAlloc(0x4379a20, 0x7c00000, 0x42df97, 0x4379a28)
	runtime/malloc.go:727 +0x1e5
runtime.(*mheap).grow(0x4379a20, 0x3c63, 0x0)
	runtime/mheap.go:1344 +0x85
runtime.(*mheap).allocSpan(0x4379a20, 0x3c63, 0x100, 0x4397388, 0xfffffffffffffade)
	runtime/mheap.go:1160 +0x6b6
runtime.(*mheap).alloc.func1()
	runtime/mheap.go:907 +0x65
runtime.(*mheap).alloc(0x4379a20, 0x3c63, 0x7f09ef460001, 0x42ba4a)
	runtime/mheap.go:901 +0x85
runtime.largeAlloc(0x78c501a, 0x460100, 0xc018ac0000)
	runtime/malloc.go:1177 +0x92
runtime.mallocgc.func1()
	runtime/malloc.go:1071 +0x46
runtime.systemstack(0x0)
	runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	runtime/proc.go:1116

This was when it was processing a 150 MB file. My server has only 1.25 GiB of memory (it’s a VM, I’ll try increasing it), but it has plenty of swap space, so I’m a bit surprised it didn’t succeed anyway.

@jespino Can you help take a look?

Hi @seanm,

This looks like it is related to Linux kernel memory management and how it works. The Go runtime aborts as soon as the kernel refuses to map more memory (the runtime.sysMap frame in your trace), and whether swap helps at that point depends on the kernel's overcommit settings. Maybe this link has an explanation: linux - Out of memory, but swap available - Unix & Linux Stack Exchange

Anyway, this is a bad combination of huge extractable files and low memory availability; we could probably add a configuration setting that limits extraction to files below a certain size. What do you think @eric?
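Roughly, the idea would be something like this hypothetical sketch; the limit constant is a placeholder, not an existing configuration key:

```go
package docextract

import (
	"fmt"
	"io"
)

// maxExtractFileSizeBytes stands in for a hypothetical configuration setting.
const maxExtractFileSizeBytes = 10 * 1024 * 1024 // 10 MB

// extractWithLimit skips extraction entirely for files above the configured
// size, and also caps how much extracted text is kept, so a single huge
// attachment cannot exhaust memory.
func extractWithLimit(name string, size int64, file io.ReadSeeker,
	extract func(string, io.ReadSeeker) (string, error)) (string, error) {

	if size > maxExtractFileSizeBytes {
		return "", fmt.Errorf("skipping %q: %d bytes exceeds the %d byte extraction limit",
			name, size, maxExtractFileSizeBytes)
	}

	text, err := extract(name, file)
	if err != nil {
		return "", err
	}

	// Also cap the stored content, in case a small file expands into a huge
	// amount of text.
	if len(text) > maxExtractFileSizeBytes {
		text = text[:maxExtractFileSizeBytes]
	}
	return text, nil
}
```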
