Hey folks,
I just set up a new ES node running 8.10.4 to index our Nextcloud documents.
After about 3 million documents had been indexed, the Nextcloud indexer sent an (encrypted) PDF that made ES die badly with the following error:
java.lang.NoClassDefFoundError: org/bouncycastle/cms/CMSException
at java.lang.Class.getDeclaredConstructors0(Native Method) ~[?:?]
at java.lang.Class.privateGetDeclaredConstructors(Class.java:3549) ~[?:?]
at java.lang.Class.getConstructor0(Class.java:3754) ~[?:?]
at java.lang.Class.getDeclaredConstructor(Class.java:2930) ~[?:?]
at org.apache.pdfbox.pdmodel.encryption.SecurityHandlerFactory.newSecurityHandler(SecurityHandlerFactory.java:132) ~[?:?]
at org.apache.pdfbox.pdmodel.encryption.SecurityHandlerFactory.newSecurityHandlerForFilter(SecurityHandlerFactory.java:116) ~[?:?]
at org.apache.pdfbox.pdmodel.encryption.PDEncryption.<init>(PDEncryption.java:97) ~[?:?]
at org.apache.pdfbox.pdfparser.COSParser.prepareDecryption(COSParser.java:2974) ~[?:?]
at org.apache.pdfbox.pdfparser.COSParser.retrieveTrailer(COSParser.java:285) ~[?:?]
at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:173) ~[?:?]
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[?:?]
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1232) ~[?:?]
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1206) ~[?:?]
at org.apache.tika.parser.pdf.PDFParser.getPDDocument(PDFParser.java:317) ~[?:?]
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:172) ~[?:?]
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[?:?]
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:195) ~[?:?]
at org.apache.tika.Tika.parseToString(Tika.java:525) ~[?:?]
at org.elasticsearch.ingest.attachment.TikaImpl.lambda$parse$0(TikaImpl.java:97) ~[?:?]
at java.security.AccessController.doPrivileged(AccessController.java:714) ~[?:?]
at org.elasticsearch.ingest.attachment.TikaImpl.parse(TikaImpl.java:96) ~[?:?]
at org.elasticsearch.ingest.attachment.AttachmentProcessor.execute(AttachmentProcessor.java:121) ~[?:?]
at org.elasticsearch.ingest.CompoundProcessor.innerExecute(CompoundProcessor.java:163) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.ingest.CompoundProcessor.execute(CompoundProcessor.java:139) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.ingest.Pipeline.execute(Pipeline.java:112) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.ingest.IngestDocument.executePipeline(IngestDocument.java:853) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.ingest.IngestService.executePipeline(IngestService.java:951) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.ingest.IngestService.executePipelines(IngestService.java:827) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.ingest.IngestService$1.doRun(IngestService.java:721) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:983) ~[elasticsearch-8.10.4.jar:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.10.4.jar:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1583) ~[?:?]
I was able to isolate the suspicious document and index it again on our current production node running ES 7.17.13, and there were no complaints. ES simply carries on working there, which is what I would have expected from 8.10 as well.
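For anyone who wants to replay a single document against both versions, here is a minimal sketch of how that could be done with the ingest simulate API (`POST /_ingest/pipeline/_simulate`) and an inline `attachment` processor. The field name `data`, the helper names, and the unauthenticated plain-HTTP connection are assumptions for illustration only; it is not necessarily what the Nextcloud indexer does, and a default 8.x install will additionally need TLS and credentials.

```python
import base64
import json
import urllib.request


def build_simulate_body(pdf_bytes: bytes, field: str = "data") -> dict:
    """Build a request body for POST /_ingest/pipeline/_simulate.

    The attachment processor expects the file content as base64 text
    in the source field it is pointed at ("data" is an assumed name).
    """
    encoded = base64.b64encode(pdf_bytes).decode("ascii")
    return {
        "pipeline": {"processors": [{"attachment": {"field": field}}]},
        "docs": [{"_source": {field: encoded}}],
    }


def simulate(es_url: str, pdf_path: str) -> dict:
    """Send one file through an inline attachment pipeline and return the reply.

    Assumes a node reachable without authentication; adapt for TLS and
    credentials as required by your cluster.
    """
    with open(pdf_path, "rb") as fh:
        body = build_simulate_body(fh.read())
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_ingest/pipeline/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling something like `simulate("http://localhost:9200", "suspicious.pdf")` (both values are placeholders) against the 7.17.13 node and the 8.10.4 node should show whether the difference can be reproduced with just that one file, outside of the Nextcloud indexer.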
What could be the reason for this?
Any help appreciated.
Frank