FSCrawler question about password protected Word documents

I have a question about FSCrawler.

Is it possible to crawl password-protected Word documents? I came across this issue on the GitHub repository, but it doesn't provide much insight beyond a single unit test. I can't find any mention of it in the documentation either.

When trying to crawl a password protected Word document, I'm getting the following exception:

08:24:49,504 DEBUG [f.p.e.c.f.t.TikaDocParser] Failed to extract [100000] characters of text for [document-with-password.docx]
org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:274) ~[tika-parser-microsoft-module-2.9.1.jar:2.9.1]
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183) ~[tika-parser-microsoft-module-2.9.1.jar:2.9.1]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-core-2.9.1.jar:2.9.1]
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-core-2.9.1.jar:2.9.1]
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203) ~[tika-core-2.9.1.jar:2.9.1]
        at fr.pilato.elasticsearch.crawler.fs.tika.TikaInstance.extractText(TikaInstance.java:197) ~[fscrawler-tika-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.tika.TikaDocParser.generate(TikaDocParser.java:98) ~[fscrawler-tika-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.rest.DocumentApi.uploadToDocumentService(DocumentApi.java:205) ~[fscrawler-rest-2.10-SNAPSHOT.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.rest.DocumentApi.addDocument(DocumentApi.java:94) ~[fscrawler-rest-2.10-SNAPSHOT.jar:?]
        at jdk.internal.reflect.GeneratedMethodAccessor54.invoke(Unknown Source) ~[?:?]
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
        at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[?:?]
        at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52) ~[jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:146) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:189) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$TypeOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:219) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:93) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:478) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:400) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:261) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248) [jersey-common-3.1.5.jar:?]
        at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244) [jersey-common-3.1.5.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:292) [jersey-common-3.1.5.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:274) [jersey-common-3.1.5.jar:?]
        at org.glassfish.jersey.internal.Errors.process(Errors.java:244) [jersey-common-3.1.5.jar:?]
        at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265) [jersey-common-3.1.5.jar:?]
        at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:240) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:697) [jersey-server-3.1.5.jar:?]
        at org.glassfish.jersey.grizzly2.httpserver.GrizzlyHttpContainer.service(GrizzlyHttpContainer.java:367) [jersey-container-grizzly2-http-3.1.5.jar:?]
        at org.glassfish.grizzly.http.server.HttpHandler$1.run(HttpHandler.java:190) [grizzly-http-server-4.0.1.jar:4.0.1]
        at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.doWork(AbstractThreadPool.java:535) [grizzly-framework-4.0.1.jar:4.0.1]
        at org.glassfish.grizzly.threadpool.AbstractThreadPool$Worker.run(AbstractThreadPool.java:515) [grizzly-framework-4.0.1.jar:4.0.1]
        at java.base/java.lang.Thread.run(Thread.java:834) [?:?]
08:24:49,505 TRACE [f.p.e.c.f.t.TikaDocParser] End document generation

I'm using version 2.10-20240325.073416-333 of FSCrawler.

I don't think it can. The test was only added to prove that FSCrawler does not break on protected documents: it still indexes the file metadata, just with no content extracted.
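
Some background on the exception above, in case it helps: a password-protected .docx is no longer a plain ZIP package. Office wraps the encrypted payload in an OLE2 compound file, which would explain why the trace goes through OfficeParser even though the file has a .docx extension. Below is a minimal sketch, in Python with only the standard library, of spotting such files before sending them to the crawler. The function names are hypothetical and this is not part of FSCrawler or Tika. It is also only a heuristic: a legacy .doc file renamed to .docx would be reported as "ole2" too.

```python
# Magic bytes at the start of the two container formats.
ZIP_MAGIC = b"PK\x03\x04"                          # plain OOXML package (.docx, .xlsx, .pptx)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")     # OLE2 compound file (encrypted OOXML, legacy .doc)


def classify_container(header: bytes) -> str:
    """Return "zip", "ole2" or "unknown" based on the leading bytes of a file."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def classify_file(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify its container."""
    with open(path, "rb") as f:
        return classify_container(f.read(8))
```

For a file named like document-with-password.docx, a result of "ole2" from classify_file suggests the document is encrypted (or is not really a .docx), so only its metadata would be indexed.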

Thanks for your swift reply. Would it make sense to add this feature as a CLI option when using the crawler as a REST service?

Do you mean that you know the password and want to extract data using that password?