Hi, I am using FSCrawler to index a large set of documents kept in various folders. I have created a separate job for each major folder, and I run each job in FSCrawler. Some of the folders are quite large (>180 GB) and contain subfolders, for which creating individual jobs is very cumbersome.
1. In one such folder, I ran FSCrawler, and after running for the entire day it gave the error reproduced below. Can someone please explain why this error occurs and how to resolve it?
2. I had run the crawler on the same folder earlier and also got an error. Because no status.json file is created when the crawler exits with an error, FSCrawler tries to reindex every document and folder from the beginning each time it is started.
Thanks
JS
// ERROR
16:37:08,809 DEBUG [f.p.e.c.f.c.FileAbstractor] Listing local files from E:\MISC FILES\IT Attendance System\Record of Attendance\IT attendance\2018\December 2018
16:37:08,809 DEBUG [f.p.e.c.f.c.FileAbstractor] 0 local files found
16:37:08,809 DEBUG [f.p.e.c.f.FsParser] Looking for removed files in [E:\MISC FILES\IT Attendance System\Record of Attendance\IT attendance\2018\December 2018]...
16:37:08,809 TRACE [f.p.e.c.f.FsParser] Querying elasticsearch for files in dir [path.root:4fcf95a09d4a7831e5910733dbaa]
16:37:12,982 TRACE [f.p.e.c.f.c.ElasticsearchClientManager] Sending a bulk request of [1] requests
16:37:13,581 TRACE [f.p.e.c.f.c.ElasticsearchClientManager] Sending a bulk request of [1] requests
16:37:34,027 WARN [f.p.e.c.f.c.ElasticsearchClientManager] Got a hard failure when executing the bulk request
java.net.SocketTimeoutException: null
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588) [httpcore-nio-4.4.5.jar:4.4.5]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
16:37:38,927 WARN [f.p.e.c.f.FsParser] Error while crawling E:\MISC FILES: listener timeout after waiting for [30000] ms
16:37:38,999 WARN [f.p.e.c.f.FsParser] Full stacktrace
java.io.IOException: listener timeout after waiting for [30000] ms
at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:684) ~[elasticsearch-rest-client-6.3.2.jar:6.3.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-6.3.2.jar:6.3.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:198) ~[elasticsearch-rest-client-6.3.2.jar:6.3.2]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:522) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:508) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:404) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
at fr.pilato.elasticsearch.crawler.fs.FsParser.getFileDirectory(FsParser.java:356) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:307) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.addFilesRecursively(FsParser.java:290) ~[fscrawler-core-2.5.jar:?]
at fr.pilato.elasticsearch.crawler.fs.FsParser.run(FsParser.java:167) [fscrawler-core-2.5.jar:?]
at java.lang.Thread.run(Unknown Source) [?:1.8.0_171]
16:37:39,139 INFO [f.p.e.c.f.FsParser] FS crawler is stopping after 1 run
16:37:39,428 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [e_drive_misc]
16:37:39,505 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
16:37:39,505 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
16:37:44,226 WARN [f.p.e.c.f.c.ElasticsearchClientManager] Got a hard failure when executing the bulk request
java.net.SocketTimeoutException: null
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:375) [httpcore-nio-4.4.5.jar:4.4.5]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) [httpasyncclient-4.1.2.jar:4.1.2]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) [httpasyncclient-4.1.2.jar:4.1.2]
16:38:02,138 TRACE [f.p.e.c.f.c.ElasticsearchClientManager] Executed bulk request with [1] requests
16:38:02,328 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
16:38:02,629 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
16:38:02,629 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [e_drive_misc] stopped
16:38:03,321 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [e_drive_misc]
16:38:03,422 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
16:38:03,422 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
16:38:03,422 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
16:38:03,422 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
16:38:03,422 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [e_drive_misc] stopped
//