Hi,
I have mounted a SharePoint site as a network drive (/mnt/sp) on CentOS.
Then I index the mounted files with FSCrawler.
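In case the mount method matters, it looks roughly like this (a sketch using davfs2/WebDAV; the share URL below is a placeholder, not my real site):
```
# WebDAV mount of the SharePoint document library (URL is a placeholder)
sudo mount -t davfs "https://example.sharepoint.com/sites/mysite/Shared Documents" /mnt/sp
```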
Here is my settings file:
```
---
name: "index_45.79.189.33"
fs:
  url: "/mnt/sp/fsSharepointFiles"
  update_rate: "15m"
  excludes:
    - "*/~*"
  json_support: false
  filename_as_id: false
  add_filesize: true
  remove_deleted: true
  add_as_inner_object: false
  store_source: false
  index_content: true
  attributes_support: false
  raw_metadata: false
  xml_support: false
  index_folders: true
  lang_detect: false
  continue_on_error: false
  ocr:
    language: "eng"
    enabled: true
    pdf_strategy: "ocr_and_text"
  follow_symlinks: false
elasticsearch:
  nodes:
    - url: "http://50.116.48.89:8881"
  username: ls
  password: lspass
  bulk_size: 100
  flush_interval: "5s"
  byte_size: "10mb"
```
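In case it helps with debugging: this is the quick check I use to see whether the node from the settings is reachable at all (plain curl with the same credentials):
```
# basic reachability check against the node from the settings above
curl -u ls:lspass http://50.116.48.89:8881/
# cluster health, to rule out a red/overloaded cluster
curl -u ls:lspass "http://50.116.48.89:8881/_cluster/health?pretty"
```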
When I create the config directory and index for the first time, everything works fine. But when I re-index using --restart, I get the errors below:
```
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ bin/fscrawler --config_dir test_dir_45.79.189.33 index_45.79.189.33 --loop 1 --restart
03:03:47,737 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [61.2mb/843mb=7.27%], RAM [1.8gb/3.7gb=49.26%], Swap [511.9mb/511.9mb=100.0%].
03:03:49,000 INFO [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.5.1
03:03:49,088 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
03:03:49,448 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [index_45.79.189.33] for [/mnt/sp/fsSharepointFiles] every [15m]
03:04:39,529 WARN [f.p.e.c.f.FsParserAbstract] Error while crawling /mnt/sp/fsSharepointFiles: /mnt/sp/fsSharepointFiles/fsSharepointfile1.txt (Resource temporarily unavailable)
03:04:39,529 INFO [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
03:04:54,121 WARN [f.p.e.c.f.c.v.ElasticsearchClientV7] Got a hard failure when executing the bulk request
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-0 [ACTIVE]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) [httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) [httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:261) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:502) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:211) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) [httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) [httpcore-nio-4.4.13.jar:4.4.13]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
03:04:54,134 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [index_45.79.189.33] stopped
03:04:54,138 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [index_45.79.189.33] stopped
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ ^C
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ bin/fscrawler --config_dir test_dir_45.79.189.33 index_45.79.189.33 --loop 1 --restart
03:11:35,231 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [61.3mb/843mb=7.28%], RAM [1.8gb/3.7gb=49.24%], Swap [511.9mb/511.9mb=100.0%].
03:11:37,200 WARN [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
03:11:37,200 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
java.net.ConnectException: Timeout connecting to [/50.116.48.89:8881]
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:823) ~[elasticsearch-rest-client-7.6.2.jar:7.6.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.6.2.jar:7.6.2]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.6.2.jar:7.6.2]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.6.2.jar:7.6.2]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1499) ~[elasticsearch-rest-high-level-client-7.6.2.jar:7.6.2]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1466) ~[elasticsearch-rest-high-level-client-7.6.2.jar:7.6.2]
at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:730) ~[elasticsearch-rest-high-level-client-7.6.2.jar:7.6.2]
at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.getVersion(ElasticsearchClientV7.java:169) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.checkVersion(ElasticsearchClient.java:181) ~[fscrawler-elasticsearch-client-base-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:142) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:257) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
Caused by: java.net.ConnectException: Timeout connecting to [/50.116.48.89:8881]
at org.apache.http.nio.pool.RouteSpecificPool.timeout(RouteSpecificPool.java:169) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.nio.pool.AbstractNIOConnPool.requestTimeout(AbstractNIOConnPool.java:632) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.nio.pool.AbstractNIOConnPool$InternalSessionRequestCallback.timeout(AbstractNIOConnPool.java:898) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.SessionRequestImpl.timeout(SessionRequestImpl.java:198) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processTimeouts(DefaultConnectingIOReactor.java:213) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:158) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.13.jar:4.4.13]
at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.4.jar:4.1.4]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_242]
03:11:37,215 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [index_45.79.189.33] stopped
03:11:37,216 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [index_45.79.189.33] stopped
```
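My understanding from the docs is that --restart simply clears the saved checkpoint before a normal run, so it should be roughly equivalent to this (the _status.json file name is my assumption about where FSCrawler 2.7 keeps its checkpoint):
```
# what I believe --restart does, expressed by hand: remove the saved
# checkpoint, then run a normal one-shot crawl
rm test_dir_45.79.189.33/index_45.79.189.33/_status.json
bin/fscrawler --config_dir test_dir_45.79.189.33 index_45.79.189.33 --loop 1
```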
Again, if I delete the config directory and start indexing from scratch, there are no errors:
```
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ rm -rf test_dir_45.79.189.33/
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ ls
bin lib LICENSE NOTICE README.md test_dir_sp_linux
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ bin/fscrawler --config_dir test_dir_45.79.189.33 index_45.79.189.33 --loop 1
03:25:16,631 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [61.3mb/843mb=7.28%], RAM [1.8gb/3.7gb=49.21%], Swap [511.9mb/511.9mb=100.0%].
03:25:16,650 WARN [f.p.e.c.f.c.FsCrawlerCli] job [index_45.79.189.33] does not exist
03:25:16,651 INFO [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
y
03:25:19,614 INFO [f.p.e.c.f.c.FsCrawlerCli] Settings have been created in [test_dir_45.79.189.33/index_45.79.189.33/_settings.yaml]. Please review and edit before relaunch
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ vim test_dir_45.79.189.33/index_45.79.189.33/_settings.yaml
[ls@li1288-33 fscrawler-es7-2.7-SNAPSHOT]$ bin/fscrawler --config_dir test_dir_45.79.189.33 index_45.79.189.33 --loop 1 --restart
03:28:57,555 INFO [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [61.2mb/843mb=7.27%], RAM [1.8gb/3.7gb=49.25%], Swap [511.9mb/511.9mb=100.0%].
03:28:58,727 INFO [f.p.e.c.f.c.v.ElasticsearchClientV7] Elasticsearch Client for version 7.x connected to a node running version 7.5.1
03:28:58,841 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
03:28:59,209 INFO [f.p.e.c.f.FsParserAbstract] FS crawler started for [index_45.79.189.33] for [/mnt/sp/fsSharepointFiles] every [15m]
03:28:59,791 WARN [o.a.t.p.PDFParser] J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
03:29:00,745 INFO [f.p.e.c.f.FsParserAbstract] FS crawler is stopping after 1 run
03:29:00,914 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler [index_45.79.189.33] stopped
```
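If it is useful, the next time the "Resource temporarily unavailable" (EAGAIN) error appears I can poke at the file directly on the mount, e.g.:
```
# read the exact file the crawler complained about, straight off the mount
cat /mnt/sp/fsSharepointFiles/fsSharepointfile1.txt > /dev/null && echo "read OK"
stat /mnt/sp/fsSharepointFiles/fsSharepointfile1.txt
```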
Could you please tell me why re-indexing with --restart is not working properly?
-Lisa