FSCrawler

Hi, I am running an Elasticsearch cluster in Docker. I set up FSCrawler, but when I run it, it fails with the error "We can not start Elasticsearch Client. Exiting."

Here is the complete error:

Recreating fscrawler ... done
Attaching to fscrawler
fscrawler    | 16:18:13,512 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [5.7gb/13.7gb=42.12%], Swap [1.9gb/1.9gb=100.0%].
fscrawler    | 16:18:14,196 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
fscrawler    | 16:18:14,196 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fscrawler    | java.net.ConnectException: Connection refused
fscrawler    |  at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:823) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1499) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1466) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:730) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.getVersion(ElasticsearchClientV7.java:169) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.checkVersion(ElasticsearchClient.java:181) ~[fscrawler-elasticsearch-client-base-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:142) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:257) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
fscrawler    | Caused by: java.net.ConnectException: Connection refused
fscrawler    |  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_222]
fscrawler    |  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_222]
fscrawler    |  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
fscrawler    |  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.4.jar:4.1.4]
fscrawler    |  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_222]
fscrawler    | 16:18:14,201 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [myfirstjob] stopped
fscrawler    | 16:18:14,202 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [myfirstjob] stopped
fscrawler exited with code 0

I checked Elasticsearch with curl -X GET "localhost:9200" and it is running:

{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "2YcD7BIwQf2k110fhvrY8g",
  "version" : {
    "number" : "7.10.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
    "build_date" : "2020-12-05T01:00:33.671820Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

I guess you checked this from the host where FSCrawler is also running, right?

What is your FSCrawler config for the job?
Could you start FSCrawler with the --debug option?

How do you launch it?
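
If it helps to answer the first question, here is a minimal sketch of a reachability check that can be run from the same place FSCrawler runs (for example inside its container, if Python is available there). It only tests whether a TCP connection to a host and port succeeds. The function name `can_connect` is made up for illustration, and the host name `es01` in the usage comment is an assumption taken from the node name in the curl output; your Docker service name may differ.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP socket;
        # the "with" block closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Example usage (host names are assumptions, adjust to your setup):
#   print(can_connect("localhost", 9200))
#   print(can_connect("es01", 9200))
```

If the check fails for `localhost` inside the container but curl works on the Docker host, FSCrawler is most likely pointing at the wrong address, since `localhost` inside a container refers to the container itself.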

Hi,
This is what it shows:

Recreating fscrawler ... done
Attaching to fscrawler
fscrawler    | 17:39:01,343 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [5.7gb/13.7gb=42.02%], Swap [1.9gb/1.9gb=100.0%].
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler    | 17:39:01,351 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler    | 17:39:01,351 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [myfirstjob]...
fscrawler    | 17:39:01,655 DEBUG [f.p.e.c.f.c.ElasticsearchClientUtil] Trying to find a client version 7
fscrawler    | 17:39:02,078 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
fscrawler    | 17:39:02,078 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fscrawler    | java.net.ConnectException: Connection refused

Could you answer all my questions, please?

Hi,
The logs above were captured with --debug.
Yes, you were right: the issue was caused by the location of the FSCrawler config path.
It is now set up correctly and working with a sample txt file.
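
For anyone hitting the same thing, a small sketch of how to verify that the job settings file is where FSCrawler will look for it. This assumes the usual layout of `<config_dir>/<job_name>/_settings.yaml` (or `_settings.json`); the helper name `find_job_settings` and the example config directory are hypothetical, and the job name `myfirstjob` comes from the logs above.

```python
from pathlib import Path
from typing import Optional


def find_job_settings(config_dir: str, job_name: str) -> Optional[Path]:
    """Return the job settings file under config_dir/job_name, or None."""
    job_dir = Path(config_dir).expanduser() / job_name
    # Assumed file names; check the FSCrawler docs for your version.
    for name in ("_settings.yaml", "_settings.json"):
        candidate = job_dir / name
        if candidate.is_file():
            return candidate
    return None


# Example usage (the config directory is an assumption, use the path
# that is mounted into your FSCrawler container):
#   print(find_job_settings("~/.fscrawler", "myfirstjob"))
```

If this returns None for the directory mounted into the container, FSCrawler may be falling back to default settings, which would explain a "Connection refused" against a local address.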

I want to crawl SharePoint files with FSCrawler (it is set up on Docker). Is that possible, or is there an Elasticsearch plugin for crawling SharePoint files?

If you can mount the SharePoint drive, then FSCrawler can probably index the content.
Otherwise, I'd recommend looking at Workplace Search, which has a lot of connectors.

Hi,
Is Workplace Search available under the open source license, or does it need a minimum Elastic license subscription?
Can it work with the Elastic Basic license?

It is available starting with the Basic license.