FSCrawler

Hi, I am running an Elasticsearch cluster in Docker. I set up FSCrawler, but when I run it, it fails with the error "We can not start Elasticsearch Client. Exiting."

Here is the complete error:

Recreating fscrawler ... done
Attaching to fscrawler
fscrawler    | 16:18:13,512 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [5.7gb/13.7gb=42.12%], Swap [1.9gb/1.9gb=100.0%].
fscrawler    | 16:18:14,196 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
fscrawler    | 16:18:14,196 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fscrawler    | java.net.ConnectException: Connection refused
fscrawler    |  at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:823) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1499) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1466) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:730) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.getVersion(ElasticsearchClientV7.java:169) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.checkVersion(ElasticsearchClient.java:181) ~[fscrawler-elasticsearch-client-base-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:142) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:257) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
fscrawler    | Caused by: java.net.ConnectException: Connection refused
fscrawler    |  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_222]
fscrawler    |  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_222]
fscrawler    |  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
fscrawler    |  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.4.jar:4.1.4]
fscrawler    |  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_222]
fscrawler    | 16:18:14,201 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [myfirstjob] stopped
fscrawler    | 16:18:14,202 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [myfirstjob] stopped
fscrawler exited with code 0

I checked Elasticsearch with curl -X GET "localhost:9200", and it is running:

{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "2YcD7BIwQf2k110fhvrY8g",
  "version" : {
    "number" : "7.10.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
    "build_date" : "2020-12-05T01:00:33.671820Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

I guess you ran this check from the same host where FSCrawler is running, right?

What is your FSCrawler config for the job?
Could you start FSCrawler with the --debug option?

How do you launch it?

Hi,
It shows the following:

Recreating fscrawler ... done
Attaching to fscrawler
fscrawler    | 17:39:01,343 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [5.7gb/13.7gb=42.02%], Swap [1.9gb/1.9gb=100.0%].
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler    | 17:39:01,351 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler    | 17:39:01,351 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [myfirstjob]...
fscrawler    | 17:39:01,655 DEBUG [f.p.e.c.f.c.ElasticsearchClientUtil] Trying to find a client version 7
fscrawler    | 17:39:02,078 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
fscrawler    | 17:39:02,078 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fscrawler    | java.net.ConnectException: Connection refused

Could you please answer all of my questions?

Hi,
The logs above were produced with --debug.
Yes, you were right: the issue was the location of the FSCrawler config path.
Now it is set up correctly and works with a sample .txt file.
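For anyone hitting the same "Connection refused" error: the thread does not show the corrected file, so the fragment below is only a sketch. FSCrawler's default node URL is http://127.0.0.1:9200, and inside a container 127.0.0.1 (or localhost) is the FSCrawler container itself, not Elasticsearch. The curl check works from the host because port 9200 is published there. The fragment assumes the job file lives under the directory mounted as /root/.fscrawler (the poster's docker-compose file, shown later in the thread, maps /config:/root/.fscrawler) and that Elasticsearch is reachable as es01 on the shared Docker network.

```yaml
# Fragment of a job settings file, e.g. /config/myfirstjob/_settings.yaml on the host
# (assumed path; seen in the container as /root/.fscrawler/myfirstjob/_settings.yaml)
elasticsearch:
  nodes:
    # Docker service/container name on the shared network (assumed: es01),
    # not localhost/127.0.0.1, which point at the FSCrawler container itself
    - url: "http://es01:9200"
```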

I want to crawl SharePoint files with FSCrawler (which is set up in Docker). Is that possible, or is there an Elasticsearch plugin for crawling SharePoint files?

If you can mount the SharePoint drive, then FSCrawler can probably index its content.
Otherwise, I'd recommend looking at Workplace Search, which has a lot of connectors.
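With a Docker setup like the one posted later in this thread, "mounting the drive" would roughly mean: mount the SharePoint library on the Docker host first (how to do that is outside FSCrawler and not covered here), then bind-mount that host directory into the container at the path the job crawls (fs.url). A minimal sketch, where /mnt/sharepoint is a hypothetical host mount point:

```yaml
services:
  fscrawler:
    volumes:
      - /config:/root/.fscrawler
      # Hypothetical host mount point of the SharePoint drive, mounted
      # read-only at the directory the job crawls (fs.url: /tmp/es)
      - /mnt/sharepoint:/tmp/es:ro
```

Mounting it read-only is a cautious choice, since FSCrawler only needs to read the files in order to index them.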

Hi,
Is Workplace Search available under an open source license, or does it need a minimum Elastic subscription?
Can it work with the Elastic Basic license?

It is available starting from the Basic license.

Hi,
I have an issue on the FSCrawler side.
It was working correctly earlier, but now that I am creating mysecondjob, it has started giving an error.

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with the </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

Please provide the full logs, and run it with the --trace option.

Hi,
The log below was produced with --trace:

Starting fscrawler ... done
Attaching to fscrawler
fscrawler           | 19:40:55,500 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [7.9gb/13.7gb=57.87%], Swap [1.9gb/1.9gb=100.0%].
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler           | 19:40:55,507 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [mysecondjob]...
fscrawler           | 19:40:55,508 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [mysecondjob] does not exist
fscrawler           | 19:40:55,508 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
fscrawler           | Exception in thread "main" java.util.NoSuchElementException
fscrawler           |   at java.util.Scanner.throwFor(Scanner.java:862)
fscrawler           |   at java.util.Scanner.next(Scanner.java:1371)
fscrawler           |   at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:225)
fscrawler exited with code 1

Hi David,
Thanks for the suggestion on how to format logs when posting.

As for the full logs, below are the complete logs from the point where FSCrawler starts:

[elasticuser@elasticsearch-centos docker-elk-master]$ docker-compose up fscrawler
Starting fscrawler ... done
Attaching to fscrawler
fscrawler           | 06:46:41,375 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.6mb/3gb=6.14%], RAM [8.1gb/13.7gb=59.8%], Swap [1.9gb/1.9gb=100.0%].
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [mysecondjob]...
fscrawler           | 06:46:41,383 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [mysecondjob] does not exist
fscrawler           | 06:46:41,384 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
fscrawler           | Exception in thread "main" java.util.NoSuchElementException
fscrawler           |   at java.util.Scanner.throwFor(Scanner.java:862)
fscrawler           |   at java.util.Scanner.next(Scanner.java:1371)
fscrawler           |   at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:225)
fscrawler exited with code 1
[elasticuser@elasticsearch-centos docker-elk-master]$

This is the FSCrawler configuration in my docker-compose.yml file:

version: '3.2'

services:

  fscrawler:
    image: toto1310/fscrawler
    container_name: fscrawler
    volumes:
      - /config:/root/.fscrawler
      - /data:/tmp/es
    ports:
      - 8080:8080
    networks:
      - elastic
    environment:
      - FS_URL=/tmp/es
      - ELASTICSEARCH_URL=http://es01:9200
      - ELASTICSEARCH_INDEX=mysecondjob
    command: fscrawler mysecondjob --trace
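The NoSuchElementException in the logs above is consistent with FSCrawler asking "Do you want to create it (Y/N)?" and reading the answer from standard input: under docker-compose up no input is attached, so java.util.Scanner has no token to return and throws. One way to avoid the prompt, sketched here under assumptions, is to create the job settings file before starting the container, in the host directory that the /config:/root/.fscrawler volume maps. Whether the toto1310/fscrawler image reads the FS_URL, ELASTICSEARCH_URL and ELASTICSEARCH_INDEX environment variables is not confirmed in this thread, so the sketch sets the equivalent values directly in the file.

```yaml
# Assumed host path: /config/mysecondjob/_settings.yaml
# If this file exists before "docker-compose up fscrawler", the job already
# exists, so FSCrawler should not need to ask whether to create it.
---
name: "mysecondjob"
fs:
  url: "/tmp/es"                # same value as FS_URL in docker-compose.yml
elasticsearch:
  nodes:
    - url: "http://es01:9200"   # same value as ELASTICSEARCH_URL
  index: "mysecondjob"          # same value as ELASTICSEARCH_INDEX (FSCrawler defaults to the job name)
```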

I see. You are using an FSCrawler Docker image that is probably an old one.

Could you rebuild a new version yourself? There is a pending PR to provide a proper Docker image, but it has not landed yet.

Or use a normal installation instead of that Docker image?

Hi,
My complete environment is in Docker. Is there a recent FSCrawler Docker image that supports Elasticsearch 7.10?

I think this image was last built a year ago: https://hub.docker.com/r/toto1310/fscrawler

Mine is also one year old: https://hub.docker.com/repository/docker/dadoonet/fscrawler

But as I said, there is pending work here:

Sadly, it does not work yet.
If you are familiar with Docker and want to help there, please do 🙂
