Fscrawler

vikram_singh · January 7, 2021, 4:28pm

Hi, I am running an Elasticsearch cluster in Docker. I set up FSCrawler, but when I run it, it fails with the error "We can not start Elasticsearch Client. Exiting."

Here is the complete error:

Recreating fscrawler ... done
Attaching to fscrawler
fscrawler    | 16:18:13,512 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [5.7gb/13.7gb=42.12%], Swap [1.9gb/1.9gb=100.0%].
fscrawler    | 16:18:14,196 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
fscrawler    | 16:18:14,196 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fscrawler    | java.net.ConnectException: Connection refused
fscrawler    |  at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:823) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235) ~[elasticsearch-rest-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1499) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1466) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:730) ~[elasticsearch-rest-high-level-client-7.5.2.jar:7.5.2]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.getVersion(ElasticsearchClientV7.java:169) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.checkVersion(ElasticsearchClient.java:181) ~[fscrawler-elasticsearch-client-base-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:142) ~[fscrawler-elasticsearch-client-v7-2.7-SNAPSHOT.jar:?]
fscrawler    |  at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:257) [fscrawler-cli-2.7-SNAPSHOT.jar:?]
fscrawler    | Caused by: java.net.ConnectException: Connection refused
fscrawler    |  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_222]
fscrawler    |  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717) ~[?:1.8.0_222]
fscrawler    |  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvent(DefaultConnectingIOReactor.java:174) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.processEvents(DefaultConnectingIOReactor.java:148) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor.execute(AbstractMultiworkerIOReactor.java:351) ~[httpcore-nio-4.4.12.jar:4.4.12]
fscrawler    |  at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.execute(PoolingNHttpClientConnectionManager.java:221) ~[httpasyncclient-4.1.4.jar:4.1.4]
fscrawler    |  at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase$1.run(CloseableHttpAsyncClientBase.java:64) ~[httpasyncclient-4.1.4.jar:4.1.4]
fscrawler    |  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_222]
fscrawler    | 16:18:14,201 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [myfirstjob] stopped
fscrawler    | 16:18:14,202 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [myfirstjob] stopped
fscrawler exited with code 0

I checked Elasticsearch with curl -X GET "localhost:9200" and it is running:

{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "2YcD7BIwQf2k110fhvrY8g",
  "version" : {
    "number" : "7.10.1",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "1c34507e66d7db1211f66f3513706fdf548736aa",
    "build_date" : "2020-12-05T01:00:33.671820Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

dadoonet · January 7, 2021, 4:59pm

I guess you checked this from the host where FSCrawler is also running, right?

What is your FSCrawler config for the job?
Could you start FSCrawler with the --debug option?

How do you launch it?

vikram_singh · January 7, 2021, 5:41pm

Hi,
It is showing

Recreating fscrawler ... done
Attaching to fscrawler
fscrawler    | 17:39:01,343 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [5.7gb/13.7gb=42.02%], Swap [1.9gb/1.9gb=100.0%].
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler    | 17:39:01,350 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler    | 17:39:01,351 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler    | 17:39:01,351 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [myfirstjob]...
fscrawler    | 17:39:01,655 DEBUG [f.p.e.c.f.c.ElasticsearchClientUtil] Trying to find a client version 7
fscrawler    | 17:39:02,078 WARN  [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client, disabling crawler...
fscrawler    | 17:39:02,078 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.
fscrawler    | java.net.ConnectException: Connection refused

dadoonet · January 7, 2021, 11:49pm

Could you answer all my questions, please?

vikram_singh · January 11, 2021, 12:03pm

Hi,
The logs above were captured with --debug.
Yes, you were right: the issue was due to the location of the FSCrawler config path.
It is now set up correctly and working with a sample txt file.

I want to crawl SharePoint files with FSCrawler (it is set up on Docker). Is that possible, or is there an Elasticsearch plugin for crawling SharePoint files?

dadoonet · January 11, 2021, 12:17pm

If you can mount the SharePoint drive, then FSCrawler can probably index the content.
Otherwise, I'd recommend looking at Workplace Search, which has a lot of connectors.
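
As a rough sketch of the first option: assuming the SharePoint library is already mounted on the Docker host (for example over WebDAV or through a sync client), that mount could be passed into the FSCrawler container as a volume. Both paths below are assumptions for illustration only: /mnt/sharepoint is a hypothetical host mount point and /tmp/es is a hypothetical directory inside the container.

```yaml
services:
  fscrawler:
    volumes:
      # Hypothetical host path where the SharePoint library is mounted,
      # exposed read-only inside the container at /tmp/es.
      - /mnt/sharepoint:/tmp/es:ro
```

The job's fs.url setting would then need to point at the container-side path (/tmp/es in this sketch).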

vikram_singh · January 12, 2021, 5:52am

Hi,
Is Workplace Search available under the open source license, or does it need a minimum Elastic license subscription?
Can it work with the Elastic Basic license?

dadoonet · January 12, 2021, 6:11am

It is available starting from the Basic license.

vikram_singh · January 22, 2021, 6:30pm

Hi,
I have an issue on the FSCrawler side.
It was working correctly earlier, but now that I am creating mysecondjob, it has started giving an error.

dadoonet · January 22, 2021, 7:23pm

Please don't post images of text as they are hard to read, may not display correctly for everyone, and are not searchable.

Instead, paste the text and format it with </> icon or pairs of triple backticks (```), and check the preview window to make sure it's properly formatted before posting it. This makes it more likely that your question will receive a useful answer.

It would be great if you could update your post to solve this.

Please provide the full logs. And run it with --trace option please.

vikram_singh · January 22, 2021, 7:42pm

Hi,
The log below was captured with --trace.

Starting fscrawler ... done
Attaching to fscrawler
fscrawler           | 19:40:55,500 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.5mb/3gb=6.14%], RAM [7.9gb/13.7gb=57.87%], Swap [1.9gb/1.9gb=100.0%].
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler           | 19:40:55,506 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler           | 19:40:55,507 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [mysecondjob]...
fscrawler           | 19:40:55,508 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [mysecondjob] does not exist
fscrawler           | 19:40:55,508 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
fscrawler           | Exception in thread "main" java.util.NoSuchElementException
fscrawler           |   at java.util.Scanner.throwFor(Scanner.java:862)
fscrawler           |   at java.util.Scanner.next(Scanner.java:1371)
fscrawler           |   at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:225)
fscrawler exited with code 1

vikram_singh · January 25, 2021, 6:51am

Hi David,
Understood, I will follow your suggestion on how to format logs when posting.

As for the full logs, here they are from the point where FSCrawler starts.

[elasticuser@elasticsearch-centos docker-elk-master]$ docker-compose up fscrawler
Starting fscrawler ... done
Attaching to fscrawler
fscrawler           | 06:46:41,375 INFO  [f.p.e.c.f.c.BootstrapChecks] Memory [Free/Total=Percent]: HEAP [191.6mb/3gb=6.14%], RAM [8.1gb/13.7gb=59.8%], Swap [1.9gb/1.9gb=100.0%].
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [7/_settings_folder.json] already exists
fscrawler           | 06:46:41,382 DEBUG [f.p.e.c.f.c.FsCrawlerCli] Starting job [mysecondjob]...
fscrawler           | 06:46:41,383 WARN  [f.p.e.c.f.c.FsCrawlerCli] job [mysecondjob] does not exist
fscrawler           | 06:46:41,384 INFO  [f.p.e.c.f.c.FsCrawlerCli] Do you want to create it (Y/N)?
fscrawler           | Exception in thread "main" java.util.NoSuchElementException
fscrawler           |   at java.util.Scanner.throwFor(Scanner.java:862)
fscrawler           |   at java.util.Scanner.next(Scanner.java:1371)
fscrawler           |   at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:225)
fscrawler exited with code 1
[elasticuser@elasticsearch-centos docker-elk-master]$

This is my docker-compose.yml file configuration for fscrawler.

version: '3.2'

services:

  fscrawler:
    image: toto1310/fscrawler
    container_name: fscrawler
    volumes:
      - /config:/root/.fscrawler
      - /data:/tmp/es
    ports:
      - 8080:8080
    networks:
      - elastic
    environment:
      - FS_URL=/tmp/es
      - ELASTICSEARCH_URL=http://es01:9200
      - ELASTICSEARCH_INDEX=mysecondjob
    command: fscrawler mysecondjob --trace

dadoonet · January 27, 2021, 11:41am

I see. You are using the FSCrawler Docker image which is probably an old one.

Any chance you could rebuild a new version yourself? There's a pending PR to have a proper Docker image, but it's not there yet.

Or, instead of that Docker image, could you use a normal installation?
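
For what it's worth, the stack trace suggests FSCrawler asked whether to create the missing job and then failed while reading the answer, presumably because docker-compose up gives the container no interactive stdin. One possible workaround, as a sketch only, is to create the job settings by hand on the host so that the prompt never appears. The example below assumes the file goes to /config/mysecondjob/_settings.yaml (the host side of your /config:/root/.fscrawler volume) and that your build reads YAML settings; older builds may expect a _settings.json file instead. The values are taken from your compose file.

```yaml
---
# Assumed location on the host: /config/mysecondjob/_settings.yaml
name: "mysecondjob"
fs:
  # Directory to crawl, as seen from inside the container.
  url: "/tmp/es"
elasticsearch:
  nodes:
    # Elasticsearch service name on the shared Docker network, not localhost.
    - url: "http://es01:9200"
```

I have not tested this against that image, so treat it as a starting point rather than a confirmed fix.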

vikram_singh · January 27, 2021, 12:08pm

Hi,
My complete environment is on Docker. Is there a recent FSCrawler Docker image that supports Elasticsearch version 7.10?

dadoonet · January 27, 2021, 2:25pm

I think this image was last built a year ago: https://hub.docker.com/r/toto1310/fscrawler

Mine is also one year old: https://hub.docker.com/repository/docker/dadoonet/fscrawler

But as I said, there is work pending here:

Sadly, it does not work yet.
If you are familiar with Docker and want to help there, please do.

system · February 24, 2021, 2:25pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
