FSCrawler not able to connect to elastic search

Hi,

I have an AWS Elasticsearch instance and I am trying to use FSCrawler with it. My Elasticsearch version is 7.10 and my FSCrawler build is fscrawler-es7-2.7.zip.

I am getting the following error:

17:51:21,439 WARN [f.p.e.c.f.c.v.ElasticsearchClientV7] failed to create elasticsearch client on Elasticsearch{nodes=[https://search-reghu-personal-3-dlvlsq4wm7qa7fmgrckjtep66y.us-east-2.es.amazonaws.com], index='elasticSearchPoc', indexFolder='elasticSearchPoc_folder', bulkSize=100, flushInterval=5s, byteSize=10mb, username='null', pipeline='null', pathPrefix='null', sslVerification='false'}, disabling crawler...

17:51:21,440 FATAL [f.p.e.c.f.c.FsCrawlerCli] We can not start Elasticsearch Client. Exiting.

org.elasticsearch.ElasticsearchException: Invalid or missing build flavor [oss]

at org.elasticsearch.client.RestHighLevelClient.performClientRequest(RestHighLevelClient.java:2084) ~[elasticsearch-rest-high-level-client-7.14.0.jar:7.14.0]

at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1732) ~[elasticsearch-rest-high-level-client-7.14.0.jar:7.14.0]

at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1717) ~[elasticsearch-rest-high-level-client-7.14.0.jar:7.14.0]

at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1684) ~[elasticsearch-rest-high-level-client-7.14.0.jar:7.14.0]

at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:825) ~[elasticsearch-rest-high-level-client-7.14.0.jar:7.14.0]

at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.getVersion(ElasticsearchClientV7.java:180) ~[fscrawler-elasticsearch-client-v7-2.7.jar:?]

at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.checkVersion(ElasticsearchClient.java:193) ~[fscrawler-elasticsearch-client-base-2.7.jar:?]

at fr.pilato.elasticsearch.crawler.fs.client.v7.ElasticsearchClientV7.start(ElasticsearchClientV7.java:153) ~[fscrawler-elasticsearch-client-v7-2.7.jar:?]

at fr.pilato.elasticsearch.crawler.fs.service.FsCrawlerManagementServiceElasticsearchImpl.start(FsCrawlerManagementServiceElasticsearchImpl.java:63) ~[fscrawler-core-2.7.jar:?]

at fr.pilato.elasticsearch.crawler.fs.FsCrawlerImpl.start(FsCrawlerImpl.java:116) ~[fscrawler-core-2.7.jar:?]

at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.startEsClient(FsCrawlerCli.java:322) [fscrawler-cli-2.7.jar:?]

at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawlerCli.main(FsCrawlerCli.java:298) [fscrawler-cli-2.7.jar:?]

My AWS ES instance is IP-restricted. I get the following response from the endpoint when I hit it from a browser:

{
  "name" : "30cf6100daa7d88d855a9bd1b6cf0cba",
  "cluster_name" : "656830870718:reghu-personal-3",
  "cluster_uuid" : "bAzRn39rTlSWW6yNV2qzIQ",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "oss",
    "build_type" : "tar",
    "build_hash" : "unknown",
    "build_date" : "2021-07-08T21:51:39.338286Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

What could be the issue here?

Thanks,
Regs

You're using the AWS fork of Elasticsearch (a.k.a. OpenSearch), which isn't 100% compatible with Elasticsearch and won't work with the latest FSCrawler. I'm not sure whether there's an AWS-compatible fork of FSCrawler; you'll have to ask the OpenSearch folks about that. If not, you can try an older version of FSCrawler, or move to Elasticsearch proper (e.g. Elastic Cloud, if you want to stick with a managed service).
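To see why the client refuses the cluster, it can help to look at the `build_flavor` field in the root-endpoint response. The snippet below is only a rough sketch of that kind of check, not the client's actual code: `describe_build_flavor` is a hypothetical helper, and the rule that only a `default` flavor is accepted is an assumption inferred from the "Invalid or missing build flavor [oss]" message in the stack trace. The JSON is the response posted in the question, trimmed to the relevant fields.

```python
import json

# Root-endpoint response from the question, trimmed to the relevant fields.
ROOT_RESPONSE = """
{
  "name" : "30cf6100daa7d88d855a9bd1b6cf0cba",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "oss"
  },
  "tagline" : "You Know, for Search"
}
"""


def describe_build_flavor(info: dict) -> str:
    """Hypothetical helper: report whether the cluster advertises the default build.

    Assumption: the client only accepts build_flavor == "default"; this is
    inferred from the error message, not taken from the client source.
    """
    version = info.get("version", {})
    flavor = version.get("build_flavor")
    number = version.get("number", "unknown")
    if flavor == "default":
        return f"{number}: default build flavor, the check should pass"
    return f"{number}: invalid or missing build flavor [{flavor}]"


info = json.loads(ROOT_RESPONSE)
print(describe_build_flavor(info))
```

With the response above, the flavor is `oss`, so the sketch reports the same kind of complaint that appears in the FSCrawler log.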


I can confirm that the current version of FSCrawler needs to run with Elasticsearch 7.14. AFAIK this version is only available on cloud.elastic.co.


I'm just a little confused. My AWS console says it has the Elasticsearch engine.

Yes, it's unfortunate and very confusing that AWS Elasticsearch uses the same name for a different thing. That's one of the reasons why they're renaming it.


I was going through the OpenSearch documentation, and AWS says OpenSearch is still not available!

Short-term answer: if you want to run FSCrawler with Elasticsearch, look at Cloud by Elastic, which is also available from the AWS Marketplace, Azure Marketplace and Google Cloud Marketplace if needed.

Cloud by Elastic is one way to get access to all features, all managed by us. Think about what is already there, like Security, Monitoring, Reporting, SQL, Canvas, Maps UI, Alerting and the built-in solutions named Observability, Security and Enterprise Search, plus what is coming next.

You can also run FSCrawler with Workplace Search if you want to have a UI and search not only your local files but also from Google Drive, Dropbox...
