FSCrawler - Unable to parse response body for Response

My Elasticsearch (7.4.2) is running in a Docker container on port 9201, and I am trying to start FSCrawler against it, but it fails with the error below.

    $ bin/fscrawler conf_test --debug=true
    07:20:53,773 WARN  [f.p.e.c.f.c.ElasticsearchClientManager] failed to create elasticsearch client, disabling crawler...
    07:20:53,779 FATAL [f.p.e.c.f.c.FsCrawler] Fatal error received while running the crawler: [Unable to parse response body for Response{requestLine=GET / HTTP/1.1, host=http://dbsld0127:9201, response=HTTP/1.1 200 OK}]
    07:20:53,782 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [conf_test] stopped
    07:20:53,783 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [conf_test] stopped

Welcome!

Could you run the same command with --debug instead of --debug=true?
Also make sure you are using the latest snapshot version.

Hi David - Yes, I am using FSCrawler 2.5 with the 7.x version of _settings.json.

Below is the debug log.

$ bin/fscrawler conf_test --debug
08:04:48,863 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [2/_settings.json] already exists
08:04:48,867 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [2/_settings_folder.json] already exists
08:04:48,867 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [5/_settings.json] already exists
08:04:48,867 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [5/_settings_folder.json] already exists
08:04:48,867 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings.json] already exists
08:04:48,867 DEBUG [f.p.e.c.f.f.FsCrawlerUtil] Mapping [6/_settings_folder.json] already exists
08:04:48,868 DEBUG [f.p.e.c.f.c.FsCrawler] Starting job [conf_test]...
08:04:49,497 WARN  [f.p.e.c.f.c.ElasticsearchClientManager] failed to create elasticsearch client, disabling crawler...
08:04:49,497 FATAL [f.p.e.c.f.c.FsCrawler] Fatal error received while running the crawler: [Unable to parse response body for Response{requestLine=GET / HTTP/1.1, host=http://XXXXX:9201, response=HTTP/1.1 200 OK}]
08:04:49,497 DEBUG [f.p.e.c.f.c.FsCrawler] error caught
java.io.IOException: Unable to parse response body for Response{requestLine=GET / HTTP/1.1, host=http://XXXXX:9201, response=HTTP/1.1 200 OK}
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:541) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:508) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.info(RestHighLevelClient.java:283) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClient.setElasticsearchBehavior(ElasticsearchClient.java:291) ~[fscrawler-elasticsearch-client-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.client.ElasticsearchClientManager.start(ElasticsearchClientManager.java:90) ~[fscrawler-elasticsearch-client-2.5.jar:?]
        at fr.pilato.elasticsearch.crawler.fs.cli.FsCrawler.main(FsCrawler.java:260) [fscrawler-cli-2.5.jar:?]
Caused by: org.elasticsearch.common.xcontent.XContentParseException: [15:3] [org.elasticsearch.action.main.MainResponse] failed to parse field [version]
        at org.elasticsearch.common.xcontent.ObjectParser.parseValue(ObjectParser.java:316) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.parseSub(ObjectParser.java:326) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.parse(ObjectParser.java:168) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.apply(ObjectParser.java:182) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.action.main.MainResponse.fromXContent(MainResponse.java:150) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:653) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAndParseEntity$2(RestHighLevelClient.java:508) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:539) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        ... 5 more
Caused by: java.lang.IllegalStateException: unexpected distribution type [docker]; your distribution is broken
        at org.elasticsearch.Build$Type.fromDisplayName(Build.java:106) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.action.main.MainResponse.lambda$static$4(MainResponse.java:140) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.lambda$declareField$1(ObjectParser.java:213) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.parseValue(ObjectParser.java:314) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.parseSub(ObjectParser.java:326) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.parse(ObjectParser.java:168) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.common.xcontent.ObjectParser.apply(ObjectParser.java:182) ~[elasticsearch-x-content-6.3.2.jar:6.3.2]
        at org.elasticsearch.action.main.MainResponse.fromXContent(MainResponse.java:150) ~[elasticsearch-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:653) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.lambda$performRequestAndParseEntity$2(RestHighLevelClient.java:508) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:539) ~[elasticsearch-rest-high-level-client-6.3.2.jar:6.3.2]
        ... 5 more
08:04:49,501 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [conf_test]
08:04:49,501 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
08:04:49,501 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
08:04:49,503 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
08:04:49,503 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [conf_test] stopped
08:04:49,504 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [conf_test]
08:04:49,504 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
08:04:49,504 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Closing REST client
08:04:49,504 DEBUG [f.p.e.c.f.FsCrawlerImpl] ES Client Manager stopped
08:04:49,504 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler [conf_test] stopped
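For anyone hitting the same error: the `IllegalStateException` at the bottom of the trace ("unexpected distribution type [docker]") is the real cause, not the HTTP call, which returned `200 OK`. FSCrawler 2.5 bundles the 6.3.2 high-level REST client, whose `Build.Type.fromDisplayName` check in the trace validates the build type reported by `GET /` against a fixed list that predates the Docker distribution. A minimal Python sketch of that kind of strict check (the set of known names here is an assumption for illustration; the real logic lives in `org.elasticsearch.Build`):

```python
# Illustrative sketch of the strict build-type check that fails in the trace above.
# The enum values below are assumptions; the real code is org.elasticsearch.Build.Type.
KNOWN_TYPES = {"deb", "rpm", "tar", "zip"}  # a 6.3.2-era client has no "docker" entry


def from_display_name(display_name: str) -> str:
    """Mimic Build.Type.fromDisplayName: reject any type the client doesn't know."""
    if display_name not in KNOWN_TYPES:
        raise ValueError(
            f"unexpected distribution type [{display_name}]; your distribution is broken"
        )
    return display_name
```

A 7.x node started from the official Docker image reports `"build_type": "docker"` in its `GET /` response body, so the old bundled client raises exactly this error while parsing an otherwise successful response; upgrading FSCrawler (and with it the bundled client) resolves it.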

For reference:

Below are the settings I used, as my Elasticsearch version is 7.4.2.

_settings.json

{
  "settings": {
    "number_of_shards": 1,
    "index.mapping.total_fields.limit": 2000,
    "analysis": {
      "analyzer": {
        "fscrawler_path": {
          "tokenizer": "fscrawler_path"
        }
      },
      "tokenizer": {
        "fscrawler_path": {
          "type": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "dynamic_templates": [
      {
        "raw_as_text": {
          "path_match": "meta.raw.*",
          "mapping": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }
        }
      }
    ],
    "properties": {
      "attachment": {
        "type": "binary",
        "doc_values": false
      },
      "attributes": {
        "properties": {
          "group": {
            "type": "keyword"
          },
          "owner": {
            "type": "keyword"
          }
        }
      },
      "content": {
        "type": "text"
      },
      "file": {
        "properties": {
          "content_type": {
            "type": "keyword"
          },
          "filename": {
            "type": "keyword",
            "store": true
          },
          "extension": {
            "type": "keyword"
          },
          "filesize": {
            "type": "long"
          },
          "indexed_chars": {
            "type": "long"
          },
          "indexing_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "created": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "last_modified": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "last_accessed": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "checksum": {
            "type": "keyword"
          },
          "url": {
            "type": "keyword",
            "index": false
          }
        }
      },
      "meta": {
        "properties": {
          "author": {
            "type": "text"
          },
          "date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "keywords": {
            "type": "text"
          },
          "title": {
            "type": "text"
          },
          "language": {
            "type": "keyword"
          },
          "format": {
            "type": "text"
          },
          "identifier": {
            "type": "text"
          },
          "contributor": {
            "type": "text"
          },
          "coverage": {
            "type": "text"
          },
          "modifier": {
            "type": "text"
          },
          "creator_tool": {
            "type": "keyword"
          },
          "publisher": {
            "type": "text"
          },
          "relation": {
            "type": "text"
          },
          "rights": {
            "type": "text"
          },
          "source": {
            "type": "text"
          },
          "type": {
            "type": "text"
          },
          "description": {
            "type": "text"
          },
          "created": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "print_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "metadata_date": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "latitude": {
            "type": "text"
          },
          "longitude": {
            "type": "text"
          },
          "altitude": {
            "type": "text"
          },
          "rating": {
            "type": "byte"
          },
          "comments": {
            "type": "text"
          }
        }
      },
      "path": {
        "properties": {
          "real": {
            "type": "keyword",
            "fields": {
              "tree": {
                "type": "text",
                "analyzer": "fscrawler_path",
                "fielddata": true
              },
              "fulltext": {
                "type": "text"
              }
            }
          },
          "root": {
            "type": "keyword"
          },
          "virtual": {
            "type": "keyword",
            "fields": {
              "tree": {
                "type": "text",
                "analyzer": "fscrawler_path",
                "fielddata": true
              },
              "fulltext": {
                "type": "text"
              }
            }
          }
        }
      }
    }
  }
}

_settings_folder.json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "fscrawler_path": {
          "tokenizer": "fscrawler_path"
        }
      },
      "tokenizer": {
        "fscrawler_path": {
          "type": "path_hierarchy"
        }
      }
    }
  },
  "mappings": {
    "properties" : {
      "real" : {
        "type" : "keyword",
        "store" : true
      },
      "root" : {
        "type" : "keyword",
        "store" : true
      },
      "virtual" : {
        "type" : "keyword",
        "store" : true
      }
    }
  }
}

David - As per the logs provided above, FSCrawler is not able to read the _settings.json file which I had created under the _default directory.

I am not sure what is happening.

$ ll
total 0
drwxr-xr-x 2 xxx dce 55 Apr  2 03:18 2
drwxr-xr-x 2 xxx dce 55 Apr  2 03:18 5
drwxr-xr-x 2 xxx dce 55 Apr  2 03:18 6
drwxrwxrwx 2 xxx dce 55 Apr  2 08:03 7
[xxx@xxxxxx _default]$ cd 7
[xxx@xxxxxx 7]$ ll
total 12
-rwxrwxr-x 1 xxx dce  538 Apr  2 05:50 _settings_folder.json
-rwxrwxr-x 1 xxx dce 4676 Apr  2 05:50 _settings.json
[xxx@xxxxxx 7]$

Please switch to 2.7-SNAPSHOT. Download it from https://fscrawler.readthedocs.io/en/latest/installation.html, which links to https://oss.sonatype.org/content/repositories/snapshots/fr/pilato/elasticsearch/crawler/fscrawler-es7/2.7-SNAPSHOT/

Pick the latest version there.

Thanks a lot David. It worked.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.