Index PDF files to AWS Elasticsearch Service using Elasticsearch File System Crawler

I can index PDF files to a local Elasticsearch instance using Elasticsearch File System Crawler (fscrawler). The default fscrawler settings have host, port, and scheme parameters, as shown below.

{
  "name" : "job_name2",
  "fs" : {
    "url" : "/tmp/es",
    "update_rate" : "15m",
    "excludes" : [ "~*" ],
    "json_support" : false,
    "filename_as_id" : false,
    "add_filesize" : true,
    "remove_deleted" : true,
    "add_as_inner_object" : false,
    "store_source" : false,
    "index_content" : true,
    "attributes_support" : false,
    "raw_metadata" : true,
    "xml_support" : false,
    "index_folders" : true,
    "lang_detect" : false,
    "continue_on_error" : false,
    "pdf_ocr" : true,
    "ocr" : {
      "language" : "eng"
    }
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "bulk_size" : 100,
    "flush_interval" : "5s"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}

However, I am having difficulty using it to index to the AWS Elasticsearch service, because that requires providing the AWS_ACCESS_KEY, AWS_SECRET_KEY, region, and service, as documented here. Any help on how to index PDF files to the AWS Elasticsearch service would be highly appreciated.

I have tested it with the official Cloud by Elastic offer (see below).
I don't know exactly how the AWS service works, but I guess you can have a username and password?
If so, you may be able to define them in the elasticsearch.nodes setting.

See https://github.com/dadoonet/fscrawler#elasticsearch-settings for more details.
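
As a rough illustration of the username and password idea, here is a minimal sketch of the elasticsearch part of a job settings file. Everything in it is an assumption to check against the settings documentation linked above: the host is a placeholder, port 443 with the HTTPS scheme is only a guess at how a hosted endpoint would be reached, and the "username" and "password" keys and their position in the file are hypothetical. It also does not address authentication with AWS_ACCESS_KEY, AWS_SECRET_KEY, region, and service, which the question asks about.

```json
{
  "name" : "job_name2",
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "your-domain-endpoint.example.com",
      "port" : 443,
      "scheme" : "HTTPS"
    } ],
    "username" : "your_username",
    "password" : "your_password",
    "bulk_size" : 100,
    "flush_interval" : "5s"
  }
}
```

The "fs" and "rest" sections from the original settings would stay unchanged; only the node address and the credentials differ from the local setup.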


BTW did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is the only way to get access to X-Pack. Think about what is already there, such as Security, Monitoring, and Reporting, and what is coming, such as Canvas, SQL...
