Index PDF files to AWS Elasticsearch Service using Elasticsearch File System Crawler


#1

I can index PDF files into a local Elasticsearch instance using the Elasticsearch File System Crawler (FSCrawler). The default FSCrawler settings include host, port, and scheme parameters, as shown below.

{
  "name" : "job_name2",
  "fs" : {
    "url" : "/tmp/es",
    "update_rate" : "15m",
    "excludes" : [ "~*" ],
    "json_support" : false,
    "filename_as_id" : false,
    "add_filesize" : true,
    "remove_deleted" : true,
    "add_as_inner_object" : false,
    "store_source" : false,
    "index_content" : true,
    "attributes_support" : false,
    "raw_metadata" : true,
    "xml_support" : false,
    "index_folders" : true,
    "lang_detect" : false,
    "continue_on_error" : false,
    "pdf_ocr" : true,
    "ocr" : {
      "language" : "eng"
    }
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "bulk_size" : 100,
    "flush_interval" : "5s"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}
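Pointing the same job at a remote cluster only changes the `elasticsearch.nodes` entry. The Python sketch below assumes the AWS domain endpoint is reachable over HTTPS on port 443 and that FSCrawler accepts `HTTPS` as a `scheme` value. The endpoint name and the `point_at_https_endpoint` helper are hypothetical. Retargeting the node does not add any AWS request signing, so this only helps if the domain's access policy accepts unsigned requests.

```python
import json

# Hypothetical AWS Elasticsearch Service domain endpoint; replace with your own.
AWS_ENDPOINT = "search-my-domain-abc123.us-east-1.es.amazonaws.com"

# A trimmed copy of the settings above (only the parts this sketch touches).
settings = json.loads("""
{
  "name": "job_name2",
  "elasticsearch": {
    "nodes": [ { "host": "127.0.0.1", "port": 9200, "scheme": "HTTP" } ],
    "bulk_size": 100,
    "flush_interval": "5s"
  }
}
""")


def point_at_https_endpoint(cfg: dict, host: str, port: int = 443) -> dict:
    """Return a copy of cfg whose only Elasticsearch node is host:port over HTTPS."""
    new_cfg = json.loads(json.dumps(cfg))  # deep copy of plain JSON data
    new_cfg["elasticsearch"]["nodes"] = [
        {"host": host, "port": port, "scheme": "HTTPS"}
    ]
    return new_cfg


aws_settings = point_at_https_endpoint(settings, AWS_ENDPOINT)
print(json.dumps(aws_settings["elasticsearch"]["nodes"], indent=2))
```

The original settings are left untouched, so the local job keeps working while the AWS variant is written to a separate job directory.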

However, I have trouble using it to index into the AWS Elasticsearch Service, because indexing into AWS Elasticsearch requires providing the AWS_ACCESS_KEY, AWS_SECRET_KEY, region, and service, as documented here. Any help on how to index PDF files into the AWS Elasticsearch Service would be greatly appreciated.
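The access key, secret key, region, and service are not sent as a username/password pair: with key-based access, AWS expects each HTTP request to be signed with AWS Signature Version 4 (SigV4), and the signature travels in the `Authorization` header. The standard-library Python sketch below shows that signing step, assuming `es` as the service name for Amazon Elasticsearch Service, no query string, and only the `host` and `x-amz-date` headers signed. The endpoint, keys, and the `sigv4_headers` helper are placeholders, not part of FSCrawler.

```python
import datetime
import hashlib
import hmac


def _hmac(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_headers(method, host, path, body, access_key, secret_key,
                  region, service="es", now=None):
    """Return headers that sign one request with AWS Signature Version 4.

    Simplified sketch: no query string, only host and x-amz-date are signed,
    and path is assumed to need no URI-encoding.
    """
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    # 1. Canonical request: lowercase, sorted, newline-terminated headers.
    canonical_headers = f"host:{host}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        method, path, "", canonical_headers, signed_headers, payload_hash,
    ])

    # 2. String to sign, scoped to date/region/service.
    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # 3. Derive the signing key from the secret key, then sign.
    k_date = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac(k_date, region)
    k_service = _hmac(k_region, service)
    k_signing = _hmac(k_service, "aws4_request")
    signature = hmac.new(k_signing, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()

    return {
        "Host": host,
        "X-Amz-Date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }


# Usage with placeholder values and a fixed timestamp.
fixed = datetime.datetime(2018, 1, 1, tzinfo=datetime.timezone.utc)
headers = sigv4_headers("POST", "search-my-domain-abc123.us-east-1.es.amazonaws.com",
                        "/_bulk", "", "AKIDEXAMPLE", "SECRETEXAMPLE",
                        "us-east-1", now=fixed)
print(headers["Authorization"])
```

Because the signature covers the timestamp and the request body, it changes on every request; whatever client talks to the domain has to compute it per request rather than storing it in a static setting.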


(David Pilato) #2

I know that I tested it with the official Cloud by Elastic offering (see below).
I don't know exactly how the AWS service works, but I guess you can get a username and password?
If so, you could define them in the elasticsearch.nodes setting.
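For comparison, a username and password are usually sent as HTTP Basic authentication, which is how Elasticsearch's security features accept them: the client base64-encodes `username:password` into the `Authorization` header. The Python sketch below uses placeholder credentials and a hypothetical `basic_auth_header` helper; it does not show where FSCrawler reads these values from.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


header = basic_auth_header("elastic", "changeme")  # placeholder credentials
print(header)
```

This header is identical on every request, which is why it cannot stand in for the per-request AWS Signature Version 4 signing that key-based AWS access relies on.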

See https://github.com/dadoonet/fscrawler#elasticsearch-settings for more details.


BTW, did you look at https://www.elastic.co/cloud and https://aws.amazon.com/marketplace/pp/B01N6YCISK ?

Cloud by Elastic is the only way to get access to X-Pack. Think about what is already there, like Security, Monitoring, and Reporting, and what is coming, like Canvas and SQL.

