Input Elasticsearch

Hello there!

I would like to index pptx documents.
I installed Elasticsearch and the Ingest Attachment plugin. I've stored my documents in a local directory.
Now I don't know how to use that directory as an input for Elasticsearch...

You need to encode your document as Base64, then send it as a field in your JSON document.

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "YOUR BASE 64 content here"
}
GET my_index/_doc/my_id
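
To produce the Base64 content for the "data" field from files on disk, something like the following Python sketch could work. It is only an illustration: it assumes an unsecured node at http://127.0.0.1:9200, reuses the my_index index and attachment pipeline from the requests above, and uses the file name as the document id. The helper names (build_body, index_file, index_directory) are made up for this example. Each file is read fully into memory, so it is best suited to reasonably small documents.

```python
import base64
import json
import urllib.parse
import urllib.request
from pathlib import Path

ES_URL = "http://127.0.0.1:9200"  # assumption: local node without authentication
INDEX = "my_index"                # index name taken from the example above
PIPELINE = "attachment"           # pipeline name taken from the example above


def build_body(path: Path) -> bytes:
    """Return the JSON request body with the file Base64-encoded in "data"."""
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return json.dumps({"data": encoded}).encode("utf-8")


def index_file(path: Path, doc_id: str) -> int:
    """PUT one file through the attachment pipeline and return the HTTP status."""
    # Quote the id so file names with spaces or slashes stay a single URL segment.
    safe_id = urllib.parse.quote(doc_id, safe="")
    url = f"{ES_URL}/{INDEX}/_doc/{safe_id}?pipeline={PIPELINE}"
    request = urllib.request.Request(
        url,
        data=build_body(path),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_directory(directory: Path) -> None:
    """Index every .pptx file located directly inside `directory`."""
    for path in sorted(directory.glob("*.pptx")):
        # Using the file name as the document id is a choice made for this sketch.
        print(path.name, index_file(path, path.name))
```

Calling index_directory(Path("/path/to/docs")) with your own directory would then send each pptx file in it through the pipeline, one request per file.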

Also have a look at FSCrawler, where you can do something like:

curl -F "file=@myfile.pptx" "http://127.0.0.1:8080/fscrawler/_upload"

Or just let it crawl your local directory.

I would like to use Open Semantic Search ETL. Could it act as an agent which sends my documents to Elasticsearch?