Input Elasticsearch

Hello there!

I would like to index pptx documents.
I have installed Elasticsearch and the Ingest Attachment plugin, and I have stored my documents in a local directory.
Now I don't know how to import that directory as an input to Elasticsearch...

You need to encode your document as Base64, then send that as a field in your JSON document.

PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}
PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "YOUR BASE 64 content here"
}
GET my_index/_doc/my_id
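
For a file on disk, the encoding step and the PUT request could be scripted roughly as follows. This is only a sketch using the Python standard library: the helper names build_attachment_doc and index_file are made up for illustration, and it assumes Elasticsearch listens on http://127.0.0.1:9200 without authentication or TLS and that the "attachment" pipeline above already exists.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_attachment_doc(path):
    """Return the JSON-serialisable body read by the attachment pipeline."""
    raw = Path(path).read_bytes()
    # The attachment processor expects Base64 text in the "data" field.
    return {"data": base64.b64encode(raw).decode("ascii")}


def index_file(path, doc_id, base_url="http://127.0.0.1:9200", index="my_index"):
    """PUT one file through the "attachment" pipeline.

    base_url is an assumption (local node, no auth, no TLS); adjust as needed.
    """
    url = f"{base_url}/{index}/_doc/{doc_id}?pipeline=attachment"
    body = json.dumps(build_attachment_doc(path)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling index_file("myfile.pptx", "my_id") would then correspond to the PUT request above. Note that the whole file is read into memory before encoding, which is fine for typical presentations but may not suit very large files.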

Also have a look at FSCrawler, where you can do something like:

curl -F "file=@myfile.pptx" "http://127.0.0.1:8080/fscrawler/_upload"

Or just let it crawl your local dir.
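
If you would rather script the directory import yourself instead of using FSCrawler, one possible approach is to walk the directory and build a request body for the _bulk API. The sketch below is not how FSCrawler works; the function name bulk_body_for_dir, the "*.pptx" pattern, and the use of the relative path as document id are all assumptions made for illustration.

```python
import base64
import json
from pathlib import Path


def bulk_body_for_dir(directory, index="my_index", pattern="*.pptx"):
    """Build an NDJSON _bulk body with one index action per matching file."""
    root = Path(directory)
    lines = []
    for path in sorted(root.rglob(pattern)):
        # Action line: the path relative to the root is used as the document id
        # (an arbitrary choice for this sketch).
        doc_id = path.relative_to(root).as_posix()
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: Base64 content in the "data" field read by the pipeline.
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        lines.append(json.dumps({"data": encoded}))
    # The bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

The resulting string could be sent with POST _bulk?pipeline=attachment and the Content-Type application/x-ndjson. Since every file ends up in a single request body, a large directory would need to be split into several smaller batches.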

I would like to use Open Semantic Search ETL. Could it act as an agent which sends my documents to Elasticsearch?
