Hello there!
I would like to index pptx documents.
I installed Elasticsearch and the Ingest Attachment plugin. My documents are stored in a directory on my local machine.
Now I don't know how to send the contents of that directory to Elasticsearch...
You need to encode your document as Base64, then send that string as a field in your JSON document.
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "YOUR BASE 64 content here"
}

GET my_index/_doc/my_id
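If you want to script this for a whole directory, here is a minimal sketch in Python using only the standard library. It Base64-encodes each file and sends it through the `attachment` pipeline defined above. Several things here are my assumptions and not part of the steps above: Elasticsearch listening on `http://localhost:9200` without authentication, the file name being used as the document id, and the helper names `encode_file`, `build_index_request` and `index_directory`, which are just for illustration.

```python
import base64
import json
import urllib.request
from pathlib import Path
from urllib.parse import quote


def encode_file(path: Path) -> str:
    """Read a file and return its bytes as a Base64 string."""
    return base64.b64encode(path.read_bytes()).decode("ascii")


def build_index_request(
    path: Path,
    base_url: str = "http://localhost:9200",  # assumption: local node, no auth
    index: str = "my_index",
    pipeline: str = "attachment",
) -> urllib.request.Request:
    """Build the PUT request that indexes one file through the pipeline."""
    # Assumption: the file name is used as the document id (URL-escaped).
    doc_id = quote(path.name, safe="")
    url = f"{base_url}/{index}/_doc/{doc_id}?pipeline={pipeline}"
    body = json.dumps({"data": encode_file(path)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def index_directory(directory: Path, pattern: str = "*.pptx") -> None:
    """Send every matching file in the directory to Elasticsearch."""
    for path in sorted(directory.glob(pattern)):
        with urllib.request.urlopen(build_index_request(path)) as response:
            print(path.name, response.status)


# Example call (hypothetical path, requires a running Elasticsearch node):
# index_directory(Path("/path/to/my/documents"))
```

Keep in mind that the whole file is loaded into memory and sent in one request, so this is only reasonable for documents of moderate size.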
Also have a look at FSCrawler, where you can do something like:
curl -F "file=@myfile.pptx" "http://127.0.0.1:8080/fscrawler/_upload"
Or just let it crawl your local dir.
I would like to use Open Semantic Search ETL. Could it act as an agent which sends my documents to Elasticsearch?