Hello there!
I would like to index pptx documents.
I installed Elasticsearch and the Ingest Attachment plugin. I've stored my documents in a local directory.
Now I don't know how to import that directory as input to Elasticsearch...
You need to encode your document as Base64, then send that as a field in your JSON document.
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

PUT my_index/_doc/my_id?pipeline=attachment
{
  "data": "YOUR BASE 64 content here"
}

GET my_index/_doc/my_id
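If you want to do the Base64 step for a whole directory yourself, here is a rough Python sketch using only the standard library. It assumes the `attachment` pipeline above already exists and that Elasticsearch listens on `http://127.0.0.1:9200` without security; `ES_URL`, `build_index_request` and `index_directory` are names I made up for illustration, and using the file name as the document id is just one possible choice.

```python
import base64
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Assumption: a local Elasticsearch node without authentication.
ES_URL = "http://127.0.0.1:9200"


def build_index_request(path, index="my_index", pipeline="attachment"):
    """Build a PUT request that indexes one file through the ingest pipeline."""
    path = Path(path)
    # The attachment processor reads the file content as a Base64 string.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    body = json.dumps({"data": encoded}).encode("utf-8")
    # Hypothetical choice: use the (URL-escaped) file name as the document id.
    doc_id = urllib.parse.quote(path.name)
    url = f"{ES_URL}/{index}/_doc/{doc_id}?pipeline={pipeline}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def index_directory(directory):
    """Send every .pptx file found directly in `directory` to Elasticsearch."""
    for path in sorted(Path(directory).glob("*.pptx")):
        with urllib.request.urlopen(build_index_request(path)) as resp:
            print(path.name, resp.status)
```

Calling `index_directory` with the path of your local directory would then send one request per file. Large presentations produce large request bodies this way, so treat it as a starting point rather than a finished tool.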
Also have a look at FSCrawler, where you can do something like:
curl -F "file=@myfile.pptx" "http://127.0.0.1:8080/fscrawler/_upload"
Or just let it crawl your local dir.
I would like to use Open Semantic Search ETL. Could it act as an agent which sends my documents to Elasticsearch?