Hello,
I want to index the contents of PDF files in Elastic App Search and search on that data.
Is it possible to handle PDF attachments in Elastic App Search?
Regards
I might add this feature to FSCrawler although I'm unsure how useful this could be.
What is the use case?
Note that FSCrawler supports Workplace Search.
Anyway, what you can do is to use the ingest attachment plugin and the ingest simulate API.
Thank you for your support!
But could you please explain how to install the ingest attachment plugin and use the ingest simulate API with Elastic App Search?
I have an Elastic Enterprise Search Standard account.
Are you running on Cloud or locally?
I am running on Cloud, on AWS Asia Pacific (Tokyo).
So you need to add the ingest attachment plugin to your deployment.
Click on Settings and then Plugins.
Add the plugin, and don't forget to save the changes.
After the cluster has been updated, you will be able to use the Elasticsearch endpoint to call the _simulate API. See the Simulate pipeline API page in the Elasticsearch Reference [7.11].
If you mix that with the plugin documentation, you should be able to execute something like:
# Create the pipeline
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data"
      }
    }
  ]
}

# Use the simulate endpoint
POST /_ingest/pipeline/attachment/_simulate
{
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
      }
    }
  ]
}
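In practice the value of the data field comes from your own PDF. Here is a minimal Python sketch of that step, using only the standard library. The function names, the default index name, and the file path you pass in are illustrative assumptions, not part of any Elastic API. It Base64-encodes the file bytes and builds the body for the _simulate call:

```python
import base64
import json


def build_simulate_body(file_bytes: bytes, doc_id: str, index: str = "index") -> dict:
    """Wrap raw file bytes in a request body for the _simulate call."""
    # The attachment processor expects Base64 text, not raw bytes.
    encoded = base64.b64encode(file_bytes).decode("ascii")
    return {
        "docs": [
            {"_index": index, "_id": doc_id, "_source": {"data": encoded}}
        ]
    }


def build_simulate_body_from_file(path: str, doc_id: str) -> str:
    """Read a file (for example a PDF) and return the JSON body as a string."""
    # `path` is whatever local file you want to test with (hypothetical).
    with open(path, "rb") as f:
        return json.dumps(build_simulate_body(f.read(), doc_id), indent=2)
```

You could then paste the resulting JSON into the POST /_ingest/pipeline/attachment/_simulate request, or send it with any HTTP client.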
The response contains the extracted content. Use that content to build your own JSON document and send it to App Search.
Note that e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=
is the Base64 encoding of a binary file (a small RTF text document here).
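To go from the simulate output to App Search, something along these lines might work. It is only a sketch under assumptions: it expects the response shape described in the Simulate pipeline API documentation (docs[0].doc._source.attachment), the App Search field names (body, content_type, language) are my own choice, and the base URL, engine name, and API key are placeholders you would replace. The request is built but not sent.

```python
import json
import urllib.request


def to_app_search_document(simulate_response: dict) -> dict:
    """Map the first simulated document to a flat App Search document."""
    doc = simulate_response["docs"][0]["doc"]
    attachment = doc["_source"]["attachment"]
    return {
        "id": doc["_id"],
        # Field names below are an assumption; pick whatever suits your engine.
        "body": attachment.get("content", ""),
        "content_type": attachment.get("content_type", ""),
        "language": attachment.get("language", ""),
    }


def build_index_request(base_url: str, engine: str, api_key: str,
                        documents: list) -> urllib.request.Request:
    """Build (but do not send) a POST to the App Search documents endpoint."""
    url = f"{base_url}/api/as/v1/engines/{engine}/documents"
    return urllib.request.Request(
        url,
        data=json.dumps(documents).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would send it. Note that the raw Base64 data field is deliberately left out of the App Search document, since only the extracted text is useful for search.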
Thank you for the update! Let me try this.