Elasticsearch: Best Approach to Index Files

rahulnama · September 11, 2018, 5:58am

Hello Team

I have a few files (PDF and DOCX) that contain questions and answers (think of FAQs). The total size of all the files is around 500 MB.

Expected output: when we search for something, it searches across all the documents and returns the relevant answer.

What is the best way to index these files?

1. Index page by page using the ingest attachment processor: I think we would need to maintain a parent-child relation between each file and its pages. I'm afraid that when we search with a match query, it will return the whole page, and we would then need to parse the response to find the answer. Also, if a question is on one page and its answer is on another page, I'm not sure how this would work.

2. Extract the questions and answers from the files, convert them to JSON, and index them: extract the text, convert it to JSON with question and answer as keys, and index it using the Elasticsearch client. With many files, I'm not sure how long it will take to convert all of them to text and then to JSON. I think this approach is more suitable for the current scenario, but I'm not sure. Please advise.
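To make option 2 concrete, here is a minimal sketch of what I have in mind. It assumes the text has already been extracted from the PDF/DOCX files, and that every question starts with "Q:" and every answer with "A:" (that layout is only an assumption, the real files may differ). The function names, the `faq` index name, the `source_file` field, and the `question^2` boost are all hypothetical choices for illustration. Because it works on the full text of a file rather than on single pages, a question on one page and its answer on the next would still end up in the same JSON document.

```python
import re
from typing import Dict, Iterator, List

# Assumed layout: "Q: <question> A: <answer>", repeated through the text.
QA_PATTERN = re.compile(
    r"Q:\s*(?P<question>.+?)\s*A:\s*(?P<answer>.+?)(?=\s*Q:|\Z)",
    re.DOTALL,
)


def extract_qa_pairs(text: str) -> List[Dict[str, str]]:
    """Split already-extracted plain text into question/answer dicts."""
    pairs = []
    for match in QA_PATTERN.finditer(text):
        pairs.append({
            # Collapse line breaks and repeated spaces left by extraction.
            "question": " ".join(match.group("question").split()),
            "answer": " ".join(match.group("answer").split()),
        })
    return pairs


def to_bulk_actions(pairs: List[Dict[str, str]], index_name: str,
                    source_file: str) -> Iterator[dict]:
    """Yield one bulk action per Q&A pair (one small document per pair)."""
    for position, pair in enumerate(pairs):
        yield {
            "_index": index_name,
            "_id": f"{source_file}-{position}",
            "_source": {**pair, "source_file": source_file},
        }


def build_search_body(user_query: str) -> dict:
    """Search body that looks in both fields and favours question matches."""
    return {
        "query": {
            "multi_match": {
                "query": user_query,
                "fields": ["question^2", "answer"],
            }
        }
    }


sample_text = (
    "Q: How do I reset my password?\n"
    "A: Use the reset link\non the login page.\n"
    "Q: Where is the manual?\n"
    "A: In the docs folder.\n"
)
pairs = extract_qa_pairs(sample_text)
actions = list(to_bulk_actions(pairs, "faq", "help.pdf"))
search_body = build_search_body("reset password")
```

The `actions` dicts use the `_index` / `_id` / `_source` shape that `elasticsearch.helpers.bulk(client, actions)` from the Python client accepts, so they could be passed to it directly. Since each hit would then be a single question/answer pair, the response would not need to be parsed further the way a whole page would in option 1.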

Is there any other method that I should consider?

Thanks for your time, as always.

Best
Rahul

system · October 9, 2018, 5:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
