Is indexing the document the recommended approach?


Is it a recommended approach to store documents in Elasticsearch?

What if we extract keywords from the document and create an index on those keywords instead of indexing the whole document? (The assumption is that the file path will be stored in the index along with the keywords; the physical file will not exist in Elasticsearch.)

Yes. If you don't need to index the whole document but you have access to the keywords and that's the only text you want to be able to search for, then that's probably right.
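
As a rough illustration of that keyword-only approach, the sketch below builds an index mapping and one document that carries the extracted keywords plus the path to the file, which itself stays outside Elasticsearch. The field names (`keywords`, `file_path`) and the example path are assumptions made for illustration, not something taken from this thread.

```python
import json

# Hypothetical mapping: the field names are illustrative assumptions.
mapping = {
    "mappings": {
        "properties": {
            "keywords": {"type": "text"},      # full-text searchable keywords
            "file_path": {"type": "keyword"},  # exact-value pointer to the file
        }
    }
}


def build_document(keywords, file_path):
    """Return the JSON body to index: keywords plus a pointer to the file."""
    return {"keywords": keywords, "file_path": file_path}


# Example values are made up for the sketch.
doc = build_document(["invoice", "2019", "acme"], "/files/invoices/acme-2019.pdf")
print(json.dumps(mapping, indent=2))
print(json.dumps(doc, indent=2))
```

The mapping would be sent when creating the index, and one document body like this would be indexed per file.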

Thanks for your quick response.

Here I am providing some more details on my requirement. Can you please suggest which is the better option?

I am looking to index the whole document.

Actually, my requirement is to store files that may be gigabytes in size.

The allowed file types will be pdf, docx, excel, csv, ppt, pptx, images (png, jpeg, tiff, svg, etc.), audio files (searching on the audio content), video files, etc.

Can you please suggest, based on the above requirement, which is the better option to go with?

Thanks in advance.

Siva A.

My advice is to index the content and not to store the binary version of the document.

Did you look at the FSCrawler project by any chance?

Thanks for the information, David.
I really appreciate your effort to reply over the weekend.

I haven't gone through FSCrawler. I am completely new to Elasticsearch.

I am looking for help to decide how to implement this in a better manner.

It's really important to me to make a decision on it. The system should be stable for the next 15 years.

FSCrawler is here:

Where is your data coming from?

The files will come from an external web application.

I am planning to add an FTP server to maintain those files, and will save the file repository location in Elasticsearch along with the crawler data (indexed keywords).

Please correct me if this assumption is wrong.

Have a look at the attachment ingest plugin.
It will help you to extract some metadata.
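
To make that a bit more concrete, here is a minimal sketch of what an ingest pipeline using the `attachment` processor could look like, written as plain Python dictionaries. The idea is that the processor extracts text and metadata from a base64-encoded field, and the binary is then dropped so only the extracted content gets indexed. The field name `data`, the sample bytes, and the choice to drop the field with a `remove` processor are assumptions for illustration; please check the plugin documentation for your Elasticsearch version.

```python
import base64
import json

# Pipeline body (would be sent to PUT _ingest/pipeline/<name>).
pipeline = {
    "description": "Extract text and metadata from an attached file",
    "processors": [
        # Parses the base64 content of "data" into an "attachment" object.
        {"attachment": {"field": "data"}},
        # Drops the base64 source so the binary is not kept in the index.
        {"remove": {"field": "data"}},
    ],
}


def build_document(raw_bytes):
    """Return a document body with the file content base64-encoded."""
    return {"data": base64.b64encode(raw_bytes).decode("ascii")}


# Sample bytes stand in for a real file read from disk.
doc = build_document(b"Hello from a sample file")
print(json.dumps(pipeline, indent=2))
print(json.dumps(doc))
```

A document built this way would be indexed with the pipeline name passed as the `pipeline` request parameter.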

If you don't mind, can you please elaborate on it?

Please confirm my decision.

I don't think I understood what you're going to do.

I am planning to add an FTP server to maintain the files, and will save the file repository URL in Elasticsearch along with the indexed keywords.

That looks like a good design to me.
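
For what it's worth, the lookup side of that design might be sketched as follows: search the indexed keywords, then read the stored file URL from each hit and fetch the file from the FTP server separately. The field names `keywords` and `file_url`, the sample URL, and the hand-written response are hypothetical; the response only mimics the general shape of an Elasticsearch search result.

```python
def build_query(text):
    """Return a search body matching the text against the keywords field."""
    return {"query": {"match": {"keywords": text}}}


def file_urls(response):
    """Collect the stored file URL from every hit in a search response."""
    return [hit["_source"]["file_url"] for hit in response["hits"]["hits"]]


# Hand-written sample shaped like a search response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "keywords": ["contract"],
                    "file_url": "ftp://files.example.com/contract.pdf",
                }
            },
        ]
    }
}

print(build_query("contract"))
print(file_urls(sample_response))
```

Elasticsearch only answers the "which files match" question here; the file transfer itself stays with the FTP server.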

Thanks for the confirmation.