Store PDF documents page-wise in ES

We have a business process in which we import a number of PDF documents and want to search these documents page by page.

In other words:

  1. You import a lot of PDF documents; they should be stored in Elasticsearch.
  2. You index every page of every PDF document.
  3. You type some text into a text box.
  4. You start a search, using the text from the text box as the input for the ES query.
  5. You get back the PDF pages from the imported documents that match the text you entered.
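Steps 3 to 5 only return individual pages if each page is stored as its own ES document. A minimal sketch of the request body, assuming a hypothetical index `pdf-pages` whose documents have the hypothetical fields `doc_id`, `page` and `content`; the body would be sent as `POST /pdf-pages/_search`, and each hit then corresponds to one page:

```python
import json


def build_page_search(text: str, size: int = 10) -> dict:
    """Build a _search body that matches the text-box input against the
    per-page "content" field (field names are assumptions, not ES defaults)."""
    return {
        "size": size,
        # Full-text match on the extracted page text.
        "query": {"match": {"content": {"query": text}}},
        # Only return what is needed to locate the page in its PDF.
        "_source": ["doc_id", "page"],
        # Show which passage on the page matched.
        "highlight": {"fields": {"content": {}}},
    }


body = build_page_search("invoice payment terms")
print(json.dumps(body, indent=2))
```

Keeping `doc_id` and `page` in `_source` lets the application open the original PDF at the matching page without storing the page text a second time on the client.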

My question: when I import a PDF document into ES, can I tell ES to extract and index every page separately, so that I can query all my PDF documents per page?

You can index PDF documents with the ingest attachment plugin, but it does not extract text per page; it extracts the text of the whole document into a single field.
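One possible workaround, sketched here as an assumption rather than a built-in ES feature, is to split the PDF on the client and index one ES document per page. The sketch below assumes the page texts have already been extracted (for example with a PDF library such as pypdf, via `PdfReader(path).pages` and `page.extract_text()`) and turns them into a Bulk API request body; the index name and field names are hypothetical:

```python
import json


def pages_to_bulk(doc_id: str, pages: list[str], index: str = "pdf-pages") -> str:
    """Turn the extracted text of each page into Bulk API NDJSON:
    one action line plus one source line per page."""
    lines = []
    for number, text in enumerate(pages, start=1):
        # A deterministic _id makes a re-import overwrite pages instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index, "_id": f"{doc_id}-{number}"}}))
        lines.append(json.dumps({"doc_id": doc_id, "page": number, "content": text}))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = pages_to_bulk("contract-42", ["First page text", "Second page text"])
print(payload)
```

The payload would be sent as `POST /_bulk` with `Content-Type: application/x-ndjson`. Alternatively, each single-page PDF could be passed through the attachment pipeline separately, at the cost of one extraction per page.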
