Store PDF documents page-wise in ES

We have a business process in which we import a number of PDF documents and want to search these documents per page.


  1. You import a lot of PDF documents; they should be stored in Elasticsearch.
  2. You index every page of every PDF document.
  3. You type some text into a text box.
  4. You start a search.
    The text from the text box is used as the input for the ES query.
  5. You want to get back some of the PDF pages from the imported documents (those that match the text from the text box).

My question:
When I import a PDF document into ES, can I tell ES to extract and index every page of the document separately, so that I can query all my PDF documents per page?

You can index PDF documents with the ingest attachment plugin, but it extracts the text of the whole document, not of individual pages.
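
One possible workaround is to split the PDF on the client side and index each page as its own ES document. The following is only a minimal sketch under several assumptions: the index name `pdf-pages`, the field names (`doc_id`, `filename`, `page`, `content`) and the helper names are made up for illustration, and the page text is extracted with the third-party `pypdf` library, which is not part of ES.

```python
from typing import Dict, Iterable, Iterator, List


def extract_page_texts(path: str) -> List[str]:
    """Return the text of each page of a PDF, in page order.

    Assumption: the third-party `pypdf` package is installed. It is
    imported here so the rest of the sketch works without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may yield nothing for image-only pages; store "" then.
    return [page.extract_text() or "" for page in reader.pages]


def page_actions(doc_id: str, filename: str, page_texts: Iterable[str],
                 index: str = "pdf-pages") -> Iterator[Dict]:
    """Yield one bulk-index action per page (hypothetical document layout)."""
    for page_number, text in enumerate(page_texts, start=1):
        yield {
            "_index": index,
            # Deterministic id, so re-importing a PDF overwrites its pages.
            "_id": f"{doc_id}-{page_number}",
            "_source": {
                "doc_id": doc_id,
                "filename": filename,
                "page": page_number,
                "content": text,
            },
        }


def page_query(text: str) -> Dict:
    """Build a search body that matches the text box input against pages."""
    return {
        "query": {"match": {"content": text}},
        # Return only what is needed to locate the page, not the full text.
        "_source": ["doc_id", "filename", "page"],
    }
```

With the official Python client, the actions produced by `page_actions` could be passed to `elasticsearch.helpers.bulk`, and the dictionary from `page_query` sent as the body of a search request against the `pdf-pages` index. Each hit then identifies one page of one imported PDF.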

