Store PDF documents page-wise in ES

fmay · October 26, 2019, 9:49am

We have a business process in which we import a number of PDF documents and want to search these documents per page.

That is:

You import a lot of PDF documents, which should be stored in Elasticsearch.
Every page of every PDF document is indexed.
You type some text into a text box.
You start a search.
The text from the text box is used as the input for the ES query.
You get back the pages of the imported documents that match the text from the text box.

My question:
When I import a PDF document into ES, can I tell ES to extract and index every page of the document separately, so that I can query all my PDF documents per page?

dadoonet · October 26, 2019, 12:20pm

You can index PDF documents with the ingest attachment plugin, but it extracts the text of the whole document, not of individual pages.
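
One possible workaround, which is not something the plugin does for you, is to split each PDF into pages on the client side and index one Elasticsearch document per page. The sketch below assumes the page texts have already been extracted with a PDF library of your choice. The function names, the index name `pdf_pages`, and the fields `file_name`, `page` and `content` are made up for illustration. It only builds the request bodies for the `_bulk` and `_search` APIs and does not send anything.

```python
import json


def page_documents(file_name, page_texts):
    """Yield one document per PDF page; pages are numbered from 1."""
    for page_number, text in enumerate(page_texts, start=1):
        yield {
            # Hypothetical ID scheme: file name plus page number.
            "_id": f"{file_name}#{page_number}",
            "file_name": file_name,
            "page": page_number,
            "content": text,
        }


def bulk_body(index_name, docs):
    """Build a newline-delimited JSON body for the _bulk API."""
    lines = []
    for doc in docs:
        # The ID goes into the action line, everything else into the source line.
        source = {key: value for key, value in doc.items() if key != "_id"}
        action = {"index": {"_index": index_name, "_id": doc["_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def page_query(user_text, size=10):
    """Build a full-text match query over the per-page content field."""
    return {"size": size, "query": {"match": {"content": user_text}}}


if __name__ == "__main__":
    # Placeholder page texts standing in for real extracted PDF pages.
    pages = ["Invoice overview", "Payment terms and conditions"]
    print(bulk_body("pdf_pages", page_documents("contract.pdf", pages)))
    print(json.dumps(page_query("payment terms")))
```

Each hit then carries `file_name` and `page`, so the application can show the matching page of the original PDF.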

