I am looking into hand-rolling search over large PDF documents with Elasticsearch. The plan is to parse the PDFs with Apache Tika and then index the extracted text in Elasticsearch. The question is: if I need to locate specific sections within a PDF, how would I go about it? My thinking is that I would need to break the PDF down into multiple sections before indexing. I'd appreciate any pointers, including any plugins that are available.
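One way to picture the "break the PDF down before indexing" idea is to index one document per page, so that every search hit carries a page number. The sketch below is only an illustration under stated assumptions: it assumes the PDF has already been run through Tika with XHTML output and that each page arrives wrapped in a `<div class="page">` element. The names `PageSplitter`, `split_pages`, `page_actions`, the index name `pdf_pages`, and the field names `file`, `page`, `content` are all hypothetical.

```python
from html.parser import HTMLParser


class PageSplitter(HTMLParser):
    """Collect the text found inside each <div class="page"> element."""

    def __init__(self):
        super().__init__()
        self.pages = []    # finished page texts, in document order
        self._depth = 0    # div nesting depth inside the current page (0 = outside)
        self._chunks = []  # text fragments of the page being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth > 0:
            # A nested div inside a page: track it so its end tag is not
            # mistaken for the end of the page.
            self._depth += 1
        elif "page" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join fragments and collapse runs of whitespace.
                self.pages.append(" ".join(" ".join(self._chunks).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._chunks.append(data)


def split_pages(xhtml):
    """Return a list with the plain text of each page, in order."""
    parser = PageSplitter()
    parser.feed(xhtml)
    parser.close()
    return parser.pages


def page_actions(filename, pages, index="pdf_pages"):
    """Yield one bulk-style action per page (index and field names are assumptions)."""
    for number, text in enumerate(pages, start=1):
        yield {
            "_index": index,
            "_id": f"{filename}#{number}",
            "_source": {"file": filename, "page": number, "content": text},
        }
```

The dictionaries follow the action shape accepted by the `elasticsearch.helpers.bulk` helper in the Python client, so they could be passed to it along with a client instance. Splitting on real section headings instead of pages would need extra logic that depends on how the PDFs are laid out.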
There's this plugin that will attempt to extract
content_length, and