FSCrawler Index Each Page as a Separate Document

vstevenson · September 19, 2019, 5:12pm

Hi,

Is it possible to change the settings for a job in FSCrawler so that each page of a PDF is indexed as a separate document? My understanding is that the entire content of the PDF is stored in one JSON field. Can we break it up so that a single query for a term returns multiple pages within the same PDF book?

If it's not possible, can we make changes in Elasticsearch to show the multiple occurrences of a term within the same document?

Thanks!
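
On the second question (showing multiple occurrences of a term within one document), Elasticsearch highlighting is one option worth trying. Here is a minimal sketch of a search body that asks for several highlighted fragments per hit. It assumes the extracted text lives in a field named `content`; the function name `build_highlight_query` and the default values are made up for illustration.

```python
import json


def build_highlight_query(term, field="content", max_fragments=20, fragment_size=150):
    """Build a search body that returns up to `max_fragments` highlighted
    snippets per matching document.

    Assumption: the PDF text is stored in `field` ("content" here).
    """
    return {
        "query": {"match": {field: term}},
        "highlight": {
            "fields": {
                field: {
                    # Upper bound on snippets returned for each hit.
                    "number_of_fragments": max_fragments,
                    # Approximate snippet length in characters.
                    "fragment_size": fragment_size,
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON body so it can be sent to the _search endpoint.
    print(json.dumps(build_highlight_query("whale"), indent=2))
```

Each hit would then carry a `highlight` section with one snippet per occurrence, up to the limit. This shows where the term appears in the text, but it does not give page numbers.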

dadoonet · September 20, 2019, 6:58am

You are looking for this:

It's not implemented.
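
Since FSCrawler does not do this, one possible workaround is to split the PDF into pages outside FSCrawler and index each page yourself. The sketch below is not an FSCrawler feature. It assumes you have already extracted the text of each page with a PDF library of your choice, and it only builds the request body for the Elasticsearch `_bulk` API. The index name `pdf_pages`, the `page` field, the ID scheme, and the function name are hypothetical.

```python
import json


def pages_to_bulk_ndjson(pdf_name, page_texts, index="pdf_pages"):
    """Turn a list of per-page texts into a _bulk request body (NDJSON),
    one Elasticsearch document per page.

    Assumptions: `page_texts` was produced by a separate PDF extraction
    step; the index name, field names, and ID scheme are illustrative.
    """
    lines = []
    for page_number, text in enumerate(page_texts, start=1):
        # A stable ID per page, so re-indexing overwrites instead of duplicating.
        doc_id = f"{pdf_name}#page={page_number}"
        # Action line, followed by the document source line.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({
            "file": {"filename": pdf_name},
            "page": page_number,
            "content": text,
        }))
    if not lines:
        return ""
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = pages_to_bulk_ndjson("book.pdf", ["first page text", "second page text"])
    print(body, end="")
```

With documents shaped like this, a single query for a term would return one hit per matching page, and the `page` field tells you where in the book each hit is.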

system · October 18, 2019, 6:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
