FSCrawler creating custom mapping

vikas_singhji · February 12, 2019, 4:06am

I'm using FSCrawler 2.6 and it's working great for indexing PDF documents. However, one issue I am facing is that it puts all the contents of the PDF in the "content" field (I am a newbie in this field). So my question is: is there any way to have a custom mapping for the data in the PDF, e.g. latitude/longitude or numbers, or, if not that, line by line (like content.line1, content.line2, content.line3...)?

dadoonet · February 12, 2019, 4:35am

Parsing text to extract meaningful content (entities) is a difficult thing.
The only option I can see for now is to use an ingest pipeline.

In FSCrawler you can configure the ingest pipeline name to apply after the text has been extracted. See https://fscrawler.readthedocs.io/en/latest/admin/fs/elasticsearch.html
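
To make that a bit more concrete, here is a rough sketch of what such a setup could look like. It only builds the two JSON bodies involved and prints them; it does not talk to Elasticsearch. Everything specific in it is an assumption for illustration: the pipeline id `pdf_fields`, the job name `pdf_docs`, the target field `content_lines`, and above all the grok patterns, which assume the PDF text contains lines such as "Latitude: 12.34". Adjust them to whatever your documents really contain.

```python
import json


def build_pipeline():
    """Return an ingest pipeline body that post-processes the `content` field."""
    return {
        "description": "Hypothetical: extract fields from PDF text indexed by FSCrawler",
        "processors": [
            {
                # Assumes the text contains e.g. "Latitude: 12.34" (illustrative only).
                "grok": {
                    "field": "content",
                    "patterns": ["Latitude: %{NUMBER:latitude:float}"],
                    "ignore_failure": True,  # keep indexing documents without a match
                }
            },
            {
                # Same assumption for "Longitude: 56.78".
                "grok": {
                    "field": "content",
                    "patterns": ["Longitude: %{NUMBER:longitude:float}"],
                    "ignore_failure": True,
                }
            },
            {
                # Line-wise view: split the text on newlines into an array field,
                # leaving the original `content` field untouched.
                "split": {
                    "field": "content",
                    "separator": "\n",
                    "target_field": "content_lines",
                }
            },
        ],
    }


def build_fscrawler_settings(job_name, pipeline_id):
    """Return the part of the FSCrawler job settings that names the pipeline."""
    return {"name": job_name, "elasticsearch": {"pipeline": pipeline_id}}


if __name__ == "__main__":
    # Body to send with: PUT _ingest/pipeline/pdf_fields
    print(json.dumps(build_pipeline(), indent=2))
    # Fragment to merge into the job's _settings.json
    print(json.dumps(build_fscrawler_settings("pdf_docs", "pdf_fields"), indent=2))
```

Note that the split processor gives you an array (`content_lines`) rather than separate `content.line1`, `content.line2` fields, which is usually easier to map and query. The grok processors are only as good as the patterns, so free-form text will still need patterns written for each kind of document.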

