Index mapping with fscrawler

Denn · September 19, 2019, 3:15pm

@dadoonet How do I correctly map an index with fscrawler so that pdf, doc and rtf files open on screen in the same visual format as the original document?

(Please note that the documents come in various layouts: some have paragraphs, others contain tables, etc.)

dadoonet · September 20, 2019, 7:19am

You can activate this option (https://fscrawler.readthedocs.io/en/latest/admin/fs/local-fs.html#store-binary), which will store the original binary content in Elasticsearch.
That being said, I don't recommend this option. It's better to keep the URL of the source document and load the binary from its source than to store big blobs in Elasticsearch/Lucene, which have not really been designed for that purpose.
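
To illustrate the two approaches, here is a minimal sketch in Python of how a client could get the original bytes back for one search hit. It is not part of FSCrawler. It assumes the BASE64 content is in an `attachment` field when the store-binary option is on, and that `path.real` holds the file's location on the crawled machine. Check both field names against your own index. The function name `load_original_bytes` is made up for this example.

```python
import base64
from pathlib import Path


def load_original_bytes(source: dict) -> bytes:
    """Return the original file bytes for one indexed document.

    `source` is the `_source` dict of a search hit.
    """
    # Assumption: with the store-binary option enabled, the BASE64-encoded
    # file is stored under "attachment".
    encoded = source.get("attachment")
    if encoded:
        return base64.b64decode(encoded)

    # Assumption: "path.real" is the path of the file on the crawled machine,
    # and this code can read that location.
    real_path = source.get("path", {}).get("real")
    if not real_path:
        raise ValueError("document has neither 'attachment' nor 'path.real'")
    return Path(real_path).read_bytes()
```

Either way, the mapping itself does not preserve the layout. To show a document as it looks in the original, you would hand these bytes to a viewer that understands the format (a PDF viewer in the browser, for example).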

Not sure this is what you are looking for though.

system · October 18, 2019, 7:19am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
