Mapper attachment plugin vs. pre-parsing and extracting content from binary files

tlovett1 · February 2, 2017, 10:13pm

In order to search binary files, I have two options that I see:

1. Use the Elastic Mapping Plugin and have Elasticsearch handle all the things.
2. Pre-parse and extract content from binary file. Send parsed content to Elasticsearch.

Number 2 seems like a perfectly viable solution. What's the advantage of using the mapper plugin?

dadoonet · February 3, 2017, 7:47am

The mapper-attachments plugin is removed in 6.0. The ingest-attachment plugin should be used instead.

But TBH I prefer your solution number 2. It's what I do in the FSCrawler project.
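
For illustration, solution number 2 could look roughly like the sketch below. It is not how FSCrawler is implemented. The `extract` callable is a placeholder for whatever extractor you run outside Elasticsearch (Tika, for example), and the function name, the `docs` index, the `filename`/`content` fields, the `localhost:9200` URL and the `_doc` endpoint are all assumptions (older versions expect a type name in the path instead).

```python
import json
from pathlib import Path
from typing import Callable
from urllib import request


def build_index_request(
    path: Path,
    extract: Callable[[Path], str],
    es_url: str = "http://localhost:9200",  # assumed local node
    index: str = "docs",                    # hypothetical index name
) -> request.Request:
    """Extract text on the client side and prepare an index request.

    `extract` stands in for your own parser. Elasticsearch only
    ever receives plain text, never the binary file.
    """
    body = {"filename": path.name, "content": extract(path)}
    return request.Request(
        url=f"{es_url}/{index}/_doc",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Dummy extractor so the sketch runs without any parser installed.
    req = build_index_request(Path("report.pdf"), lambda p: "extracted text")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # request.urlopen(req) would actually send it to the cluster.
```

Since the extractor is just a function you pass in, you control which formats are supported and can swap the parser without touching the cluster.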

David_Pocivalnik · February 3, 2017, 12:02pm

Can you elaborate on why you would choose to parse the documents yourself and not let the ingest-attachment plug-in do the work?

dadoonet · February 3, 2017, 12:24pm

Mainly because of some jar conflicts (jar hell checks), we had to reduce the set of file formats that Tika can actually extract in the plugin.
So if you want full support for every format Tika handles, doing the extraction externally will help.

Also, some advanced features like using Tesseract OCR are not possible with the ingest-attachment plugin.

tlovett1 · February 3, 2017, 3:17pm

Awesome, thanks. Would love to hear from someone from Elastic on this.

dadoonet · February 3, 2017, 3:29pm

lol

http://david.pilato.fr/blog/2017/01/09/4-years-at-elastic/

tlovett1 · February 3, 2017, 3:50pm

Sorry about that! Did not read your profile.

tlovett1 · February 3, 2017, 3:51pm

David, are there any advantages to the first approach?

dadoonet · February 3, 2017, 4:01pm

No worries! Was funny to read

dadoonet · February 3, 2017, 4:02pm

The main advantage is that you don't write/maintain the code.

If you are using ingest-attachment instead of mapper-attachments (removed in 6.0), another advantage is that you can dedicate some nodes as ingest nodes and then share the load on multiple nodes.
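
As a rough sketch of the ingest-attachment route: you define a pipeline with an `attachment` processor and send the file as a base64-encoded field. The code below only builds the two JSON bodies and does not talk to a cluster. The field name `data` and the helper names are assumptions chosen for illustration.

```python
import base64
import json


def attachment_pipeline(field: str = "data") -> dict:
    """Body for PUT _ingest/pipeline/<id> with one attachment processor."""
    return {
        "description": "Extract text from a base64-encoded binary field",
        "processors": [{"attachment": {"field": field}}],
    }


def attachment_document(raw: bytes, field: str = "data") -> dict:
    """Document body: the processor expects the binary as base64 text."""
    return {field: base64.b64encode(raw).decode("ascii")}


if __name__ == "__main__":
    print(json.dumps(attachment_pipeline(), indent=2))
    print(json.dumps(attachment_document(b"hello"), indent=2))
```

The document is then indexed with `?pipeline=<id>` on the index request, and the extraction runs on whichever nodes have the ingest role (in 5.x that is the `node.ingest` setting in `elasticsearch.yml`).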

tlovett1 · February 3, 2017, 4:04pm

Awesome. Totally makes sense. Thanks.

David_Pocivalnik · February 6, 2017, 8:14am

thanks for your input!

system · March 6, 2017, 8:14am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
