Ingestion attachment processor plugin

Zamir_Arif · December 12, 2018, 4:49pm

I have been spending some time with the Elasticsearch Ingest Attachment Processor plugin.

Here is my data volume:

Approximately 100,000 XML files

Approximately 100,000 other documents such as Word, PDF, and other Tika-supported formats.

Would base64 binary encoding be a good idea for this, or should I consider CBOR?

Is a single index for everything a good idea? If yes, what should I consider when sharding my data?

I am planning to stick with 5 partitions (shards). Can I use custom logic to distribute the data across the partitions, or will Elasticsearch do that efficiently on its own?

I have large documents, and indexing whole documents is making the system slow. What options does Elasticsearch provide to deal with that?

Looking forward to some interesting replies.

I would appreciate it if someone could share some production experience.

dadoonet · December 12, 2018, 5:06pm

What is slow? The ingest part? The search part?

If the search part, I'd recommend removing the binary BASE64 content in your ingest pipeline.
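For illustration, here is a minimal sketch of what "removing the binary" could look like as a pipeline definition: an `attachment` processor followed by a `remove` processor on the same field. The field name `data` and the helper `build_attachment_pipeline` are assumptions made for this example, not something taken from the original poster's setup. The resulting JSON would be sent as the body of a `PUT _ingest/pipeline/<name>` request.

```python
import json


def build_attachment_pipeline(source_field="data", target_field="attachment"):
    """Build an ingest pipeline body (hypothetical helper, for illustration only).

    `source_field` is assumed to hold the base64-encoded document.
    """
    return {
        "description": "Extract attachment text, then drop the base64 payload",
        "processors": [
            # Extract text and metadata from the base64-encoded field.
            {"attachment": {"field": source_field, "target_field": target_field}},
            # Remove the original binary so it is not kept in _source.
            {"remove": {"field": source_field}},
        ],
    }


pipeline = build_attachment_pipeline()
print(json.dumps(pipeline, indent=2))
```

With the `remove` step in place, only the extracted text and metadata under `attachment` stay in the stored document, which keeps `_source` smaller when hits are returned.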

Zamir_Arif · December 12, 2018, 8:53pm

Both ingestion and search are slow.

dadoonet · December 12, 2018, 9:16pm

Try to remove the binary in the pipeline.
Measure before and after the change to see if it's getting better.
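One way to make the before/after comparison concrete is a small timing harness, sketched below. It only times a callable several times and reports the median; `placeholder_workload` is a stand-in invented for this example and would need to be replaced with the real indexing or search call against your cluster.

```python
import statistics
import time


def time_calls(action, repeats=5):
    """Run `action` `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(label, timings):
    """Format the median of the measured timings for a quick comparison."""
    median = statistics.median(timings)
    return f"{label}: median {median:.4f}s over {len(timings)} runs"


def placeholder_workload():
    # Hypothetical stand-in: replace with the actual bulk-index or search request.
    sum(range(100_000))


print(summarize("placeholder", time_calls(placeholder_workload)))
```

Running the same harness once with the original pipeline and once with the binary removed gives two medians that can be compared directly.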

