Ingest attachment processor plugin

I have been spending some time with the Elasticsearch ingest attachment processor plugin.

Here is my data volume:

Approximately 100,000 XML files

Approximately 100,000 other documents: Word, PDF, and other Tika-supported formats.

Will base64 binary encoding be a good idea for this, or should I consider CBOR?
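For context, the attachment processor expects the file content as a base64 string in a source field. A minimal sketch of preparing a document that way (the field name `data` is an assumption; it is whatever your pipeline is configured to read). Note that base64 inflates the payload by roughly 33%, which is the usual motivation for considering CBOR, since CBOR can carry raw bytes without that overhead:

```python
import base64

def to_attachment_doc(raw_bytes: bytes, field: str = "data") -> dict:
    # Wrap raw file bytes as a base64 string in the field the
    # attachment processor is configured to read ("data" assumed here).
    return {field: base64.b64encode(raw_bytes).decode("ascii")}

doc = to_attachment_doc(b"<note>hello</note>")
```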

Is a single index for everything a good idea? If yes, what should I consider when sharding my data?

I am planning to stick with 5 shards. Can I add custom logic to distribute the data across shards, or will Elasticsearch do it efficiently on its own?

I have large documents, and indexing whole documents makes the system slow. What options does Elasticsearch provide to deal with that?
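One relevant knob here: the attachment processor has an `indexed_chars` option that caps how many characters are extracted from each file (it defaults to 100,000; `-1` disables the limit). A sketch of a pipeline body using it, shown as a Python dict; the field name `data` is an assumption:

```python
# Hypothetical pipeline body for PUT _ingest/pipeline/<name>;
# field names are assumptions, adjust to your setup.
pipeline = {
    "description": "Extract attachment text, capping extracted characters",
    "processors": [
        {
            "attachment": {
                "field": "data",          # base64-encoded source field
                "indexed_chars": 100000,  # stop extracting after 100k chars (-1 = unlimited)
            }
        }
    ],
}
```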

Looking forward to some interesting replies.

I would appreciate it if someone could share some production experience.

What is slow? The ingest part? The search part?

If the search part, I'd recommend removing the binary BASE64 content in your ingest pipeline.

Both ingestion and search are slow.

Try to remove the binary in the pipeline.
Measure before and after the change to see if it's getting better.
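One way to do that is to chain a `remove` processor after the `attachment` processor, so the raw base64 payload is dropped once the text has been extracted and never gets stored or indexed. A sketch of the pipeline body as a Python dict (the `data` field name is an assumption):

```python
# Hypothetical pipeline body: extract text, then drop the raw base64 field.
pipeline = {
    "description": "Extract attachment text, then drop the raw base64 payload",
    "processors": [
        {"attachment": {"field": "data"}},  # parse the base64 content via Tika
        {"remove": {"field": "data"}},      # drop the base64 so it is not stored
    ],
}
```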
