I have been spending some time with the Elasticsearch Ingest Attachment Processor plugin.
Here is my data volume:
Approximately 100,000 XML files.
Approximately 100,000 other documents such as Word, PDF, and other Tika-supported formats.
Would base64 binary encoding be a good idea for this, or should I consider CBOR?
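To make the base64 option concrete, here is a minimal sketch of how a file could be turned into a JSON body for the attachment processor. It assumes a pipeline whose attachment processor reads from a field named `data`; the helper name `build_attachment_doc` and the `filename` field are hypothetical and only for illustration.

```python
import base64
import json


def build_attachment_doc(raw_bytes: bytes, filename: str) -> str:
    """Return a JSON body whose `data` field holds the base64-encoded file."""
    # base64 turns every 3 bytes into 4 ASCII characters, so the payload
    # grows by roughly one third compared with the raw file.
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({"filename": filename, "data": encoded})


# Assumed pipeline definition: the attachment processor reads base64 text
# from the `data` field of each incoming document.
PIPELINE = {
    "description": "Extract attachment information",
    "processors": [{"attachment": {"field": "data"}}],
}

body = build_attachment_doc(b"hello world", "hello.txt")
```

With JSON the file content has to travel as a base64 string, which is where the size overhead comes from; as far as I understand, sending the document as CBOR would let the field carry the raw bytes instead.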
Would a single index for everything be a good idea? If yes, what should I consider when sharding my data?
I am planning to stick with 5 partitions (shards). Can I use custom logic to distribute the data across them, or will Elasticsearch do that efficiently on its own?
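As an illustration of the custom-distribution option, the sketch below builds an index request path that carries a `routing` value. It assumes the `routing` and `pipeline` query parameters of the index API and the `_doc` endpoint used by recent Elasticsearch versions. The index name, the pipeline name, and the rule of routing by file extension are hypothetical and chosen only to show the mechanism.

```python
from urllib.parse import quote, urlencode


def routing_key_for(filename: str) -> str:
    """Hypothetical rule: use the lower-cased file extension as the routing key."""
    if "." not in filename:
        return "unknown"
    return filename.rsplit(".", 1)[-1].lower()


def index_url(index: str, doc_id: str, routing_key: str, pipeline: str) -> str:
    """Build the request path for indexing one document with custom routing."""
    query = urlencode({"routing": routing_key, "pipeline": pipeline})
    return f"/{quote(index)}/_doc/{quote(doc_id)}?{query}"


url = index_url("documents", "42", routing_key_for("Report.PDF"), "attachment")
```

Documents that share a routing value end up on the same shard, so a coarse key like this one could leave the shards unevenly filled. Without a routing value, Elasticsearch routes by document ID.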
I have large documents, and indexing whole documents makes the system slow. What options does Elasticsearch provide to deal with that?
Looking forward to some interesting replies.
I would appreciate it if someone could share some production experience.