I'm batch-indexing lots of small documents (200-250 bytes each, basically DB rows) and I see significant concurrency-control overhead (2-3 times greater than the (quite simple) indexing itself):
Indexing performance compared to Solr (all test conditions are equal; it looks like Solr does not perform any version control):
My questions are:
- Shouldn't this be considered a performance problem? Possibly there could be a solution based on performing loadCurrentVersionFromIndex in batches.
- Given that I properly orchestrate clients and prevent concurrent updates, I suppose I could disable concurrency control for batch inserts by patching InternalEngine.java and providing some API. Am I right? What am I missing?
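
To make the first idea more concrete, here is a rough sketch of what I mean by batching the version lookup. Everything in it is hypothetical: `VersionStore`, `loadOne`, `loadMany` and `indexBatch` are names I made up, and the map is only a stand-in for the index. It is not how InternalEngine actually works and it says nothing about the real cost of a lookup; it only shows the shape of the API I have in mind.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedVersionLookup {

    /** Hypothetical stand-in for the index-side version data; counts lookup calls. */
    public static final class VersionStore {
        private final Map<String, Long> versions = new HashMap<>();
        public int lookups = 0;

        /** One lookup call per document id (the per-document pattern). */
        public Long loadOne(String id) {
            lookups++;
            return versions.get(id);
        }

        /** One lookup call for the whole batch; ids with no version are omitted. */
        public Map<String, Long> loadMany(List<String> ids) {
            lookups++;
            Map<String, Long> found = new HashMap<>();
            for (String id : ids) {
                Long version = versions.get(id);
                if (version != null) {
                    found.put(id, version);
                }
            }
            return found;
        }

        public void put(String id, long version) {
            versions.put(id, version);
        }
    }

    /** Resolves current versions once per batch, then assigns the next version to each id. */
    public static void indexBatch(VersionStore store, List<String> ids) {
        Map<String, Long> current = store.loadMany(ids);
        for (String id : ids) {
            long next = current.getOrDefault(id, 0L) + 1;
            // Track the new version locally so a repeated id in the same batch sees it.
            current.put(id, next);
            store.put(id, next);
        }
    }
}
```

With something like this, a batch of 10,000 documents would go through one `loadMany` call instead of 10,000 `loadOne` calls. Whether that translates into fewer CPU cycles in the real engine is exactly what I'm asking about.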
More info: indexing in batches of 10,000 using TransportClient; the bottleneck is CPU cycles (memory and disks are fine); GC activity is minimal.