Significant overhead of concurrency control for small docs

Hello,

I'm batch-indexing lots of small documents (200-250 bytes each; basically they are DB rows), and I see significant concurrency-control overhead (2-3 times greater than the (quite simple) indexing itself):

Indexing performance compared to Solr (all test conditions are equal; it looks like Solr does not perform any version control):

My questions are:

  1. Shouldn't this be considered a performance problem? Possibly there could be a solution based on performing loadCurrentVersionFromIndex in batches.
  2. Given that I properly orchestrate clients and prevent concurrent updates, I suppose I could disable concurrency control for batch inserts by patching InternalEngine.java and providing some API. Am I right? What am I missing?

More info: indexing in batches of 10,000 documents using TransportClient; the bottleneck is CPU cycles (memory and disks are fine); GC activity is minimal.

Thanks!

This is something that we've been thinking about for a while. We have also started working on it:

That's all to say that things will only get better with the next versions. The latter improvement will go out with 5.0.0.

Thanks for your reply, Luca.
Nice feature; I see that even in 2.3.5 one could leverage autogenerated IDs to bypass version control.
Unfortunately, autogenerated IDs and append-only indexing are not an option for me. In my case the append-only stage is transient :wink:
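
For anyone who can live with append-only indexing, here is a rough sketch of what "autogenerated IDs" means at the bulk-request level: the action lines simply carry no `_id`, so the server assigns one. This only builds the newline-delimited body for the bulk API; the class name, the index name `rows`, and the type name `row` are made up for illustration, and the documents are assumed to be already serialized as JSON. It does not show or measure the version-lookup savings discussed above.

```java
import java.util.Arrays;
import java.util.List;

public class BulkBodySketch {

    // Builds a bulk request body (newline-delimited JSON).
    // The action line deliberately omits "_id", so IDs are generated server-side.
    // Assumes index and type names need no JSON escaping, and that each entry
    // in jsonDocs is a single-line, already-serialized JSON object.
    public static String buildBulkBody(String index, String type, List<String> jsonDocs) {
        String action = "{\"index\":{\"_index\":\"" + index + "\",\"_type\":\"" + type + "\"}}";
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append(action).append('\n'); // action/metadata line, no "_id"
            body.append(doc).append('\n');    // document source line
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical DB rows, already serialized as JSON.
        List<String> rows = Arrays.asList(
                "{\"name\":\"a\",\"qty\":1}",
                "{\"name\":\"b\",\"qty\":2}");
        System.out.print(buildBulkBody("rows", "row", rows));
    }
}
```

The resulting string can be sent as the body of a bulk request. As soon as documents need stable IDs (for later updates, as in my case), each action line has to carry an explicit `_id` again, and this shortcut no longer applies.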