Significant overhead of concurrency control for small docs

Zline · September 12, 2016, 11:03am

Hello,

I'm batch-indexing lots of small documents (200-250 bytes each; they are basically DB rows), and I see significant overhead from concurrency control, 2-3 times greater than the (quite simple) indexing itself:

Indexing performance compared to Solr (all test conditions are equal; it looks like Solr does not perform any version control):

My questions are:

Shouldn't this be considered a performance problem? One possible solution would be to perform loadCurrentVersionFromIndex in batches.
Given that I properly orchestrate clients and prevent concurrent updates, I suppose I could disable concurrency control for batch inserts by patching InternalEngine.java and exposing some API for it. Am I right? What am I missing?
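
To make the batching idea from the first question concrete, here is a toy sketch in Java. It is not Elasticsearch code: `ToyVersionStore`, `lookupOne`, `lookupBatch`, and the call counter are hypothetical names invented for illustration, and a plain `HashMap` stands in for the index. It only shows the shape of the change, one version lookup per batch instead of one per document. Whether that would actually save CPU in the real engine is an open question.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for an index that keeps one version per document ID (hypothetical, not Elasticsearch code). */
class ToyVersionStore {
    private final Map<String, Long> versions = new HashMap<>();
    int lookupCalls = 0; // how many times the "index" was consulted for versions

    /** Per-document lookup: one call per ID; returns -1 if the ID is absent. */
    long lookupOne(String id) {
        lookupCalls++;
        return versions.getOrDefault(id, -1L);
    }

    /** Batched lookup: a single call resolves the current version of every ID in the batch. */
    Map<String, Long> lookupBatch(List<String> ids) {
        lookupCalls++;
        Map<String, Long> result = new HashMap<>();
        for (String id : ids) {
            result.put(id, versions.getOrDefault(id, -1L));
        }
        return result;
    }

    /** Baseline: look up the version for each document separately, then bump it. */
    void indexOneByOne(List<String> ids) {
        for (String id : ids) {
            long current = lookupOne(id);
            versions.put(id, current < 0 ? 1L : current + 1);
        }
    }

    /** Batched variant: one lookup up front, then bump versions from the snapshot. */
    void indexBatched(List<String> ids) {
        Map<String, Long> snapshot = lookupBatch(ids);
        for (String id : ids) {
            long current = snapshot.get(id);
            long next = current < 0 ? 1L : current + 1;
            versions.put(id, next);
            snapshot.put(id, next); // keep the snapshot fresh if an ID repeats within the batch
        }
    }

    /** Read a stored version without touching the lookup counter; returns -1 if absent. */
    long version(String id) {
        return versions.getOrDefault(id, -1L);
    }
}
```

With a batch of 10 000 documents, the baseline path consults the store 10 000 times and the batched path once; the resulting versions are the same in both cases.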

More info: indexing in batches of 10 000 documents, using TransportClient; the bottleneck is CPU cycles (memory and disks are fine); GC activity is minimal.

Thanks!

javanna · September 12, 2016, 12:38pm

This is something that we've been thinking about for a while. We have also started working on it:

That's all to say that things will only get better with the next versions. The latter improvement will go out with 5.0.0.

Zline · September 12, 2016, 12:53pm

Thanks for your reply, Luca.
Nice feature. I see that even in 2.3.5 one could leverage autogenerated IDs to bypass version control.
Unfortunately, autogenerated IDs and append-only indexing are not an option for me. In my case the append-only stage is transient.
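
For readers following along, the difference between the two indexing paths discussed here can be sketched with a toy model in Java. This is not Elasticsearch code: `ToyAppendEngine`, `indexWithId`, and `indexAutoId` are hypothetical names, and the sketch simply assumes, as described in this thread, that an engine-generated ID cannot collide with an existing document and so needs no version lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy engine (hypothetical, not Elasticsearch code) contrasting the two indexing paths. */
class ToyAppendEngine {
    private final Map<String, Long> versions = new HashMap<>();
    int versionLookups = 0; // how many times the current version had to be resolved

    /** Caller-supplied ID: the document may already exist, so its current version is looked up first. */
    long indexWithId(String id) {
        versionLookups++;
        Long current = versions.get(id);
        long next = (current == null) ? 1L : current + 1;
        versions.put(id, next);
        return next;
    }

    /** Engine-generated ID: assumed unique, so the lookup is skipped and the document is appended at version 1. */
    String indexAutoId() {
        String id = UUID.randomUUID().toString();
        versions.put(id, 1L);
        return id;
    }
}
```

The append-only path never touches `versionLookups`, which is the saving in question. It stops applying as soon as a document has to be updated by a known ID, which is the transient-append-only situation described above.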
