Indexing to one shard is not concurrency-friendly?

Hello,

I tried to concurrently index lots of small documents into an index with 1 shard and 0 replicas. Increasing the number of indexing threads from 1 to 3 gave a performance gain of only about 1.5x, while I expected at least 2.5x. CPU usage of the elasticsearch process increased 3x (contention?). Is this normal? Is there a way to improve per-shard concurrency?

I'm using elasticsearch-2.3.5 with default settings on Debian stable (Linux 3.16.0-4-amd64).

$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

Index-level settings:
codec: best_compression
merge.scheduler.max_thread_count: 1 (tried 3, without significant changes)
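
For reference, here is roughly how the index is created (a sketch against the 2.x Java admin API; the index name main is taken from the stats below, the rest matches the settings above):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public final class CreateIndexSketch {
    // Creates the 1-shard / 0-replica index with the settings listed above.
    static void createIndex(Client client) {
        client.admin().indices().prepareCreate("main")
                .setSettings(Settings.settingsBuilder()
                        .put("index.number_of_shards", 1)
                        .put("index.number_of_replicas", 0)
                        .put("index.codec", "best_compression")
                        .put("index.merge.scheduler.max_thread_count", 1))
                .get();
    }
}
```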

My app is a multithreaded Java application; each indexer thread uses its own TransportClient and sends bulk index requests of 10 000 documents each. The app is not the bottleneck: its CPU usage is about 35%.
I/O is not a bottleneck: according to iostat, my SSD utilisation is about 10% most of the time.
Memory is not a problem: 3+ GB of cached memory, and swap is disabled. Elasticsearch process: virt 8.5g, res 1.8g.
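
In case the shape of the indexing code matters, here is a stripped-down sketch of what the app does (host, cluster name, index/type names and the document source are placeholders, not my real values):

```java
import java.net.InetAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public final class BulkIndexerSketch {
    static final int THREADS = 3;        // compared 1 vs 3
    static final int BULK_SIZE = 10_000; // documents per bulk request

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int t = 0; t < THREADS; t++) {
            final int threadId = t;
            pool.submit(() -> {
                // Each indexer thread owns its own TransportClient.
                try (TransportClient client = TransportClient.builder()
                        .settings(Settings.settingsBuilder()
                                .put("cluster.name", "elasticsearch").build())
                        .build()
                        .addTransportAddress(new InetSocketTransportAddress(
                                InetAddress.getByName("localhost"), 9300))) {
                    // The real app loops over many bulks; one shows the shape.
                    BulkRequestBuilder bulk = client.prepareBulk();
                    for (int i = 0; i < BULK_SIZE; i++) {
                        String id = threadId + "-" + i; // our own IDs, not auto-generated
                        bulk.add(client.prepareIndex("main", "doc", id)
                                .setSource("{\"field\":\"value\"}"));
                    }
                    BulkResponse response = bulk.get();
                    if (response.hasFailures()) {
                        System.err.println(response.buildFailureMessage());
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
    }
}
```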

I know that adding shards would help, but I'm also trying to maximise per-shard performance (e.g. for transient bulk-load cases).

Index stats after indexing with 3 threads:
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open main 1 0 26849945 0 5.4gb 5.4gb

A performance comparison of the 1-thread and 3-thread cases is attached (again, I expected >2.5x). The X axis is seconds.

Thanks!

Hey,

First, Elasticsearch 2.4 might speed up your indexing; see the release notes.

Second, you could use the hot threads API (GET /_nodes/hot_threads) to find out what your CPU is doing.
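
For example, something like this from the Java client (a sketch; `curl localhost:9200/_nodes/hot_threads` gives the same output):

```java
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodeHotThreads;
import org.elasticsearch.action.admin.cluster.node.hotthreads.NodesHotThreadsResponse;
import org.elasticsearch.client.Client;

public final class HotThreadsSketch {
    // Prints the busiest threads of every node; run it while indexing.
    static void printHotThreads(Client client) {
        NodesHotThreadsResponse response = client.admin().cluster()
                .prepareNodesHotThreads()
                .setThreads(3) // top 3 hottest threads per node
                .get();
        for (NodeHotThreads node : response.getNodes()) {
            System.out.println(node.getHotThreads());
        }
    }
}
```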

Which threads did you change from one to three - the merging threads or others? At the top you wrote about a 1.5x change, and later about no significant changes...

Did you monitor GC and the thread pools? Any high number of GCs or rejected tasks? Did you try a smaller bulk size to see if things change?
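
If you want to pull those numbers from your app, something like this should work (a sketch; the `_cat/thread_pool` and `_nodes/stats` REST endpoints expose the same counters):

```java
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.monitor.jvm.JvmStats;
import org.elasticsearch.threadpool.ThreadPoolStats;

public final class NodeStatsSketch {
    // Dumps thread pool rejections and GC counters for every node.
    static void printStats(Client client) {
        NodesStatsResponse response = client.admin().cluster()
                .prepareNodesStats()
                .setThreadPool(true) // active/queue/rejected per pool
                .setJvm(true)        // GC collection counts and times
                .get();
        for (NodeStats node : response.getNodes()) {
            String name = node.getNode().getName();
            for (ThreadPoolStats.Stats pool : node.getThreadPool()) {
                System.out.printf("%s pool=%s active=%d queue=%d rejected=%d%n",
                        name, pool.getName(), pool.getActive(),
                        pool.getQueue(), pool.getRejected());
            }
            for (JvmStats.GarbageCollector gc : node.getJvm().getGc()) {
                System.out.printf("%s gc=%s count=%d time=%s%n",
                        name, gc.getName(), gc.getCollectionCount(),
                        gc.getCollectionTime());
            }
        }
    }
}
```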

Many knobs to turn and try...

--Alex

Thanks for your reply, Alex!

I increased the number of client app threads from 1 to 3 - the threads which send bulk index requests via TransportClient.

I tried ES 2.4, with the same result. Changing the bulk size from 10 000 to 1000 also didn't help.

Using the hot threads API gives the same result as sampling via jvisualvm: it looks like version control eats lots of CPU cycles (I'll open a separate topic about versioning overhead a bit later).

GC activity is minor.

All index requests completed successfully, so I suppose there were no rejected tasks.
As for thread pools, which parameters should I monitor? The number of live threads is stable, around 100.

For what it's worth, 5.0 has an enhancement that skips this step entirely if you are doing an index operation and let Elasticsearch assign the IDs. If you provide the IDs, it still has to do the lookup so that it can remove the old version of the document from the index. The optimization also has to be skipped in some funny cases, like replaying requests from the translog, or when a replica botches its first attempt at the index operation.
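
To make the difference concrete, here's a sketch (the index/type names are placeholders):

```java
import org.elasticsearch.client.Client;

public final class IdLookupSketch {
    static void indexBothWays(Client client) {
        // Explicit ID: Elasticsearch must look the ID up in the index so it
        // can remove any previous version of the document.
        client.prepareIndex("main", "doc", "user-42")
                .setSource("{\"field\":\"value\"}").get();

        // No ID: Elasticsearch assigns a fresh one itself, so 5.0 can skip
        // that lookup entirely in the common case.
        client.prepareIndex("main", "doc")
                .setSource("{\"field\":\"value\"}").get();
    }
}
```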

I can't use auto-generated IDs, but I suppose I could skip the version check during initial indexing if I guarantee that there won't be concurrent updates. For more details please see this thread.

Back on topic: I think the concurrency problems are caused, at least partially, by contention inside version control.