ES5 Indexing performance (seems slow)


When indexing 200,000 documents in batches of 1,000 using the bulk REST API, I see the following log messages:

[2016-11-03T21:30:55,263][INFO ][o.e.i.IndexingMemoryController] [node-1] now throttling indexing for shard [[myindex][1]]: segment writing can't keep up
[2016-11-03T21:30:57,708][INFO ][o.e.i.IndexingMemoryController] [node-1] now throttling indexing for shard [[myindex][2]]: segment writing can't keep up
[2016-11-03T21:31:01,213][INFO ][o.e.i.IndexingMemoryController] [node-1] now throttling indexing for shard [[myindex][3]]: segment writing can't keep up
[2016-11-03T21:31:05,020][INFO ][o.e.i.IndexingMemoryController] [node-1] now throttling indexing for shard [[myindex][4]]: segment writing can't keep up
[2016-11-03T21:31:10,675][INFO ][o.e.i.IndexingMemoryController] [node-1] now throttling indexing for shard [[myindex][0]]: segment writing can't keep up

This is an out-of-the-box ES5 node. I have defined two indexes, each with a single type. The mapping for each type is nothing complicated (about 18 fields per document, 16 of which are strings and the other 2 longs). I am not storing _source.

Any suggestions for configuration changes that I can make (I am running on a Mac)?
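
Not a definitive answer, but the "now throttling indexing" message comes from the IndexingMemoryController, whose shared indexing buffer defaults to 10% of the heap. One setting that is sometimes worth experimenting with in elasticsearch.yml (the 20% value below is just an example, not a verified recommendation for this workload):

```yaml
# Shared buffer for active shards' index writers; the default is 10% of heap.
indices.memory.index_buffer_size: 20%
```

Whether this helps depends on heap size and shard count, so increase it in small steps.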



So I modified the JVM options from:




The throttle messages have stopped, but I now get this instead:

[2016-11-05T19:41:27,922][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][465] overhead, spent [443ms] collecting in the last [1s]
[2016-11-05T19:41:28,922][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][466] overhead, spent [316ms] collecting in the last [1s]
[2016-11-05T19:41:30,929][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][468] overhead, spent [310ms] collecting in the last [1s]
[2016-11-05T19:41:33,033][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][470] overhead, spent [312ms] collecting in the last [1.1s]
[2016-11-05T19:41:35,176][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][472] overhead, spent [343ms] collecting in the last [1.1s]
[2016-11-05T19:41:36,179][INFO ][o.e.m.j.JvmGcMonitorService] [node-1] [gc][473]

Any suggestions on how to resolve this?




It's garbage collection, it's normal.


I thought it was perhaps excessive... oh well.

I guess I will need to look for other reasons why the performance is 5 times slower than Sphinx.

You haven't really given much to go on here, so it's very hard to say.

Well I'm not the expert...

Maybe some pointers on how to increase the performance, or things to look into? I freely admit it could be related to the adapter I wrote (although it simply pulls information from SQL and pushes it to Elasticsearch using the bulk REST API, 1,000 documents per request).

Sphinx index time:

real 1m7.001s
user 0m43.332s
sys 0m3.135s

Elastic index time:

real 4m16.630s
user 0m40.943s
sys 0m11.706s

What version are you on?
What do the docs look like, what about the mapping?

What version are you on?

5.0 GA

What do the docs look like, what about the mapping?

Anonymized to protect the innocent...




Example documents:


{
  "field1": 5,
  "field2": "a sentence of text",
  "field3": "aWord",
  "field4": "aWord",
  "field5": "aWord",
  "field6": "aWord",
  "field7": "aWord",
  "field8": "aWord",
  "field9": "aWord",
  "field10": "aWord",
  "field11": {
    "fieldX1": "aWord",
    "fieldX2": {
      "fieldX3": "aWord"
    }
  },
  "field12": 1476313771,
  "field13": 1476313786,
  "field14": ["aWord", "aWord"],
  "field15": "aWord",
  "field16": "aWord",
  "field17": "aWord"
}


{
  "field1": 1455186369,
  "field2": ["aWord", "aWord"],
  "field3": "aWord",
  "field4": ["aWord"],
  "field5": 10073,
  "field6": "aWord",
  "field7": "aWord",
  "field8": 1455186368,
  "field9": ["aWord", "aWord"],
  "field10": ["aWord"],
  "field11": 10073,
  "field12": "aWord",
  "field13": "aWord"
}

The mappings might not exactly match the documents as I manually renamed fields but hopefully you get the idea.

The timings I gave you above were for 1.3 million documents. If I run all the logic apart from the following line, it takes 1 min 50 seconds (so the bulk API calls account for the remaining 2 minutes and 26 seconds):

// The line excluded from the 1m50s run (sends 1000 documents via _bulk):
Response response = restClient.performRequest(
        "PUT",
        path,
        new Hashtable<>(),
        requestBody);
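
For context, the requestBody that call carries is newline-delimited JSON: one action line followed by one document line per document, with a trailing newline on the last line. A self-contained sketch of how such a body can be assembled (class and method names are mine, not from the original program):

```java
import java.util.List;

public class BulkBodyBuilder {
    // Builds an NDJSON _bulk body: one action line plus one document line
    // per document, each newline-terminated (the bulk API requires the
    // trailing newline on the last line too).
    static String buildBulkBody(String index, String type, List<String> docsJson) {
        StringBuilder sb = new StringBuilder();
        for (String doc : docsJson) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type).append("\"}}\n");
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }
}
```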

I should add that my indexing program is single-threaded.

For optimal indexing performance you should probably increase the number of indexing threads a bit. Don't start with a very large value; rather, gradually increase it in small steps until you see no further improvement in throughput. When I benchmark, I often do not need a very large number of threads for optimal indexing throughput. You may also want to review this guide to indexing performance if you have not already.
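
A minimal sketch of that approach, assuming the actual bulk call is wrapped in a sender callback (the class and names are illustrative, not from the poster's program):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class ParallelIndexer {
    // Splits docs into fixed-size batches and hands each batch to sendBulk
    // (in real use, a closure around RestClient.performRequest) on a small,
    // fixed thread pool. Start with few threads and raise gradually.
    static void indexAll(List<String> docs, int batchSize, int threads,
                         Consumer<List<String>> sendBulk) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < docs.size(); i += batchSize) {
            List<String> batch =
                    new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size())));
            pool.submit(() -> sendBulk.accept(batch));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```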

Thanks Christian, I appreciate your suggestions (& link). I will investigate further.

This is excessive garbage collection, the node is spending more than 30% of its time collecting. That's why we have this warning.


But it's info level?

Default debug at 10%, info at 25%, and warn at 50%.
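
If I recall correctly, those thresholds are exposed as node settings in 5.x, so they can be tuned in elasticsearch.yml if the INFO lines are too chatty (the values below simply restate the defaults just mentioned):

```yaml
monitor.jvm.gc.overhead.debug: 10
monitor.jvm.gc.overhead.info: 25
monitor.jvm.gc.overhead.warn: 50
```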



Just to give an update: I attempted to split the indexing into two threads (one per document type) rather than first indexing one document type and then the other. This leads to a timeout exception (on my MacBook Pro the Elasticsearch Java process hovers around 250% CPU during the indexing):

listener timeout after waiting for [10000] ms
	at org.elasticsearch.client.RestClient$SyncResponseListener.get(
	at org.elasticsearch.client.RestClient.performRequest(
	at org.elasticsearch.client.RestClient.performRequest(

I attempted to increase the timeout:

restClient = RestClient.builder(hosts)
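
The builder chain above was truncated, but roughly speaking, the knobs involved in the 5.0 low-level client appear to be the max-retry timeout and the underlying socket timeout, which need to be raised together; the values here are purely illustrative, and this is a sketch rather than the code I ran:

```java
RestClient restClient = RestClient.builder(hosts)
        // Overall budget for a request across retries; the "listener
        // timeout" above seems to be governed by this value.
        .setMaxRetryTimeoutMillis(60000)
        .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                .setConnectTimeout(5000)
                // Raise the socket timeout together with the retry timeout,
                // otherwise the socket times out first.
                .setSocketTimeout(60000))
        .build();
```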

But that leads to a socket timeout:
	at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(
	at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(
	at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(
	at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(
	at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(
	at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(
	at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(
	at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$

Before attempting to split the indexer into more threads, I will try the indexing performance tips from the link provided by Christian to see if that helps.

While the indexing is occurring I see many (about 47 before the exception) INFO log lines regarding the GC overhead.



I'm not sure how to improve the indexing performance I am seeing. I have tried different batch sizes, different numbers of threads, different index configurations (… of none, and refresh_interval of -1) and different JVM options (including messing with the GC). The performance I am seeing on my Mac (depending on the batch size) is between 7,000 and 12,000 documents per second (each document is about 500 bytes).
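
For reference, settings like that are applied per index; a sketch (the index name is a placeholder, and number_of_replicas: 0 is an extra bulk-load setting I am assuming, not one confirmed above). refresh_interval should be set back, e.g. to "1s", once the load finishes:

```
PUT /myindex/_settings
{
  "index": {
    "refresh_interval": "-1",
    "number_of_replicas": 0
  }
}
```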

I have read of people achieving much higher indexing rates, so I'm not sure what the problem is.

To rule out a problem in my own indexer code, I instead used Logstash on one of the JDBC tables (so there is not a line of code written by me). The performance was worse (I am assuming this is because it is not using the bulk API).

Are you running Monitoring? You can do this with a free Basic license:

Performance and indexing throughput depend a lot on the hardware used as well as the size and nature of the data indexed. Comparing indexed events per second with someone running Elasticsearch on a production server is not necessarily very useful. When you are indexing, what do CPU and IO load look like? Have you identified what is limiting performance?

I'm currently spending less time investigating this, as it might be that we can tolerate the performance (not sure yet). Anyway, I have tried a number of different configurations on the client side (more threads, bigger batch sizes, etc.) and the best performance I get is with a single thread.

While running 10 threads and a batch size of 2000, the indexing consistently times out after 10 seconds (one thread times out and then I cancel all the others). What I see is that the total processing time (the value provided in the bulk response) slowly creeps up towards 10s; when it gets close to 10s, the indexing threads start timing out. While the response times are increasing, the GC overhead is also increasing.
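
One crude client-side mitigation (my own sketch, not advice from this thread) is to back off when the server-reported "took" time starts creeping toward the client timeout:

```java
public class BulkThrottle {
    // Crude client-side backpressure: if the server-reported "took" value of
    // the previous bulk response exceeds a threshold, pause before sending
    // the next batch, scaling the pause with the overshoot (capped at 5s).
    static long pauseMillis(long tookMillis, long thresholdMillis) {
        if (tookMillis <= thresholdMillis) {
            return 0;
        }
        return Math.min((tookMillis - thresholdMillis) * 2, 5000);
    }
}
```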

The output from running iostat -w 3 while this was going on:

          disk0               disk2       cpu    load average
KB/t  tps  MB/s     KB/t  tps  MB/s  us sy id   1m   5m   15m

206.84 175 35.33 0.00 0 0.00 87 10 3 8.17 5.32 5.27
161.92 138 21.80 0.00 0 0.00 90 7 3 8.17 5.32 5.27
176.33 139 23.95 0.00 0 0.00 91 8 1 7.83 5.30 5.26
82.98 113 9.13 0.00 0 0.00 91 6 2 7.83 5.30 5.26
183.03 134 23.91 0.00 0 0.00 90 7 3 8.09 5.39 5.30
57.79 177 9.97 0.00 0 0.00 88 10 1 7.92 5.40 5.30
160.84 157 24.71 0.00 0 0.00 89 8 3 7.92 5.40 5.30
57.69 188 10.57 0.00 0 0.00 90 6 4 7.68 5.40 5.30
104.62 110 11.19 0.00 0 0.00 92 5 4 7.68 5.40 5.30
260.84 96 24.50 0.00 0 0.00 92 6 2 8.75 5.66 5.39
196.15 151 28.96 0.00 0 0.00 90 6 4 8.85 5.73 5.42
210.46 129 26.48 0.00 0 0.00 92 6 2 8.85 5.73 5.42
207.70 93 18.79 0.00 0 0.00 91 5 3 8.62 5.73 5.42
194.61 135 25.65 0.00 0 0.00 88 8 4 8.62 5.73 5.42
96.09 129 12.12 0.00 0 0.00 93 5 2 8.97 5.85 5.46
68.68 114 7.64 0.00 0 0.00 89 9 2 9.45 6.00 5.52
92.53 88 7.94 0.00 0 0.00 94 4 2 9.45 6.00 5.52
102.65 230 23.05 0.00 0 0.00 90 8 2 8.78 5.92 5.49
200.96 171 33.60 0.00 0 0.00 89 8 3 8.78 5.92 5.49
184.09 162 29.15 0.00 0 0.00 91 6 3 8.55 5.92 5.50

It timed out just at the end when I stopped the monitoring.

This was the output of the Elastic X-Pack monitor (it timed out around the high point of the last peak in system load):

I have tried running with different heap sizes (2G, 5G, 8G) but the results are similar.

My Mac has 16G of memory and everything is running on the same box.

Does anyone know if there is a recommended configuration for running on a typical MacBook Pro? Maybe for the test I am running, Elasticsearch needs to be on its own machine. No timeouts occur if I limit the batch size to 1000 in a single thread, but I am having trouble matching the performance of Sphinx on this machine.



So, have you solved this? I face the same problem when upgrading from ES 2.1 to ES 5.0: the bulk indexing operation performs worse than on ES 2.1, and it always OOMs.

Sorry, but I was unable to improve the performance for the tests I was running. I have only used ES5, so I cannot comment on whether the performance has gotten worse. I am no longer actively investigating this.