Speeding up indexing in ES 2.2.0

mkelkar · March 28, 2016, 6:44pm

Hi All,
I am testing the performance of our service with ES 2.2.0 and I am seeing a slower indexing rate. We use SSDs for all of our data nodes. If I test the same setup with ES 1.7.3, we can index documents at an average of 70K/sec, but with ES 2.2.0 the rate drops to 40K/sec. I am looking for advice on what I can start looking at to get better indexing performance. Here is a description of our test setup, which is identical for both 1.7.3 and 2.2.0:

Service nodes: 3 (m2.4xlarge)
ES master nodes: 3 (m3.2xlarge)
ES data nodes: 9 (c3.8xlarge)

We are using the default merge policy that comes with ES 2.2.0. For 1.7.3, we set indices.store.throttle.type = none to speed up indexing. This setting (and various others) has been removed in ES 2.x.

Any clues on what I can start looking into?

Thanks,
Madhav.

tinle · March 28, 2016, 7:01pm

In ES 2.x the default translog durability changed to synchronous: the translog is fsynced after every index, delete, update, or bulk request, so there is a performance hit.

If you want to use the same policy as ES 1.x, set this in your ES config:

# riskier, but faster (same as ES pre-2.x; default in 2.x is 'request')
index.translog.durability: async
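As a minimal sketch, assuming you would rather set this per index at creation time than in the node config, the same setting can go into the settings body of a create-index request. The host `http://localhost:9200` and the index name `my-index` are placeholders, and the code only builds the request; it does not send it.

```python
import json
import urllib.request


def translog_async_settings() -> dict:
    """Create-index body equivalent to `index.translog.durability: async`."""
    return {"settings": {"index": {"translog": {"durability": "async"}}}}


def create_index_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index> request carrying the settings."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and index name; adjust for your cluster.
    req = create_index_request("http://localhost:9200", "my-index", translog_async_settings())
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually send it. With `async`, operations acknowledged since the last translog fsync can be lost if a node crashes, which is the risk the comment above refers to.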

mkelkar · March 28, 2016, 8:35pm

Thanks Tinle for the pointer!

I just tried it, and it increased indexing speed a little, but not by much; it's still around the 42K/sec mark. Is there something else I can check?

Thanks,
Madhav.

nik9000 · March 28, 2016, 8:32pm

You should be able to use a larger bulk size to get performance to the point where the default is only a little slower than async. So, yeah, my suggestion is to try larger bulk sizes in 2.x and to make sure your refresh interval is high-ish, like 30 seconds.
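A minimal Python sketch of both suggestions, assuming the standard newline-delimited `_bulk` format (an action line followed by a source line for each document) and the placeholder names `my-index` and `doc` for the index and mapping type. It splits documents into batches of a configurable size, serializes each batch as a bulk body, and builds the body for a `PUT /<index>/_settings` call that raises `refresh_interval`.

```python
import json
from typing import Dict, Iterable, Iterator, List


def batches(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the last, possibly smaller, batch
        yield batch


def bulk_body(index: str, doc_type: str, batch: List[dict]) -> str:
    """Serialize one batch for POST /_bulk: an action line and a source line
    per document, with the whole body ending in a newline."""
    lines = []
    for doc in batch:
        # In ES 2.x each action names both the index and the mapping type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def refresh_interval_settings(interval: str = "30s") -> Dict[str, dict]:
    """Body for PUT /<index>/_settings to refresh less often while indexing."""
    return {"index": {"refresh_interval": interval}}


if __name__ == "__main__":
    docs = ({"id": i, "client": "acme"} for i in range(2500))
    for batch in batches(docs, 1000):
        body = bulk_body("my-index", "doc", batch)
        print(len(batch), "docs,", len(body.encode("utf-8")), "bytes")
```

Each body would be POSTed to `/_bulk`. The batch size is a parameter rather than a constant because the best value has to be found by measuring against your own cluster.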

rusty · March 28, 2016, 9:34pm

This setting was moved to the index template, so you can set it there:

"settings": {
  "index": {
    "store": {
      "throttle": {
        "type": "none"
      }
    }
  }
}

You can also try disabling doc_values for not_analyzed fields (if you don't really need them). This can also improve the indexing rate and reduce disk usage.
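A sketch of what this looks like in an ES 2.x mapping, assuming string fields that are only used for exact matching; the type name `doc` and the fields `client_id` and `status` are hypothetical. As far as I know, such fields can still be searched, but sorting or aggregating on them would fall back to in-memory fielddata instead of doc values.

```python
import json
from typing import Dict, List


def not_analyzed_without_doc_values() -> Dict[str, object]:
    """ES 2.x field mapping: exact-match string with doc_values turned off."""
    return {"type": "string", "index": "not_analyzed", "doc_values": False}


def mapping_body(doc_type: str, fields: List[str]) -> Dict[str, object]:
    """Create-index body giving each listed field the mapping above."""
    props = {name: not_analyzed_without_doc_values() for name in fields}
    return {"mappings": {doc_type: {"properties": props}}}


if __name__ == "__main__":
    # Hypothetical type and field names.
    print(json.dumps(mapping_body("doc", ["client_id", "status"]), indent=2))
```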

mkelkar · March 28, 2016, 10:36pm

Thanks Nik & Rusty!

I will try these out and see what happens.

@Rusty - is this new setting dynamic? I do not see it documented anywhere...

Thanks,
Madhav.

rusty · March 29, 2016, 6:58am

My bad, this setting was removed in ES 2.2.

jprante · March 29, 2016, 7:16am

70k/sec - do you mean 70k docs per second, or 70 kilobytes per second?

JoarSvensson · March 29, 2016, 11:06am

Like @rusty, I would recommend trying to disable doc_values for fields where applicable, to see whether you get a performance increase. More on doc_values: http://stackoverflow.com/questions/32332487/what-are-the-disadvantages-of-elasticsearch-doc-values

mkelkar · March 29, 2016, 2:18pm

@jprante it's 70k docs per second.

@JoarSvensson we have already disabled doc_values wherever we could.

I am testing increased batch sizes as we speak; I will post the results soon.

Thanks,
Madhav.

JoarSvensson · April 4, 2016, 10:59am

@mkelkar Any luck with increased batch sizes?

mkelkar · April 5, 2016, 10:57pm

@JoarSvensson I just finished a couple of large-scale tests a few minutes ago.

I increased the batch size for bulk index requests from 1k to 5k, and this was a total failure. ES started throwing a bunch of exceptions, all of which contained this: 'NotSerializableExceptionWrapper[Failed to acknowledge mapping update within [30s]'
I decreased the batch size to 4k, with a similar story: ES did not throw exceptions, but the bulk write rate was about 20K/sec, far lower than the 40K/sec we get with a batch size of 1k.

I also tried disabling auto-throttling on merges (index.merge.scheduler.auto_throttle = false) and setting max_thread_count to 1, which did not help either.

It's worth mentioning that our process runs in two phases: the first phase is indexing-heavy, and the second is query-heavy. It works like this:

1. It first indexes all documents in ES for a client (phase 1).
2. As soon as all docs are written for a client, it moves to phase 2, where the docs are updated according to our business rules and indexed again.

What I have seen with 2.2.0 is that while phase 1 is running for some clients, phase 2 processing runs much faster than it did with 1.7.3. My hunch is that in 1.7.3, because we set indices.store.throttle.type = none, indexing was the top priority, so phase 1 finished a lot quicker and phase 2 processing was automatically slowed down while phase 1 was running. With 2.2.0, phase 2 is much faster for some reason (most probably because indexing is no longer the top priority, and our queries are faster thanks to ES 2.2.0 optimizations).

I would like to do something similar to 1.7.3, where indexing got top priority, but I am not aware of any settings that would do that in ES 2.2.0. I have already tried index.merge.scheduler.auto_throttle = false and index.merge.scheduler.max_thread_count = 1 with no luck. Any clues on how to proceed further?

Thanks,
Madhav.

JoarSvensson · April 6, 2016, 5:41am

Interesting case indeed. I haven't come across any settings to alter priority yet. It feels like logic that is part of the ES internals.

The two things at the top of my mind: either perform just one indexing pass, doing the phase 2 logic in a combined step prior to indexing; or add more nodes to offload the cluster, or even use two separate clusters if possible, so you're sure you're not doing heavy indexing and querying at the same time.

Hopefully someone else has better ideas.

mkelkar · April 6, 2016, 2:17pm

We cannot perform the phase 2 logic within phase 1 because it depends on all documents being written to ES first. We denormalize documents in phase 2, so if not all documents are available, our results would be incorrect.

We already have a pair of clusters because we did see this conflict between indexing and querying at the same time. We use one cluster just to transform our docs, and the second one only serves query traffic. However, because we have to do our doc processing in phases, we have the same conflict on our doc-processing cluster with 2.2.0. With 1.x we did not have any problems whatsoever.

I was planning to throttle our phase 2 processing until phase 1 completes, but I wanted to see if there are better ideas for increasing the priority of indexing.
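One way to sketch that plan, assuming phase 1 and phase 2 run as threads in the same service process (the class and method names are hypothetical, not from any library): phase-1 workers register each client while it is being indexed, and phase-2 workers block until no phase-1 work is in flight.

```python
import threading
from typing import Optional, Set


class PhaseOneGate:
    """Tracks clients whose phase-1 indexing is still running."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active: Set[str] = set()

    def start(self, client: str) -> None:
        """Called by a phase-1 worker before it indexes a client's docs."""
        with self._cond:
            self._active.add(client)

    def finish(self, client: str) -> None:
        """Called when a client's phase-1 indexing is done."""
        with self._cond:
            self._active.discard(client)
            if not self._active:
                self._cond.notify_all()  # wake any waiting phase-2 workers

    def wait_until_idle(self, timeout: Optional[float] = None) -> bool:
        """Block until no phase-1 work is active; returns False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: not self._active, timeout)
```

This gates phase 2 on all phase-1 work rather than per client, which is the stricter reading of "until phase 1 completes"; a per-client version would have to track which clients each phase-2 job depends on.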

Thanks,
Madhav.

jprante · April 6, 2016, 9:02pm

ES2 limits the resources (threads) for a) bulk and b) segment merging. There are good reasons for that, and there are no knobs to turn to enable something like "priority to indexing": in ES1 you could allocate more threads for bulk or segment merging than the JVM could handle, which had serious side effects.

I have a workload similar to yours. You can avoid extra load by creating a new index in phase 2. Indexing documents into an existing index (replacing existing docs) is more expensive than writing into an empty index. After that, switch an index alias from the old index to the new one.
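A minimal sketch of the alias switch, assuming the `_aliases` API with a remove action and an add action in one request; Elasticsearch applies the actions of a single `_aliases` request atomically, so searches never see the alias pointing at no index. The alias, index names, and host are placeholders, and the code only builds the request.

```python
import json
import urllib.request


def alias_swap_actions(alias: str, old_index: str, new_index: str) -> dict:
    """Body for POST /_aliases: move `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def alias_swap_request(base_url: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /_aliases request."""
    return urllib.request.Request(
        f"{base_url}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder names: queries go to "clients", phase 2 writes "clients_v2".
    body = alias_swap_actions("clients", "clients_v1", "clients_v2")
    req = alias_swap_request("http://localhost:9200", body)
    print(req.get_method(), req.full_url)
    print(json.dumps(body, indent=2))
```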

mkelkar · April 8, 2016, 5:51pm

@jprante During phase 2 I have no problems; it's phase 1 that causes problems.

Also, if I switch to a bulk size of 500 docs, I get a flood of these exceptions in the logs:

java.util.concurrent.TimeoutException: Failed to acknowledge mapping update within [30s]
at org.elasticsearch.cluster.action.index.MappingUpdatedAction.updateMappingOnMasterSynchronously(MappingUpdatedAction.java:122)
at org.elasticsearch.cluster.action.index.MappingUpdatedAction.updateMappingOnMasterSynchronously(MappingUpdatedAction.java:112)
at org.elasticsearch.action.index.TransportIndexAction.executeIndexRequestOnPrimary(TransportIndexAction.java:228)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:326)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:119)
at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:68)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryPhase.doRun(TransportReplicationAction.java:595)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:263)
at org.elasticsearch.action.support.replication.TransportReplicationAction$PrimaryOperationTransportHandler.messageReceived(TransportReplicationAction.java:260)
at org.elasticsearch.transport.TransportService$4.doRun(TransportService.java:350)
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

It looks like it takes more than 30 seconds to acknowledge the mapping update request. What should I start looking at changing?

Thanks,
Madhav.

mkelkar · May 4, 2016, 5:31pm

FWIW, here is what I found:

We used to have an ES node client embedded in our service JVM. During bulk indexing, our service was doing a lot of work, which caused the embedded node client to stop responding to cluster state updates. Switching to the transport client fixed the problem.

Bruce_Ritchie · May 4, 2016, 5:47pm

The docs say that the index.store.throttle.type setting was removed, but I still see it in the IndexStore class and in the https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html documentation.
