Indexing performance degrading over time

Dominik_Stadler · January 27, 2016, 12:30pm

Hi,

we are doing performance testing of various types of environments to decide on a properly sized set of nodes for a number of Elasticsearch Cluster installations.

During long-running indexing tests we see a sharp drop in indexing performance early on, which stabilizes after some time: write throughput is initially up to 5 times higher than it is after 6 hours.

Typically after something like 6-8 hours performance becomes more or less stable and does not decrease a lot any more.

See the following screenshot: the blue bars show the number of documents written per time frame. Initial performance is quite good, but it drops quickly.

We mostly use Elasticsearch 2.1.1, and we see this pretty much across the board, from 1 node machines with 16GB RAM and 4 cores up to 6-node clusters with 32GB and 8 cores.

We did apply common performance optimizations mentioned in the documentation and we are using bulk indexing with 5k/5MB bulks usually.
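For readers who want to reproduce the "5k/5MB" bulk sizing, here is a minimal sketch of a batcher that flushes on whichever limit is hit first. It only builds the newline-delimited bulk bodies and sends nothing. The function name `bulk_batches` and its parameters are hypothetical, and reading "5MB" as 5 * 1024 * 1024 bytes of payload is an assumption.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_DOCS = 5000                # "5k" from the post
MAX_BYTES = 5 * 1024 * 1024    # "5MB" from the post (assumed to mean payload bytes)


def bulk_batches(docs: Iterable[Dict], index: str, doc_type: str,
                 max_docs: int = MAX_DOCS,
                 max_bytes: int = MAX_BYTES) -> Iterator[str]:
    """Yield bulk request bodies, flushing when either limit would be exceeded."""
    lines: List[str] = []
    count = 0
    size = 0
    for doc in docs:
        action = json.dumps({"index": {"_index": index, "_type": doc_type}})
        source = json.dumps(doc)
        # Each entry is an action line plus a source line, each newline-terminated.
        entry_size = len(action.encode("utf-8")) + len(source.encode("utf-8")) + 2
        if count and (count >= max_docs or size + entry_size > max_bytes):
            yield "\n".join(lines) + "\n"
            lines, count, size = [], 0, 0
        lines.extend((action, source))
        count += 1
        size += entry_size
    if count:
        yield "\n".join(lines) + "\n"
```

A single document larger than `max_bytes` is still emitted, alone in its own batch, rather than dropped.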

When doing performance analysis, we see the system is mostly CPU bound, probably because the documents are large (approx. 1.5k bytes of JSON) and include a few nested documents.

See this screenshot from dynaTrace: it shows the CPU usage on the left side and some Elasticsearch-specific metrics in the middle. There does not seem to be any throttling kicking in. Networking is not an issue, as this test even ran indexing from the same machine. Read operations are constant, while write operations go down with the number of documents that are written.

Is this something that is expected because of segment merging kicking in only after some time? However 6 hours seems a bit long when writing a few thousand documents per second.

Or do we run into some configuration-limit on indexing which throttles down the rate of indexing? Or Thread pools?

Thanks... Dominik.

polyfractal · January 27, 2016, 2:58pm

Hmm. The CPU usage is almost certainly segment merging...it's effectively a streaming sort of the segments involved in the merge (plus various compression/decompression operations). So it slurps down a lot of CPU.

It's somewhat expected for indexing throughput to decline as the shard size increases, but you have a very dramatic decrease. It looks unusual to me (especially considering the throughput earlier in the run, and that you aren't bottlenecking on disk IO).

Some random questions/observations, in no particular order:

Have you made any tweaks to the config (in particular, anything to merge policy, translog, etc)? Would you mind pasting them?
I can't quite tell, but it looks like the translog size drops significantly right as the indexing throughput decreases. Was there a configuration change? How do you have the translog configured? Are you using the default request fsync behavior?
I noticed the disk says "EBS Magnetic". How many merge threads are configured (default, or something custom)?
Are you making a lot of updates to nested documents? Or are these "append-only", where you are only adding new documents?
Do you know if there are some "outlier" documents, that might have very large number of nested children?
Do you see anything in your logs about throttling "in-flight" merges?

Dominik_Stadler · January 28, 2016, 8:12am

Thanks for the notes,

We only use a small number of settings. The bulk thread-pool change was necessary because at some point we wrote to many different indexes; in the latest tests only one index was written to, and that made no difference.

      # Use higher sizes for thread-pools to allow to run larger bulk-indexing
      threadpool.bulk.queue_size: 1000
      threadpool.bulk.size: 10
      
      # Try to save memory with many indexes/shards
      # see https://github.com/elastic/elasticsearch/issues/8394#issuecomment-65871792 for some discussion
      index.load_fixed_bitset_filters_eagerly: false
      
      # Use a higher wait-time that the master waits until it re-balances shards/replicas
      # See https://www.elastic.co/guide/en/elasticsearch/reference/current/delayed-allocation.html
      index.unassigned.node_left.delayed_timeout: "15m"

Apart from that everything is using default values.

Index settings are

      "number_of_replicas": "1",
      "number_of_shards": "6",
      "refresh_interval": "30s"

No change to translog settings as far as I see
It should be primarily newly written documents, only very few duplicates, no real updates. The number of nested documents is fairly constant between 1 and 3 items per main document as this is generated synthetic data
Logs are completely empty

Any suggestions of additional things we can try?

Christian_Dahlqvist · January 28, 2016, 8:19am

Are you specifying document IDs at indexing time or letting Elasticsearch generate them automatically?

rusty · January 28, 2016, 8:46am

Hi, we have quite a similar problem: ES 2.1.0 indexing slows down after 2.5-3 days (daily indices, append only, self-generated doc FLAKE IDs). Sorry, I cannot provide detailed information right now (still collecting logs and metrics to share). No CPU or IO pressure; the logs show mostly merges (normal and throttled). But bulk execution time increases (indexing buffers goes down and keeps low) over time. The only way to get the speed back is a full cluster restart.

Do you have any anomaly with thread pool and load average on multiple nodes?
watch -d -n 5 "curl -s localhost:9200/_cat/thread_pool | sort"

Can you try to restart the cluster and see if indexing speed increases after it's become green again?

Dominik_Stadler · January 28, 2016, 9:39am

We have generated IDs from outside of Elasticsearch, probably UUIDs, plus routing on another id which we use to store related documents in the same shard.

We will do some runs with auto-generated IDs to see if it is related here.

Christian_Dahlqvist · January 28, 2016, 9:46am

If you are generating IDs at the application layer, the structure of your ID can impact performance, as it may make it more expensive for Elasticsearch to check whether the ID already exists and needs to be updated, especially if you have slow disks. If this is the case, indexing throughput tends to drop as the shards grow in size. This blog post describes how this works under the hood.

polyfractal · January 28, 2016, 3:51pm

Yep, IDs can play a role in it. So that could be part of the slowdown, but I don't know if it would account for the entire effect.

A potential contributor might be this bug: When shard becomes active again, immediately increase its indexing buffer by mikemccand · Pull Request #13918 · elastic/elasticsearch · GitHub

Is your data being ingested continuously, or could it be "bursty" enough for a shard to become "inactive" (by default a shard becomes inactive 5 minutes after the last document index/update/delete)? If the shard falls inactive it might be affected by that bug.

Otherwise, not entirely sure, talking to some other devs. Thanks to @Dominik_Stadler and @rusty for the details/stats/info, it makes it a lot easier to kick around ideas internally

You might try setting index.merge.scheduler.max_thread_count: 1 in your config (or update it through the settings API), since you are on magnetic spinning disks. Multiple merge threads are good for SSDs, but tend to slow down spinning disks.

ES should autodetect and default to 1 for spinning (you'll see in your startup log a message about spinning true/false). I don't think this is the ultimate problem, but it might be worth a shot.
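As a rough sketch of the "update through settings api" route, the request could be built like this. Only the setting name comes from the post; the helper name `merge_thread_request`, the base URL and the index name are placeholders. The request is constructed but not sent.

```python
import json
import urllib.request


def merge_thread_request(base_url: str, index: str,
                         max_threads: int = 1) -> urllib.request.Request:
    """Build (but do not send) a PUT to the index settings endpoint."""
    body = json.dumps(
        {"index.merge.scheduler.max_thread_count": max_threads}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and index name; pass the result to urllib.request.urlopen to apply it.
request = merge_thread_request("http://localhost:9200", "my-index")
```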

Could you also post the output of your Hot Threads API? Perhaps something interesting will show up there (although I'm guessing it will just show merge threads).

Will keep digging

I'm thinking this is somehow related to #13198 too, since you mentioned the indexing buffer goes low and stays that way. @Dominik_Stadler, do you notice your indexing buffer staying low as well?

polyfractal · January 28, 2016, 4:20pm

It was also suggested to me by another dev that you might be running into EBS IOPS throttling. Do you know if you are saturating your IOPS?

Not an EC2 expert however, so that's about the extent of my knowledge there

ivank · January 28, 2016, 4:21pm

In addition to @rusty's post: running the watch ... command shows that only one Elasticsearch instance has a large bulk queue, whereas the remaining tens of instances do nothing. We use two masters to send bulks to them, and we tried client.transport.sniff: true to send bulks in a round-robin manner, but it helped very little. We still have one overloaded node and decreasing indexing performance. After every cluster restart a different node becomes the overloaded one.

We tested that id generator gives unique ids on our docs. So I don't think the problem is with id...

polyfractal · January 28, 2016, 6:22pm

@ivank do you use custom routing? It's not uncommon for custom routing to distribute data in a lumpy manner because your partitioning key is not evenly distributed, or potentially your custom routing is just incorrect and everything is routed to the same shard.

Can happen with parent/child relationships too, since all the children are routed to a single shard with the parent.
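To illustrate how an unevenly distributed routing key produces "lumpy" shards, here is a small simulation. It is a sketch only: `zlib.crc32` stands in for Elasticsearch's own routing hash, and the tenant-style keys are made up.

```python
import zlib
from collections import Counter
from typing import Iterable


def shard_for(routing_key: str, num_shards: int) -> int:
    """Map a routing key to a shard. CRC32 is a stand-in, not the hash ES uses."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def shard_histogram(routing_keys: Iterable[str], num_shards: int) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(shard_for(key, num_shards) for key in routing_keys)


# Hypothetical workload: 90% of documents share a single routing key.
keys = ["tenant-1"] * 900 + [f"tenant-{i}" for i in range(2, 102)]
for shard, docs in sorted(shard_histogram(keys, 6).items()):
    print(shard, docs)
```

However well the hash spreads the remaining keys, the shard that owns the dominant key receives at least 900 of the 1000 documents.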

ivank · January 28, 2016, 6:34pm

@polyfractal no, we don't use custom routing, just indexing docs via bulks, only ids are generated on client side, and they are unique it seems...

As I understand it, children are nested objects (in JSON terms)? Our documents are mainly flat and we don't use any explicit relationships.

polyfractal · January 28, 2016, 8:08pm

@ivank How many shards are in this index? Is it a single primary shard?

Is this related to the indexing slowdown-over-time that @rusty mentioned, or a different symptom/problem? (Are you two working together or is this a third cluster with slowdown-over-time problem?)

I just don't want to derail this thread before we figure out that issue. If it's separate, let's open a new thread and discuss there.

ivank · January 28, 2016, 8:29pm

@polyfractal, we are working together with @rusty

We have different kinds of indices: some have a couple of shards, some are big and are sized to the number of nodes in the cluster. They all have 1 replica.

Dominik_Stadler · February 2, 2016, 7:42am

I did a few more tests. For me it does not happen if I have no nested documents instead of the 1 to 3 that I had in my original set of tests. Now the indexing rate (the blue bars in the top row) drops much less and then stays constant; the gap seems to be some network issue at that time.

Could it be that nested documents require more and more effort over time?

Anything to test or investigate? I can also add code-level instrumentation via Dynatrace to see where more time is spent, which code-places in Elasticsearch would be interesting here?

mikemccand · February 2, 2016, 10:36am

That's odd that you only see the slow down when indexing with nested documents.

Nested documents do require more work to index and merge, since under-the-hood, each of your 1 to 3 nested docs (plus the parent doc) are indexed as separate documents to Lucene, but that work should not be increasing over time.
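The extra work is easy to put a number on. A minimal sketch (the function name is hypothetical) that counts the Lucene documents written for a batch, given how many nested objects each parent carries:

```python
from typing import Iterable


def lucene_docs_indexed(nested_counts: Iterable[int]) -> int:
    """Each parent counts once, plus one Lucene document per nested object."""
    return sum(1 + nested for nested in nested_counts)


# Three parents carrying 1, 2 and 3 nested objects respectively.
print(lucene_docs_indexed([1, 2, 3]))  # 9
```

With 1 to 3 nested objects per parent, every indexed document costs between 2 and 4 Lucene documents, a constant multiplier rather than something that grows over time.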

Can you pull a full hot threads (pass e.g. threads=10000 to https://www.elastic.co/guide/en/elasticsearch/reference/2.1/cluster-nodes-hot-threads.html) from all nodes when things have gotten very slow with the nested documents?
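A small sketch of building that request URL, assuming the nodes hot threads endpoint from the linked documentation; the host is a placeholder and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def hot_threads_url(base_url: str, threads: int = 10000) -> str:
    """URL of the nodes hot threads endpoint with a large thread limit."""
    query = urlencode({"threads": threads})
    return base_url.rstrip("/") + "/_nodes/hot_threads?" + query


print(hot_threads_url("http://localhost:9200"))
```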

ivank · February 4, 2016, 4:27pm

So, we have collected some more info, and I want to post it here.

At the moment, performance degrades after ~10 hours instead of ~3 days (because of a more "correct" cluster restart which doesn't make Elasticsearch rebalance indices).

Here are manually combined pictures of the charts. They show the stats of a non-overloaded node (the others have similar stats).

One difference on the overloaded node is CPU utilization, of course, because of the large bulk queue.

Timeline is saved (approx.)

We have also built animation of how bulks are distributed over nodes using given watch ... command (sort is used, so nodes is not always at the same positions):
~70M docs/5min:

~40M docs/5min:

~10M docs/5min:

hot threads for 40M: http://pastebin.com/9RHdJic6
hot_threads for 10M: http://pastebin.com/Hd4ZdQF1 (smaller, of course)
stats for 10M: http://pastebin.com/WNnB5iDN
thread_pool for 10M: http://pastebin.com/k3mEtRb0 (page is heavy...)

Is there anything else we can share?

mikemccand · February 5, 2016, 9:04am

Thank you for posting the hot threads ... I see most hot threads doing normal bulk indexing, which is good, however many are doing ID lookups to locate the previously indexed document by that ID. This usually means too many segments (though your aggregate stats seem OK: ~64 segments per shard) or that the OS doesn't have enough free RAM for IO caching (do you leave 50% of the machine's RAM for it?).

Can you try the suggestions at https://www.elastic.co/blog/performance-indexing-2-0?

In particular, in your "hot threads for 40M" output, I can see merges are being throttled, which can cause too many segments, which will slow down ongoing indexing, and can cause index throttling.

When @rusty said " (indexing buffers goes down and keeps low)", how/where did you see that?

Are you leaving 50% of each data node's RAM to the OS for IO caching?

Are you using spinning disks or SSDs? What ES settings have you changed from defaults?

ivank · February 6, 2016, 7:40pm

Not quite 64...
Indices differ, of course. Here are some indices and their segment counts:

index1-2016.02.02: 5602
index1-2016.02.03: 7796
index1-2016.02.04: 2229
index2-2016.02.02: 1791
index2-2016.02.03: 1918
index2-2016.02.04: 1928
index3-2016.02.02: 15
index3-2016.02.03: 11
index3-2016.02.04: 18
index4-2016.02.02: 41
index4-2016.02.03: 42
index4-2016.02.04: 49
index5-2016.02.02: 15
index5-2016.02.03: 10
index5-2016.02.04: 16
...

About RAM: 61GB of 128GB is allocated for 2 ES instances, 4 GB is for our custom app that loads data (really can use up to 30GB) and shares CPU and OS cache:

free -m
             total       used       free     shared    buffers     cached
Mem:        129010     127494       1515          1        456      25638
-/+ buffers/cache:     101399      27610
Swap:        16383        862      15521

From the provided link:

We turned off store throttling. Is that good?
Yes, we use multiple path.data
We use our own IDs because of the same transient log in our custom app (we took the same ID generation code that ES uses)
We don't use doc_values because of spinning disks, and we don't need aggregations or sorting on the ES side.

So, we turned off throttling for storing docs. Does this affect merging?
Here are the settings we use in the index templates:

        /* for the largest indices it’s set to 24 (by doc count <=2B and
        shard size <=50GB, balanced to the ES nodes number) */
        "index.number_of_shards": ..., 
        "index.index_concurrency": 32,
        "index.number_of_replicas": 1,
        "index.refresh_interval": "30s",
        "index.codec.bloom.load": false,
        "index.merge.policy.use_compound_file": true,
        "index.merge.policy.floor_segment": "50mb",
        "index.merge.policy.max_merged_segment": "500mb",
        "index.merge.policy.reclaim_deletes_weight": 0.0,
        "index.merge.async_interval": "10s",
        "index.merge.scheduler.type": "concurrent",
        "index.merge.scheduler.max_thread_count": 2,
        "index.translog.flush_threshold_size": "500mb",
        "index.ttl.disable_purge": true,
        "index.codec": "best_compression",
        "index.store.type": "niofs",
        "index.store.throttle.type": "none",
        "index.warmer.enabled": false,
        "index.mapper.dynamic": false

the largest indices have 22 & 11 shards, others are small and have 1-2 shards.

And settings for Elasticsearch itself:

...
names, paths here... nothing interesting
...
bootstrap.mlockall: true
network.host: 0.0.0.0
http.port: 9200-9400 ports range as usual
transport.tcp.port: 9300-9400 ports range
gateway.recover_after_nodes: 1
discovery.zen.ping.unicast.hosts: ["master_01:9300", "master_02:9300"]
node.master: true for 2 masters
node.data: true for 20 datas
http.cors.enabled: true
http.cors.allow-origin: "*"
node.host: some host
cluster.routing.allocation.awareness.attributes: host
cluster.routing.allocation.same_shard.host:  true
os.processors: 8
threadpool.bulk.size: 30
threadpool.bulk.queue_size: 3000
threadpool.search.size: 30
threadpool.search.queue_size: -1

By indexing buffers he meant indices.segments.index_writer_memory_in_bytes taken from _nodes/stats (third chart on Docs vs CPU.png). But I believe this decrease is a consequence.

mikemccand · February 7, 2016, 11:12am

Hmm some of these settings are not good. Can you revert to defaults and re-test?

E.g.

"index.merge.policy.max_merged_segment": "500mb"

is dangerous (it's 10X smaller than the default) since it can cause too many segments for large shards.
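A back-of-the-envelope sketch of why this matters: if no merged segment may exceed the cap, a shard needs at least shard size divided by cap segments. The 50GB shard size is the upper bound mentioned in the template comment earlier in the thread. The 5gb figure follows from "10X smaller than the default". Real segment counts will be higher than this lower bound.

```python
def min_segments(shard_mb: int, max_merged_segment_mb: int) -> int:
    """Rough lower bound on segments: ceiling of shard size over the segment cap."""
    return -(-shard_mb // max_merged_segment_mb)


SHARD_MB = 50 * 1024  # 50GB shard, the upper bound quoted in the thread

print(min_segments(SHARD_MB, 500))       # 103 with the 500mb cap
print(min_segments(SHARD_MB, 5 * 1024))  # 10 with a 5gb cap
```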

"index.store.type": "niofs"

is very dangerous since this Directory impl. is poor for random access IO needed for the ID lookups.

threadpool.bulk.size: 30
"index.index_concurrency": 32

You should not set these two beyond the number of true cores your hardware has: doing so just hurts indexing throughput and stresses IO system (newer versions of ES now enforce this: Limit the max size of bulk and index thread pools to bounded number of processors by mikemccand · Pull Request #15585 · elastic/elasticsearch · GitHub). If anything, set it a bit lower than your true core count ...

"index.codec": "best_compression"

This is maybe the wrong tradeoff, since it means more CPU at indexing (and merging) time for smaller disk usage.

"index.merge.policy.reclaim_deletes_weight": 0.0

That's bad to do if you are doing deletes and you care about search performance, since it means the index can accumulate too high a percentage of deleted-but-not-reclaimed docs.

Some of your other settings have been removed from ES recently.

It's best to set shard count for an index only as large as is really needed to 1) ensure you never hit the 2.1B per-shard doc limit, and 2) get enough search time concurrency for your search QPS needs. Setting it higher just adds risk of indexing slowdown when some nodes in your cluster are overloaded...
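The first criterion can be checked with simple arithmetic. A sketch, assuming the per-shard limit is Lucene's maximum document count of 2,147,483,519 (the "2.1B" above) and that nested documents count toward it; the function name is hypothetical.

```python
LUCENE_MAX_DOCS = 2_147_483_519  # assumed exact value behind the "2.1B" limit


def min_shards_for_docs(total_lucene_docs: int,
                        max_docs_per_shard: int = LUCENE_MAX_DOCS) -> int:
    """Smallest primary shard count that keeps every shard under the doc limit,
    assuming documents are spread evenly across shards."""
    return max(1, -(-total_lucene_docs // max_docs_per_shard))


print(min_shards_for_docs(10_000_000_000))  # 5
```

This gives only the floor set by the document limit. The second criterion, search concurrency, has to be measured against real query load.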

Finally, your output from free seems to show that only ~27 GB is available as buffer cache? Somehow your processes are using up the other 101 GB of RAM? You should try to leave ~50% or more for the OS to cache the hot blocks from the index.
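The ~27 GB and ~101 GB figures can be reproduced from the `free -m` output quoted earlier in the thread. A minimal sketch of that arithmetic; the function names are hypothetical, and the results differ from the `-/+ buffers/cache` row by 1 MB because `free` rounds each column separately.

```python
# Values in MB, copied from the `free -m` output posted in the thread.
TOTAL_MB = 129010
USED_MB = 127494
FREE_MB = 1515
BUFFERS_MB = 456
CACHED_MB = 25638


def cache_headroom_mb(free_mb: int, buffers_mb: int, cached_mb: int) -> int:
    """Memory available to the OS for caching: free + buffers + cached."""
    return free_mb + buffers_mb + cached_mb


def process_usage_mb(used_mb: int, buffers_mb: int, cached_mb: int) -> int:
    """Memory held by processes: used minus buffers and cache."""
    return used_mb - buffers_mb - cached_mb


headroom = cache_headroom_mb(FREE_MB, BUFFERS_MB, CACHED_MB)
processes = process_usage_mb(USED_MB, BUFFERS_MB, CACHED_MB)
print(headroom, processes, round(headroom / TOTAL_MB, 2))
```

That works out to roughly a fifth of total RAM for the OS cache, well short of the ~50% recommended above.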
