Upgrade to elasticsearch 2.3.3 from 2.3.1 - Slow Indexing


#1

I've begun rolling upgrades from 2.3.1 to 2.3.3 on a 15-node cluster (11 data, 1 client, 3 master). So far everything is upgraded except 8 data nodes.

During the day, I upgrade a node and everything is rebalanced and green before the evening. Then at night into the morning, about 40-50 million documents are indexed via Logstash and about 10 million via the elasticsearch-py bulk API.

Since I started the upgrades, the elasticsearch-py bulk operation times out even with request_timeout set to 60 and fewer than 50 documents per request!
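For reference, the failing calls look roughly like the sketch below. The index name, document fields, and host are placeholders, not my real ones; only the action-building part runs without a cluster:

```python
# Build actions in the shape elasticsearch-py's helpers.bulk() expects.
# Index name, doc type, and fields below are placeholders.
def make_actions(docs, index, doc_type="event"):
    return [{"_index": index, "_type": doc_type, "_source": doc} for doc in docs]

actions = make_actions([{"msg": "a"}, {"msg": "b"}], "logs-2016.06.14")

# Against a live cluster, this is the call that times out:
#   from elasticsearch import Elasticsearch
#   from elasticsearch.helpers import bulk
#   es = Elasticsearch(["http://localhost:9200"])
#   bulk(es, actions, request_timeout=60)  # per-request HTTP timeout, in seconds
```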

Is there any particular reason for this huge slowness? Could it be because the cluster has some nodes on 2.3.3 and some on 2.3.1?

FYI, each index has 5 primary shards + 1 replica.

Please let me know if additional information is needed.


(Glen Smith) #2

Could it be because the cluster has some nodes on 2.3.3 and some on 2.3.1?

Yes.

During the day, i upgrade a node and everything is rebalanced and green before the evening

Complete the rolling upgrade [1] in one sequence.

[1] https://www.elastic.co/guide/en/elasticsearch/reference/current/rolling-upgrades.html


#3

Thanks, Glen. After completing the upgrades, I get fewer indexing timeouts, but only slightly fewer query timeouts.

It also seems that request durations have increased considerably. For example, an aggregation request in Kibana went from about 5 seconds to 25-40 seconds.

Has anyone encountered slowness after upgrading to 2.3.3?

Right now I have 11 data nodes (10 with 750GB disk and 1 with 1000GB), all with 64GB memory and a 30GB heap;
3 master nodes with 24GB memory each; and 1 client node with 30GB memory.
All the nodes are virtual machines with spinning disks.

I have 4638 indices, each with 5 primary shards and 1 replica, for a total of 6TB (including replicas).
The indices are named by date and 'category'. Essentially, I have about 42 indices per day (varying in size from 3MB to 20GB).

I'm still learning about optimal configurations, so if my setup isn't optimal, I'd appreciate any help I can get.

Thanks!


(Glen Smith) #4

I have 4638 indices, each with 5 primary shards and 1 replica

So that's about 4,200 shards per data node. You need to reduce your shard count by one to two orders of magnitude.
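The arithmetic behind that figure, using the numbers from the post above:

```python
# Total shard count implied by the cluster description above.
indices = 4638
primaries_per_index = 5
copies = 2  # 1 primary + 1 replica of each shard

total_shards = indices * primaries_per_index * copies
shards_per_node = total_shards / 11  # spread across 11 data nodes

print(total_shards)            # 46380
print(round(shards_per_node))  # 4216
```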

I'm assuming you don't have indices that each contain the same document type. If you do, merge them.

I have about 42 indices per day (varying sizes ranging from 3MB to 20GB).

First thing: you don't have a single index that needs 5 primary shards. Reconfigure so you have 1 primary
shard per index. A 20GB shard is no problem (particularly when the alternative is to have way too many shards).
In a few months, you'll be down to about 840 shards per data node (much faster if you use the _reindex API). Still too many.
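One way to apply that going forward is an index template. The sketch below builds the request body; the `logs-*` pattern and template name are hypothetical, and note that in 2.x the match pattern goes under the `template` key (not the later `index_patterns`):

```python
import json

# Hypothetical template so that newly created daily indices
# get 1 primary shard instead of the default 5.
template_body = {
    "template": "logs-*",  # assumed index naming pattern
    "settings": {"number_of_shards": 1, "number_of_replicas": 1},
}

# Against a live cluster this would be registered with:
#   PUT /_template/one_shard_logs
#   e.g. es.indices.put_template(name="one_shard_logs", body=template_body)
print(json.dumps(template_body, sort_keys=True))
```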

A 3MB daily index - if it's the same index that is 3MB every day - could easily accommodate 20 years of data in a single shard. But let's settle for a month.

Let's work out a 3-tier strategy. Monthly, Weekly, Daily.

Identify the indices that accumulate less than, say, 1GB/month. Configure those to be monthly indices. For these, the choice to use _reindex is a no-brainer; it won't be very taxing. 1 shard, 1 replica.

Identify indices that accumulate 1GB - 5GB a month. Configure those to be weekly. 1 shard, 1 replica. _reindex.

Leave the rest as daily, but still, 1 shard, 1 replica.
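For the migration itself, a `_reindex` request folding several small source indices into one target could look like the sketch below. The index names are made up for illustration; `source.index` accepts a list of indices:

```python
import json

# Hypothetical: merge one week of daily indices into a single weekly index.
daily_indices = ["logs-cat1-2016.06.%02d" % day for day in range(13, 20)]
reindex_body = {
    "source": {"index": daily_indices},
    "dest": {"index": "logs-cat1-2016-w24"},
}

# Against a live cluster: POST /_reindex with this body,
# then delete the daily indices once it completes, e.g.:
#   es.reindex(body=reindex_body)
print(json.dumps(reindex_body))
```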

Now, all those replicas. Are you taking snapshots? If you aren't, start. Then you can consider dropping the replica on your old, read-only indices. With that many shards per node, you aren't gaining any read throughput from replicas; you just need to reduce the risk of losing all copies of a shard. Two copies are better than one, but snapshots will alleviate that concern.
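Once snapshots are in place, dropping the replica on an old index is a one-line settings update. A sketch, with a hypothetical index name:

```python
# Hypothetical: make an old, read-only index single-copy
# once it is covered by snapshots.
replica_settings = {"index": {"number_of_replicas": 0}}

# Against a live cluster:
#   PUT /logs-cat1-2016.05.01/_settings with this body, e.g.:
#   es.indices.put_settings(index="logs-cat1-2016.05.01", body=replica_settings)
```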

Implementing these changes will make a spectacular difference in how your cluster performs, both indexing and searching.


#5

This is probably the most concrete info I've read so far. I should have posted much sooner!

I'll implement your suggestions and report back. Thanks!


#6

Currently down to about 3,000 shards per node, with more to go. The monthly and weekly indices, now set to one shard, seem much faster, but I'm not so sure about the daily indices. I tested one of the larger ones; details below:

I tested changing one daily index from 5 primary shards to one (both versions have 1 replica and are configured for best compression).
INDEX A - One primary shard - 23.7 million documents - 12.8GB
INDEX B - Five primary shards - 22.7 million documents - 15.6GB

These are the durations from querying all documents in the index on the Kibana Discover page.
INDEX A (One primary)

  • Query Duration: 4705ms
  • Request Duration: 6472ms

INDEX B (Five primary)

  • Query Duration: 1937ms
  • Request Duration: 4788ms

Does this prove that 10+GB shards are not more performant? Or do you think something else could be the issue?


(system) #7

This topic was automatically closed after 21 days. New replies are no longer allowed.