Heavy load on one node (1 index)

We have a 4-node cluster with 1 index of roughly 25m docs. We have many processes reading from and writing to the cluster through pyes, which we've verified chooses a random initial node to talk to.

The strange thing is that 3 of the 4 nodes are bored (around 0.25 load), while 1 node is taking the brunt: a load of 40-50 on a 16-core box.

We can't figure out why - any pointers? Is there anything we can do to debug what is going on?

Can you gist a stack trace from the busy node? You can use jstack to do so (see the JDK's "jstack - Stack Trace" docs).
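Something along these lines works for grabbing a few dumps to gist - just a rough sketch, assuming the JDK's jps/jstack tools are on the PATH and it runs as the same user as the Elasticsearch process:

    # Rough sketch: capture a few thread dumps from the busy node to gist.
    # Assumes jps/jstack are on PATH and this runs as the Elasticsearch user.
    import subprocess
    import time

    def elasticsearch_pid():
        # jps -l prints "<pid> <main class>" for each local JVM
        for line in subprocess.check_output(["jps", "-l"]).decode().splitlines():
            pid, _, main_class = line.partition(" ")
            if "elasticsearch" in main_class.lower():
                return pid
        raise RuntimeError("no Elasticsearch JVM found")

    pid = elasticsearch_pid()
    for i in range(3):  # a few dumps, a few seconds apart, to catch the hot threads
        with open("jstack-%s-%d.txt" % (pid, i), "w") as out:
            out.write(subprocess.check_output(["jstack", pid]).decode())
        time.sleep(5)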

Is it continuous, or just for a period? If the latter, it was probably doing a merge of big segments, so the load is actually IO, not CPU.

I've been seeing the same thing on one of my indexes. It starts to do a
single merge of GBs of data, and the machine is too overloaded to
respond to requests.

You can check if there is a merge happening by looking at the index
status:
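For example, a quick check could look like this - a rough sketch only, assuming ES answers on localhost:9200, the index is called "places", and the response follows the 0.19-era _status layout:

    # Rough sketch: poll the index status API and report in-flight merges per shard.
    # Assumes ES on localhost:9200 and an index named "places".
    import json
    import urllib.request

    status = json.load(urllib.request.urlopen("http://localhost:9200/places/_status"))
    for shard_id, copies in sorted(status["indices"]["places"]["shards"].items()):
        for copy in copies:
            running = copy["merges"]["current"]
            if running:
                print("shard %s on node %s: %d merge(s), %s in flight"
                      % (shard_id, copy["routing"]["node"], running,
                         copy["merges"]["current_size"]))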

clint

John,

Can you check if your shards are well balanced?
You may be able to tell what's going on using ES head and SPM for ES:

http://mobz.github.com/elasticsearch-head/
http://apps.sematext.com/

The former you run on your own servers, the latter is a service. Both are
free.
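If you prefer the command line to a UI, a rough sketch like the following (assuming ES on the default localhost:9200) prints the shard count per node for each index:

    # Rough sketch: count shard copies per node for every index,
    # straight from the cluster state API.
    import json
    import urllib.request
    from collections import Counter

    state = json.load(urllib.request.urlopen("http://localhost:9200/_cluster/state"))
    node_names = {nid: meta.get("name", nid) for nid, meta in state["nodes"].items()}
    for index, table in state["routing_table"]["indices"].items():
        per_node = Counter()
        for copies in table["shards"].values():
            for shard in copies:
                if shard["node"]:  # skip unassigned copies
                    per_node[node_names[shard["node"]]] += 1
        print(index, dict(per_node))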

Otis

Hiring Elasticsearch Engineers World-Wide -- Jobs - Sematext

Thanks for all the pointers - I'll start diving into all these and let you
know what
I come up with.

Elasticsearch head gave me some interesting results. It turns out we have 3 indexes on our 4-node cluster.

The first index is just a test, so ignore it.

The second is 25m docs (19G) and has 3 shards on 1 node and 1 on each of the others.

The third is 16m docs (30G) and has 5 shards on 1 node, 4 on another, 2 on another, and 1 on the last.

The one with 5 shards on a single node is the one I was having the problem with. It seems like I need to get the shards to rebalance somehow to take advantage of all the hardware I have...

At this point the load is even across the machines, so I haven't used jstack, but I will next time it spikes.

Hmm, so I've been looking around and it seems that rebalancing replica shards isn't something that can be done? In our case the primaries are balanced, but the replicas are piled up on 1 machine of the 4-machine cluster.

Is there something I need to do, or can do, to make them rebalance?

And here's my shard list:

{
  "index": { "primary_size": "30.5gb", "primary_size_in_bytes": 32762267538,
             "size": "62.5gb", "size_in_bytes": 67185556471 },
  "translog": { "operations": 28487 },
  "docs": { "num_docs": 16378524, "max_doc": 24589536, "deleted_docs": 8211012 },
  "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
              "total": 943760, "total_time": "1.5d", "total_time_in_millis": 129794974,
              "total_docs": 2083671491, "total_size": "2704.7gb", "total_size_in_bytes": 2904189840468 },
  "refresh": { "total": 8353127, "total_time": "1.9d", "total_time_in_millis": 171464206 },
  "flush": { "total": 131156, "total_time": "5.5h", "total_time_in_millis": 19850157 },
  "shards": {
    "0": [
      { "routing": { "state": "STARTED", "primary": false, "node": "nELxlGs6Tzii12JRCNxpGw",
                     "relocating_node": null, "shard": 0, "index": "places" },
        "state": "STARTED",
        "index": { "size": "4.9gb", "size_in_bytes": 5352243736 },
        "translog": { "id": 1326141078235, "operations": 3211 },
        "docs": { "num_docs": 2647990, "max_doc": 4003535, "deleted_docs": 1355545 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 98373, "total_time": "4.6h", "total_time_in_millis": 16647902,
                    "total_docs": 213859847, "total_size": "276.4gb", "total_size_in_bytes": 296867190719 },
        "refresh": { "total": 871137, "total_time": "5.4h", "total_time_in_millis": 19523847 },
        "flush": { "total": 13594, "total_time": "49.3m", "total_time_in_millis": 2963485 } },
      { "routing": { "state": "STARTED", "primary": true, "node": "ab4uo_iyTw2WIT2-eb_F6Q",
                     "relocating_node": null, "shard": 0, "index": "places" },
        "state": "STARTED",
        "index": { "size": "4.9gb", "size_in_bytes": 5319904685 },
        "translog": { "id": 1326141078245, "operations": 2483 },
        "docs": { "num_docs": 2647990, "max_doc": 3981357, "deleted_docs": 1333367 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 37155, "total_time": "57.1m", "total_time_in_millis": 3426101,
                    "total_docs": 78881622, "total_size": "101.8gb", "total_size_in_bytes": 109329911898 },
        "refresh": { "total": 328715, "total_time": "1.5h", "total_time_in_millis": 5755051 },
        "flush": { "total": 4796, "total_time": "5.7m", "total_time_in_millis": 344400 } }
    ],
    "1": [
      { "routing": { "state": "STARTED", "primary": false, "node": "ab4uo_iyTw2WIT2-eb_F6Q",
                     "relocating_node": null, "shard": 1, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.8gb", "size_in_bytes": 6262820543 },
        "translog": { "id": 1326141078607, "operations": 56 },
        "docs": { "num_docs": 2803977, "max_doc": 4754923, "deleted_docs": 1950946 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 37336, "total_time": "57.3m", "total_time_in_millis": 3443330,
                    "total_docs": 79711670, "total_size": "103.8gb", "total_size_in_bytes": 111496656320 },
        "refresh": { "total": 330123, "total_time": "1.6h", "total_time_in_millis": 5859695 },
        "flush": { "total": 4921, "total_time": "6m", "total_time_in_millis": 360131 } },
      { "routing": { "state": "STARTED", "primary": true, "node": "nELxlGs6Tzii12JRCNxpGw",
                     "relocating_node": null, "shard": 1, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.8gb", "size_in_bytes": 6296790810 },
        "translog": { "id": 1326141078607, "operations": 51 },
        "docs": { "num_docs": 2803978, "max_doc": 4786774, "deleted_docs": 1982796 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 99070, "total_time": "4.8h", "total_time_in_millis": 17343659,
                    "total_docs": 218051849, "total_size": "283gb", "total_size_in_bytes": 303888376066 },
        "refresh": { "total": 876293, "total_time": "5.8h", "total_time_in_millis": 20913727 },
        "flush": { "total": 13835, "total_time": "53.2m", "total_time_in_millis": 3192033 } }
    ],
    "2": [
      { "routing": { "state": "STARTED", "primary": true, "node": "ab4uo_iyTw2WIT2-eb_F6Q",
                     "relocating_node": null, "shard": 2, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.2gb", "size_in_bytes": 5589513799 },
        "translog": { "id": 1326141079557, "operations": 2653 },
        "docs": { "num_docs": 2737540, "max_doc": 4189158, "deleted_docs": 1451618 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 37295, "total_time": "1h", "total_time_in_millis": 3602012,
                    "total_docs": 86544448, "total_size": "111.6gb", "total_size_in_bytes": 119899765102 },
        "refresh": { "total": 328507, "total_time": "1.6h", "total_time_in_millis": 6103834 },
        "flush": { "total": 5502, "total_time": "6.5m", "total_time_in_millis": 395750 } },
      { "routing": { "state": "STARTED", "primary": false, "node": "nELxlGs6Tzii12JRCNxpGw",
                     "relocating_node": null, "shard": 2, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.4gb", "size_in_bytes": 5820645020 },
        "translog": { "id": 1326141079550, "operations": 4779 },
        "docs": { "num_docs": 2737540, "max_doc": 4368769, "deleted_docs": 1631229 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 98407, "total_time": "4.9h", "total_time_in_millis": 17647596,
                    "total_docs": 222704925, "total_size": "289.4gb", "total_size_in_bytes": 310757299702 },
        "refresh": { "total": 870872, "total_time": "5.6h", "total_time_in_millis": 20188623 },
        "flush": { "total": 14195, "total_time": "52.5m", "total_time_in_millis": 3154127 } }
    ],
    "3": [
      { "routing": { "state": "STARTED", "primary": true, "node": "T5_oKWa3RTO7osracRtPoQ",
                     "relocating_node": null, "shard": 3, "index": "places" },
        "state": "STARTED",
        "index": { "size": "4.9gb", "size_in_bytes": 5343294260 },
        "translog": { "id": 1326141078245, "operations": 2124 },
        "docs": { "num_docs": 2648125, "max_doc": 3993220, "deleted_docs": 1345095 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 100192, "total_time": "2.7h", "total_time_in_millis": 9740764,
                    "total_docs": 216533766, "total_size": "280.8gb", "total_size_in_bytes": 301514882857 },
        "refresh": { "total": 888382, "total_time": "4.3h", "total_time_in_millis": 15669718 },
        "flush": { "total": 13576, "total_time": "15.2m", "total_time_in_millis": 917058 } },
      { "routing": { "state": "STARTED", "primary": false, "node": "ab4uo_iyTw2WIT2-eb_F6Q",
                     "relocating_node": null, "shard": 3, "index": "places" },
        "state": "STARTED",
        "index": { "size": "4.9gb", "size_in_bytes": 5349875537 },
        "translog": { "id": 1326141078241, "operations": 3886 },
        "docs": { "num_docs": 2648125, "max_doc": 3994418, "deleted_docs": 1346293 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 37161, "total_time": "56.5m", "total_time_in_millis": 3393844,
                    "total_docs": 78031855, "total_size": "101.5gb", "total_size_in_bytes": 108996560347 },
        "refresh": { "total": 328585, "total_time": "1.6h", "total_time_in_millis": 5779601 },
        "flush": { "total": 4780, "total_time": "5.8m", "total_time_in_millis": 351341 } }
    ],
    "4": [
      { "routing": { "state": "STARTED", "primary": true, "node": "T5_oKWa3RTO7osracRtPoQ",
                     "relocating_node": null, "shard": 4, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.3gb", "size_in_bytes": 5757997414 },
        "translog": { "id": 1326141078676, "operations": 2550 },
        "docs": { "num_docs": 2804287, "max_doc": 4323833, "deleted_docs": 1519546 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 100919, "total_time": "2.7h", "total_time_in_millis": 9958928,
                    "total_docs": 221717044, "total_size": "288.5gb", "total_size_in_bytes": 309792436202 },
        "refresh": { "total": 893623, "total_time": "4.3h", "total_time_in_millis": 15799989 },
        "flush": { "total": 13890, "total_time": "15.4m", "total_time_in_millis": 927816 } },
      { "routing": { "state": "STARTED", "primary": false, "node": "nELxlGs6Tzii12JRCNxpGw",
                     "relocating_node": null, "shard": 4, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.3gb", "size_in_bytes": 5746398501 },
        "translog": { "id": 1326141078670, "operations": 852 },
        "docs": { "num_docs": 2804287, "max_doc": 4316365, "deleted_docs": 1512078 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 99169, "total_time": "4.7h", "total_time_in_millis": 17225939,
                    "total_docs": 222259294, "total_size": "288.5gb", "total_size_in_bytes": 309856763814 },
        "refresh": { "total": 877186, "total_time": "5.5h", "total_time_in_millis": 19947688 },
        "flush": { "total": 13880, "total_time": "51.7m", "total_time_in_millis": 3106099 } }
    ],
    "5": [
      { "routing": { "state": "STARTED", "primary": false, "node": "nELxlGs6Tzii12JRCNxpGw",
                     "relocating_node": null, "shard": 5, "index": "places" },
        "state": "STARTED",
        "index": { "size": "5.4gb", "size_in_bytes": 5891305596 },
        "translog": { "id": 1326141079451, "operations": 388 },
        "docs": { "num_docs": 2736604, "max_doc": 4468212, "deleted_docs": 1731608 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 98430, "total_time": "4.8h", "total_time_in_millis": 17328442,
                    "total_docs": 218687019, "total_size": "284.5gb", "total_size_in_bytes": 305584707796 },
        "refresh": { "total": 871118, "total_time": "5.5h", "total_time_in_millis": 20154030 },
        "flush": { "total": 14086, "total_time": "54m", "total_time_in_millis": 3242524 } },
      { "routing": { "state": "STARTED", "primary": true, "node": "XGfCby6mT0Ox2zESwvBPsw",
                     "relocating_node": null, "shard": 5, "index": "places" },
        "state": "STARTED",
        "index": { "size": "4.1gb", "size_in_bytes": 4454766570 },
        "translog": { "id": 1326141079457, "operations": 5454 },
        "docs": { "num_docs": 2736604, "max_doc": 3315194, "deleted_docs": 578590 },
        "merges": { "current": 0, "current_docs": 0, "current_size": "0b", "current_size_in_bytes": 0,
                    "total": 100253, "total_time": "2.7h", "total_time_in_millis": 10036457,
                    "total_docs": 226688152, "total_size": "294.4gb", "total_size_in_bytes": 316205289645 },
        "refresh": { "total": 888586, "total_time": "4.3h", "total_time_in_millis": 15768403 },
        "flush": { "total": 14101, "total_time": "14.9m", "total_time_in_millis": 895393 } }
    ]
  }
}

Ha, I figured out what was going on. I was totally ignoring a test index that had been allocated onto two of the nodes, so the 3 indexes' shards were in fact balanced across ALL the nodes. My assumption was that each index would be balanced shard-by-shard across all the machines, not that ALL shards are balanced across the nodes regardless of which index they belong to.

I'd love to have that as an option (balance each index's shards across the nodes), because right now my two indexes end up split pretty much as 1 index on 2 machines and 1 index on the other 2. Since one of my indexes is a legacy index (we use it about 1/100th as much as the other one), I'm still not utilizing my machines.

Now I'll just need to figure out how to trick it :)

And just to finish up - I ended up setting my legacy index's replica count to (number of nodes - 1), so every node holds a copy - it's read-only, so it's just disk cost - and that is forcing my new index to balance evenly across my machines, like I want.

Sweet!
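For anyone who hits the same thing, the change boils down to an update-settings call like this - a sketch only; the index name "legacy" is a placeholder, and 3 replicas corresponds to our 4-node cluster:

    # Rough sketch of the workaround described above: give the read-only legacy
    # index one replica per remaining node (replicas = nodes - 1) via the index
    # update-settings API, so every node carries a full copy and the allocator
    # spreads the busy index evenly. Index name "legacy" is a placeholder.
    import json
    import urllib.request

    body = json.dumps({"index": {"number_of_replicas": 3}}).encode()  # 4 nodes -> 3 replicas
    req = urllib.request.Request("http://localhost:9200/legacy/_settings",
                                 data=body, method="PUT")
    print(urllib.request.urlopen(req).read().decode())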

I wonder if you are coming across this issue:
http://elasticsearch-users.115913.n3.nabble.com/Shard-Balancing-td3676576.html

Yep, that looks like it. I like the pointer to the ability to create your own allocator - I'll definitely look into that.
