Shard timeout problem on AWS


(Craig Brown) #1

While doing some large-scale indexing, I run into this problem
fairly often, especially after having indexed > 200m records. This is a
2-node cluster with 1 index, 50 shards, and 0 replicas. Index size is
~600GB per node with 390m records currently. Each node has four 250GB
EBS volumes with the ES data spread across all 4 volumes. Each volume
is 62% full. Each bulk indexing request has 10k records of ~3k each.

[1]: index [documents], type [records], id [43-116530000], message
[UnavailableShardsException[[documents][21] [1] shardIt, [0] active :
Timeout waiting for [1m], request:
org.elasticsearch.action.bulk.BulkShardRequest@2c761b38]]

It looks like after this error occurs, the master disconnects
and restarts, but elects itself as master instead of reconnecting with
the other node. I have to stop both nodes and then restart them to get
the cluster back. I have noted similar behavior with a 4-node cluster.

My basic question is: is the timeout being caused by delays in AWS,
EBS, or is it more likely a result of the large index size and how
Lucene handles disk I/O?

Secondarily, can this situation be avoided? If disk latency is too
high, can we back off/recover more gracefully?
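On the back-off question, a minimal client-side sketch of what graceful recovery could look like: retry the bulk request with exponential backoff. `send_bulk` and `UnavailableShards` are hypothetical stand-ins for whatever client issues the request and for the exception quoted above; nothing here is an Elasticsearch API.

```python
import random
import time

class UnavailableShards(Exception):
    """Stand-in for the UnavailableShardsException quoted above."""

def bulk_with_backoff(send_bulk, batch, max_retries=5, base_delay=1.0):
    """Retry one bulk request, backing off exponentially on shard timeouts.

    send_bulk is any callable that raises UnavailableShards when a shard
    times out (the [1m] timeout in the error above); base_delay is the
    first sleep in seconds, doubled on each retry, with jitter added.
    """
    for attempt in range(max_retries + 1):
        try:
            return send_bulk(batch)
        except UnavailableShards:
            if attempt == max_retries:
                raise  # give up and surface the error to the caller
            # Sleep 1s, 2s, 4s, ... plus jitter, so slow EBS volumes get
            # a chance to drain queued I/O before the next attempt.
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
```

This doesn't remove the timeout, but it turns a hard failure into a pause, which is usually what you want when the cause is transient disk latency rather than a down shard.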

Thanks!

  • Craig

MASTER NODE
[2012-04-27 15:06:59,308][INFO ][node ] [Courier]
{0.19.2}[7517]: started
[2012-04-27 15:07:12,156][INFO ][gateway ] [Courier]
recovered [1] indices into cluster_state
[2012-04-27 15:07:28,506][INFO ][cluster.service ] [Courier]
added {[Styx and Stone][1p004qNdQ8iq2cL3otF1Jg][inet[/
10.33.15.1:9300]],}, reason: zen-disco-receive(join from node[[Styx
and Stone][1p004qNdQ8iq2cL3otF1Jg][inet[/10.33.15.1:9300]]])
[2012-04-27 15:11:21,474][INFO ][cluster.service ] [Courier]
added {[Summers, Gabriel][QzF5z1yGQ2K-kLESKRspwQ][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(join from node[[Summers, Gabriel][QzF5z1yGQ2K-kLESKRspwQ]
[inet[/10.38.249.226:9301]]{client=true, data=false}])
[2012-04-27 16:12:51,368][INFO ][cluster.service ] [Courier]
removed {[Summers, Gabriel][QzF5z1yGQ2K-kLESKRspwQ][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
node_left([Summers, Gabriel][QzF5z1yGQ2K-kLESKRspwQ][inet[/
10.38.249.226:9301]]{client=true, data=false})
[2012-04-27 16:15:16,046][INFO ][cluster.service ] [Courier]
added {[Infant Terrible][IyaA9PpDSWKJ8H-7KjlScg][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(join from node[[Infant Terrible][IyaA9PpDSWKJ8H-7KjlScg][inet[/
10.38.249.226:9301]]{client=true, data=false}])
[2012-04-27 17:09:28,431][INFO ][cluster.service ] [Courier]
removed {[Infant Terrible][IyaA9PpDSWKJ8H-7KjlScg][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
node_left([Infant Terrible][IyaA9PpDSWKJ8H-7KjlScg][inet[/
10.38.249.226:9301]]{client=true, data=false})
[2012-04-27 17:11:53,985][INFO ][cluster.service ] [Courier]
added {[Glamor][TZwHuJlzRE-cDB-MLKA-tw][inet[/10.38.249.226:9301]]
{client=true, data=false},}, reason: zen-disco-receive(join from
node[[Glamor][TZwHuJlzRE-cDB-MLKA-tw][inet[/10.38.249.226:9301]]
{client=true, data=false}])
[2012-04-27 17:59:17,030][INFO ][node ] [Multiple
Man] {0.19.2}[10023]: initializing ...
[2012-04-27 17:59:17,048][INFO ][plugins ] [Multiple
Man] loaded [], sites [bigdesk]
[2012-04-27 17:59:20,068][INFO ][node ] [Multiple
Man] {0.19.2}[10023]: initialized
[2012-04-27 17:59:20,069][INFO ][node ] [Multiple
Man] {0.19.2}[10023]: starting ...
[2012-04-27 17:59:20,173][INFO ][transport ] [Multiple
Man] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address
{inet[/10.38.249.226:9300]}
[2012-04-27 17:59:24,694][INFO ][cluster.service ] [Multiple
Man] new_master [Multiple Man][zQvB_5w3Tee3rW-EJ6Iahw][inet[/
10.38.249.226:9300]], reason: zen-disco-join (elected_as_master)
[2012-04-27 17:59:24,727][INFO ][discovery ] [Multiple
Man] youwhodoc/zQvB_5w3Tee3rW-EJ6Iahw
[2012-04-27 17:59:24,742][INFO ][http ] [Multiple
Man] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address
{inet[/10.38.249.226:9200]}
[2012-04-27 17:59:24,742][INFO ][node ] [Multiple
Man] {0.19.2}[10023]: started
[2012-04-27 17:59:44,560][INFO ][gateway ] [Multiple
Man] recovered [1] indices into cluster_state

[2012-04-27 15:07:05,713][INFO ][transport ] [Styx and
Stone] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address
{inet[/10.33.15.1:9300]}
[2012-04-27 15:07:28,517][INFO ][cluster.service ] [Styx and
Stone] detected_master [Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]], added {[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]],}, reason: zen-disco-receive(from master
[[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/10.38.249.226:9300]]])
[2012-04-27 15:07:28,527][INFO ][discovery ] [Styx and
Stone] youwhodoc/1p004qNdQ8iq2cL3otF1Jg
[2012-04-27 15:07:28,532][INFO ][http ] [Styx and
Stone] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address
{inet[/10.33.15.1:9200]}
[2012-04-27 15:07:28,532][INFO ][node ] [Styx and
Stone] {0.19.2}[8991]: started
[2012-04-27 15:11:21,476][INFO ][cluster.service ] [Styx and
Stone] added {[Summers, Gabriel][QzF5z1yGQ2K-kLESKRspwQ][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(from master [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]])
[2012-04-27 16:12:51,386][INFO ][cluster.service ] [Styx and
Stone] removed {[Summers, Gabriel][QzF5z1yGQ2K-kLESKRspwQ][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(from master [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]])
[2012-04-27 16:15:16,050][INFO ][cluster.service ] [Styx and
Stone] added {[Infant Terrible][IyaA9PpDSWKJ8H-7KjlScg][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(from master [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]])
[2012-04-27 17:09:28,717][INFO ][cluster.service ] [Styx and
Stone] removed {[Infant Terrible][IyaA9PpDSWKJ8H-7KjlScg][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(from master [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]])
[2012-04-27 17:11:53,989][INFO ][cluster.service ] [Styx and
Stone] added {[Glamor][TZwHuJlzRE-cDB-MLKA-tw][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(from master [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]])
[2012-04-27 17:59:10,337][INFO ][discovery.zen ] [Styx and
Stone] master_left [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]], reason [transport disconnected (with verified
connect)]
[2012-04-27 17:59:10,386][INFO ][cluster.service ] [Styx and
Stone] master {new [Styx and Stone][1p004qNdQ8iq2cL3otF1Jg][inet[/
10.33.15.1:9300]], previous [Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]}, removed {[Courier][UmhafZyUQB-r3weDCPftIQ]
[inet[/10.38.249.226:9300]],}, reason: zen-disco-master_failed
([Courier][UmhafZyUQB-r3weDCPftIQ][inet[/10.38.249.226:9300]])


(Igor Motov) #2

There are two issues here: first, the node restart, and second, master
discovery.

In order to figure out the node restart problem, it would be useful to know
which version of Java you are using. After the crash do you see
any hs_err_pid*.log files lying around? Do you see any messages containing
the phrase "killed process" in /var/log/messages, /var/log/syslog or
/var/log/kern.log?

The second issue might be trickier. Do you always start the first node as a
master? Can you reproduce the issue by just restarting the master without
waiting for it to get restarted? If you can, it would be nice to get more
granular logs for discovery and cluster service by adding these lines to
logging.yml:

logger:
  discovery: TRACE
  cluster: TRACE

Your config files for both nodes might also be helpful.



(Shay Banon) #3

To add:

It seems like your master node got restarted at 2012-04-27 17:59:17,030.
Are you using the service wrapper? If so, check its service.log to see
if it restarted it for some reason.

A 50-shard index on 2 nodes seems like a bit too much (unless you are
using routing). What are the specs of the machines? How much memory have
you allocated to the ES JVM?
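Rough arithmetic on the figures posted in this thread (nothing measured here, just the numbers Craig gave, so treat them as approximations):

```python
# Figures quoted earlier in the thread.
shards = 50
replicas = 0
nodes = 2
index_gb_per_node = 600
docs = 390_000_000

total_shards = shards * (1 + replicas)              # 50 primaries, no copies
shards_per_node = total_shards // nodes             # 25 shards on each node
gb_per_shard = index_gb_per_node / shards_per_node  # ~24 GB per shard
kb_per_doc = index_gb_per_node * nodes * 1024 ** 2 / docs  # ~3.2 KB per record

print(shards_per_node, gb_per_shard, round(kb_per_doc, 1))  # prints: 25 24.0 3.2
```

So each node is carrying 25 shards of roughly 24 GB each on EBS, and the per-record size matches the "~3k each" in the original post. Each shard's Lucene segment and merge state has its own footprint, which may be what makes 50 shards on 2 nodes look high here.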



(Craig Brown) #4

Thanks for helping.
So I see some problems in /var/log/messages.
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.187004]
lowmem_reserve[]: 0 0 3031 3031
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200023] Node 0 Normal
free:4536kB min:4616kB low:5768kB high:6924kB active_anon:2951188kB
inactive_anon:4kB active_file:72kB inactive_file:1228kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:3104640kB mlocked:0kB dirty:1192kB writeback:0kB mapped:0kB
shmem:68kB slab_reclaimable:13832kB slab_unreclaimable:15144kB
kernel_stack:1560kB pagetables:9232kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200164] Free swap = 0kB
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200169] Total swap = 0kB
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1837040 pages RAM
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 57003 pages reserved
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1605 pages shared
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1768621 pages
non-shared

Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Out of
memory: Kill process 7517 (java) score 688 or sacrifice child
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Killed
process 7517 (java) total-vm:9783644kB, anon-rss:5104668kB,
file-rss:0kB

So it clearly looks like memory runs out completely and the kernel OOM
killer kills the ES process (note there is no swap configured: Free
swap = 0kB). I don't see any hs_err_pid files.

I added the log parameters. I'll have to see if I can reproduce the
problem without re-indexing all of the data again. I have to get
up over 200m records before I start seeing the problem. I'll let you
know.

The nodes are on AWS C1XL.
High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8
virtual cores with 2.5 EC2 Compute Units each), 1690 GB of local
instance storage, 64-bit platform

I have 4.5GB allocated for ES. From the looks of this, it would likely be
better to have high-memory instances instead of high-compute.
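The "Killed process" lines can be checked mechanically. A small sketch that pulls OOM-killer victims out of /var/log/messages text; the regex is fitted to the kernel format quoted above and may need adjusting for other kernel versions:

```python
import re

# Sample kernel lines, un-wrapped, from the /var/log/messages paste above.
LOG = """\
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Out of memory: Kill process 7517 (java) score 688 or sacrifice child
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Killed process 7517 (java) total-vm:9783644kB, anon-rss:5104668kB, file-rss:0kB
"""

KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>\S+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<rss>\d+)kB"
)

def find_oom_kills(text):
    """Return (pid, name, total_vm_kB, anon_rss_kB) for each OOM kill."""
    return [
        (int(m["pid"]), m["name"], int(m["vm"]), int(m["rss"]))
        for m in KILLED.finditer(text)
    ]
```

PID 7517 here is the same [7517] the Courier node logged at startup, so the kernel kill and the apparent "master restart" are the same event.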

  • Craig


--

CRAIG BROWN
chief architect
youwho, Inc.

www.youwho.com

T: 801.855.0921
M: 801.913.0939


(Craig Brown) #5

Thanks Shay!
I'm using the service wrapper, but it looks like the service.log file is
not configured correctly.

The nodes are on AWS C1XL.
High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8
virtual cores with 2.5 EC2 Compute Units each), 1690 GB of local
instance storage, 64-bit platform

4.5GB of RAM is allocated to ES.

This is what I see in the wrapper.log.
STATUS | wrapper | 2012/04/27 17:59:09 | JVM received a signal SIGKILL (9).
STATUS | wrapper | 2012/04/27 17:59:09 | JVM process is gone.
ERROR | wrapper | 2012/04/27 17:59:09 | JVM exited unexpectedly.
STATUS | wrapper | 2012/04/27 17:59:14 | Launching a JVM...
INFO | jvm 2 | 2012/04/27 17:59:15 | WrapperManager: Initializing...

It looks like the system is running out of memory and the kernel is
killing the process.

So what I'm doing, along with trying to see how far I can push ES :),
is replacing a Riak cluster we have. I really like Riak and it has
performed well. The real problem is that it is too opaque: it is
difficult, if not impossible, to even count the number of documents in
a bucket once the bucket gets large. We need to be able to at least
count how many documents we have.

I have the schema set to index on _id and ignore everything else.
The documents are 1-3k each, but we have a lot of them. I was trying to
see if I could squeeze all of the documents onto 2 nodes to save some
$$ before we go live. It seems at a minimum that we should use the
high-mem nodes instead of high-compute; that selection was mostly for
faster indexing.

We're not using routing, assuming that a GET via _id is automatically
routed. Is that not true?
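It is true: by default Elasticsearch routes each document to a shard by hashing its id (or an explicit routing value) modulo the number of primary shards, so a GET by _id always goes to exactly one deterministic shard. A minimal sketch of the idea (the hash function below is illustrative only, not the one Elasticsearch actually uses):

```python
def route_shard(doc_id: str, num_primary_shards: int) -> int:
    """Sketch of id-based routing: hash the id (or an explicit routing
    value) and take it modulo the primary shard count. The djb2-style
    hash here is purely illustrative."""
    h = 5381
    for ch in doc_id:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF
    return h % num_primary_shards

# A given id always lands on the same shard, so a GET by _id is a
# single-shard operation even with 50 shards:
shard = route_shard("43-116530000", 50)
assert shard == route_shard("43-116530000", 50)
assert 0 <= shard < 50
```

This is also why the primary shard count is fixed at index creation time: changing it would re-route every existing id.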

Is putting this many documents on 2 nodes simply a really bad idea
given the purpose? Would it be better to have more shards, which would
create smaller Lucene indexes, or fewer shards and larger Lucene
indexes? I'm assuming that more shards / smaller Lucene indexes are
better.
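For rough sizing, the numbers in this thread work out as follows (a back-of-the-envelope sketch; it assumes ~600 GB on each of the 2 nodes, 390m records cluster-wide, and shards spread evenly):

```python
# Back-of-the-envelope shard sizing for the cluster in this thread.
# Assumptions: ~600 GB per node x 2 nodes, 390M records cluster-wide,
# 50 primary shards, 0 replicas, even distribution.
total_gb = 600 * 2
total_docs = 390_000_000
num_shards = 50

gb_per_shard = total_gb / num_shards        # 24.0 GB per shard
docs_per_shard = total_docs / num_shards    # 7,800,000 docs per shard
shards_per_node = num_shards / 2            # 25 Lucene indexes per node

print(f"~{gb_per_shard:.0f} GB / ~{docs_per_shard / 1e6:.1f}M docs per shard, "
      f"{shards_per_node:.0f} shards per node")
```

At ~24 GB each the individual shards are not outlandish; arguably the pressure point is 25 Lucene indexes per node all buffering and merging at once inside a 4.5 GB heap, which is why more shards is not automatically better on a fixed node count.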

Thanks!

  • Craig

On Sun, Apr 29, 2012 at 11:11 AM, Shay Banon kimchy@gmail.com wrote:

To Add:

It seems like your master node got restarted at 2012-04-27 17:59:17,03.
Are you using the service wrapper? If so, check its service.log to see
if it restarted it for some reason.

A 50-shard index on 2 nodes seems like a bit too much (unless you are
using routing). What are the specs of the machines? How much memory
have you allocated to the ES JVM?


(Igor Motov) #6

It means that you are running out of memory at the OS level. In this
situation, the process that is using the most memory (ES in your case)
gets killed. If you can reproduce it, check where the memory goes. It's
quite possible that, with other processes running on the system, the
3.5 GB that you leave to the rest of the system is not enough.
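To put numbers on that, here is a sketch using the kernel-log figures quoted below in this thread (the kB-to-GB conversion and comparison only; which processes consumed the remainder is not shown in the log):

```python
# Why the kernel killed the JVM: its resident set alone exceeded the
# configured heap, leaving little for everything else, with no swap
# (Total swap = 0kB) to absorb the overage. Figures are from the
# OOM-killer lines in this thread, in kB.
ram_gb = 7.0                          # C1XL instance memory
heap_gb = 4.5                         # -Xmx given to Elasticsearch
anon_rss_gb = 5104668 / 1024 ** 2     # resident at kill time (~4.9 GB)
total_vm_gb = 9783644 / 1024 ** 2     # virtual size at kill time (~9.3 GB)

# The JVM's real footprint is heap + permgen + thread stacks + direct
# buffers, so RSS comfortably exceeds -Xmx:
print(f"RSS {anon_rss_gb:.1f} GB vs heap {heap_gb} GB; "
      f"{ram_gb - anon_rss_gb:.1f} GB left for the OS and everything else")
```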

On Monday, April 30, 2012 5:26:01 PM UTC-4, Craig Brown wrote:

Thanks for helping.
So I see some problems in /var/log/messages.
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.187004]
lowmem_reserve[]: 0 0 3031 3031
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200023] Node 0 Normal
free:4536kB min:4616kB low:5768kB high:6924kB active_anon:2951188kB
inactive_anon:4kB active_file:72kB inactive_file:1228kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:3104640kB mlocked:0kB dirty:1192kB writeback:0kB mapped:0kB
shmem:68kB slab_reclaimable:13832kB slab_unreclaimable:15144kB
kernel_stack:1560kB pagetables:9232kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200164] Free swap = 0kB
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200169] Total swap = 0kB
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1837040 pages RAM
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 57003 pages
reserved
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1605 pages shared
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1768621 pages
non-shared

Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Out of
memory: Kill process 7517 (java) score 688 or sacrifice child
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Killed
process 7517 (java) total-vm:9783644kB, anon-rss:5104668kB,
file-rss:0kB

So it clearly looks like it starts swapping and then runs out
completely. I don't see any hs_err_pid files.

I added the log parameters. I'll have to see if I can reproduce the
problem without re-indexing all of the data again. I have to get up
over 200m records before I start seeing the problem. I'll let you
know.

The nodes are on AWS C1XL.
High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8
virtual cores with 2.5 EC2 Compute Units each), 1690 GB of local
instance storage, 64-bit platform

I have 4.5GB for ES. From the looks of this, it would likely be better
to have high-mem instances instead of high-compute.

  • Craig

On Fri, Apr 27, 2012 at 9:24 PM, Igor Motov imotov@gmail.com wrote:

There are two issues here. First - node restart and second - master
discovery.

In order to figure out the node restart problem, it would be useful to
know which version of Java you are using. After the crash, do you see
any hs_err_pid*.log files lying around? Do you see any messages
containing the phrase "killed process" in /var/log/messages,
/var/log/syslog or /var/log/kern.log?

The second issue might be trickier. Do you always start the first node
as a master? Can you reproduce the issue by just restarting the master
without waiting for it to get restarted? If you can, it would be nice
to get more granular logs for discovery and cluster service by adding
these lines to logging.yml.

logger:
  discovery: TRACE
  cluster: TRACE

Your config files for both nodes might also be helpful.



(Craig Brown) #7

Ya, I'm going to try a high-mem instance and see what happens. I have a
better idea of what the system is doing now, so I can watch it more
closely.

As for the node restarting and not reconnecting: it is the master of
the 2 nodes that goes down and does not reconnect. I'll need to try to
reproduce it.

  • Craig

On Mon, Apr 30, 2012 at 3:40 PM, Igor Motov imotov@gmail.com wrote:

It means that you are running out of memory on the OS level. In this
situation, the process that is using most of the memory (es in your case) is
getting killed. If you can reproduce it, check where memory goes. It's quite
possible that with some other processes running on the system 3.5 G that you
leave to the rest of the system is not enough.

10.38.249.226:9300]]])
[2012-04-27 17:11:53,989][INFO ][cluster.service ] [Styx and
Stone] added {[Glamor][TZwHuJlzRE-cDB-MLKA-tw][inet[/
10.38.249.226:9301]]{client=true, data=false},}, reason: zen-disco-
receive(from master [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]])
[2012-04-27 17:59:10,337][INFO ][discovery.zen ] [Styx and
Stone] master_left [[Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]], reason [transport disconnected (with verified
connect)]
[2012-04-27 17:59:10,386][INFO ][cluster.service ] [Styx and
Stone] master {new [Styx and Stone][1p004qNdQ8iq2cL3otF1Jg][inet[/
10.33.15.1:9300]], previous [Courier][UmhafZyUQB-r3weDCPftIQ][inet[/
10.38.249.226:9300]]}, removed {[Courier][UmhafZyUQB-r3weDCPftIQ]
[inet[/10.38.249.226:9300]],}, reason: zen-disco-master_failed
([Courier][UmhafZyUQB-r3weDCPftIQ][inet[/10.38.249.226:9300]])

--

CRAIG BROWN
chief architect
youwho, Inc.

www.youwho.com

T: 801.855.0921
M: 801.913.0939

On Monday, April 30, 2012 5:26:01 PM UTC-4, Craig Brown wrote:

Thanks for helping.
So I see some problems in /var/log/messages:
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.187004]
lowmem_reserve[]: 0 0 3031 3031
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200023] Node 0 Normal
free:4536kB min:4616kB low:5768kB high:6924kB active_anon:2951188kB
inactive_anon:4kB active_file:72kB inactive_file:1228kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB
present:3104640kB mlocked:0kB dirty:1192kB writeback:0kB mapped:0kB
shmem:68kB slab_reclaimable:13832kB slab_unreclaimable:15144kB
kernel_stack:1560kB pagetables:9232kB unstable:0kB bounce:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no

Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200164] Free swap = 0kB
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.200169] Total swap = 0kB
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1837040 pages RAM
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 57003 pages
reserved
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1605 pages shared
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] 1768621 pages
non-shared

Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Out of
memory: Kill process 7517 (java) score 688 or sacrifice child
Apr 27 17:59:09 ip-10-38-249-226 kernel: [959759.201013] Killed
process 7517 (java) total-vm:9783644kB, anon-rss:5104668kB,
file-rss:0kB

So it clearly looks like it starts swapping and then runs out of
memory completely. I don't see any hs_err_pid files.

I added the log parameters. I'll have to see if I can reproduce the
problem without re-indexing all of the data again. I have to get up
over 200m records before I start seeing the problem. I'll let you
know.

The nodes are on AWS C1XL.
High-CPU Extra Large Instance 7 GB of memory, 20 EC2 Compute Units (8
virtual cores with 2.5 EC2 Compute Units each), 1690 GB of local
instance storage, 64-bit platform

I have 4.5GB for ES. From the looks of this, it would likely be better
to have high-mem instances instead of high-compute.

  • Craig

On Fri, Apr 27, 2012 at 9:24 PM, Igor Motov imotov@gmail.com wrote:

There are two issues here. First - node restart and second - master
discovery.

In order to figure out the node restart problem, it would be useful to
know which version of java you are using. After the crash, do you see
any hs_err_pid*.log files lying around? Do you see any messages
containing the phrase "killed process" in /var/log/messages,
/var/log/syslog or /var/log/kern.log?

The second issue might be trickier. Do you always start the first node
as a master? Can you reproduce the issue by just restarting the master,
without waiting for it to crash? If you can, it would be nice to get
more granular logs for discovery and cluster service by adding these
lines to logging.yml:

logger:
  discovery: TRACE
  cluster: TRACE

Your config files for both nodes might also be helpful.
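The log check suggested above can be done in one pass (a sketch; the
log paths vary by distro, and not every system will have all three
files):

```shell
# Look for the kernel OOM killer's signature in the usual syslog
# locations. Paths are assumptions; missing files are silently skipped.
grep -i "killed process" /var/log/messages /var/log/syslog /var/log/kern.log 2>/dev/null
```

A match here means the OS, not the JVM, killed the node, which is why no hs_err_pid file gets written.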



(Craig Brown) #8

I found the clustering problem. I'm using unicast discovery and was
missing a digit in the address for the second node. It worked fine if I
started node 1 first, but not if it started second. Sorry for the
trouble on that.
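For reference, a minimal unicast setup of the kind described here
(hypothetical elasticsearch.yml fragment; the addresses are the ones
appearing in the logs above). With multicast disabled, every node needs
the full, correct host list, or discovery only works in one start
order:

```yaml
# Hypothetical config for both data nodes. A typo in either address
# means the mistyped node can never find the other at startup.
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["10.38.249.226:9300", "10.33.15.1:9300"]
```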

  • Craig

On Mon, Apr 30, 2012 at 3:56 PM, Craig Brown cbrown@youwho.com wrote:

Ya, I'm going to try a high-mem instance and see what happens. I have a
better idea of what the system is doing now, so I can watch it more
closely.

As for the node restarting and not reconnecting: it is the master node
of the two that goes down and does not reconnect. I'll need to try to
reproduce it.

  • Craig

On Mon, Apr 30, 2012 at 3:40 PM, Igor Motov imotov@gmail.com wrote:

It means that you are running out of memory at the OS level. In this
situation, the process that is using the most memory (ES in your case)
gets killed. If you can reproduce it, check where the memory goes. It's
quite possible that, with some other processes running on the system,
the 3.5 GB that you leave to the rest of the system is not enough.
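To put rough numbers on that (my own back-of-the-envelope from the
kernel log quoted above, not a figure from the thread):

```shell
# anon-rss:5104668kB is the OOM killer's report for the java process;
# 7 GB is the C1XL's total RAM. Everything else is simple arithmetic.
awk 'BEGIN {
  rss_gb = 5104668 / 1024 / 1024;
  printf "resident set at kill time: %.2f GB\n", rss_gb;
  printf "left for OS + page cache on a 7 GB C1XL: %.2f GB\n", 7 - rss_gb;
}'
```

With no swap configured ("Total swap = 0kB"), once the JVM's resident set approaches 5 GB of the 7 GB instance, the kernel has little room left and falls back on the OOM killer.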


(system) #9