Hi,
I had a stable ES cluster on AWS EC2 instances until a week ago. Since then,
for reasons I don't understand, the cluster keeps getting into a bad state
every few hours. The error says OOM, but I don't think heap is the cause:
the instance has enough heap space left. I'm running ES 0.90.6 and giving
half the RAM (8 GB) to the ES process. I see messages like this one in the
logs on all the machines in the cluster:
[2014-02-11 03:17:39,936][WARN ][cluster.action.shard ] [Star-Dancer]
[facebook_022014][1] sending failed shard for [facebook_022014][1],
node[zO9Pc1GNSuiVMA_Kn2b3UQ], [R], s[STARTED], indexUUID
[qN3CUSfVS-m2KlgQQtOqxg], reason [engine failure, message
[OutOfMemoryError[unable to create new native thread]]]
Any ideas on how to debug this, or how to figure out what is causing it,
would be really helpful.
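Your user ran out of thread/process space. Java reports this as an OOM
("unable to create new native thread") even when heap is still available.
You can check the "nproc" entry in /etc/security/limits.conf (or in a file
under /etc/security/limits.d/) for the maximum setting and compare it with
the process table.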
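To make the "compare with the process table" step concrete, here is a rough Python sketch, assuming a Linux box with /proc mounted. The function names are made up for illustration. It reads the nproc limit that applies to the user running the script and adds up the "Threads:" field of every process owned by that user, so run it as the same user that runs ES. Note that the limit of an already running ES process may differ from what a fresh shell reports.

```python
import os
import resource


def nproc_limits():
    """Return the (soft, hard) RLIMIT_NPROC values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NPROC)


def count_threads_for_uid(uid, proc_root="/proc"):
    """Sum the Threads: field of every <proc_root>/<pid>/status owned by uid."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return 0  # no /proc here (not Linux), nothing to count
    total = 0
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
            real_uid = int(fields["Uid"].split()[0])
            threads = int(fields["Threads"].strip())
        except (OSError, KeyError, ValueError, IndexError):
            continue  # process exited meanwhile, or status was unreadable
        if real_uid == uid:
            total += threads
    return total


if __name__ == "__main__":
    soft, hard = nproc_limits()
    used = count_threads_for_uid(os.getuid())
    # resource.RLIM_INFINITY means "unlimited"
    show = lambda v: "unlimited" if v == resource.RLIM_INFINITY else str(v)
    print("nproc soft=%s hard=%s, threads in use by uid %d: %d"
          % (show(soft), show(hard), os.getuid(), used))
```

If the thread count sits close to the soft limit around the time the shards fail, that would fit the "unable to create new native thread" message.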
The OS settings regarding threads are usually OK and should not be
modified. Check whether you have changed the ES default settings for the
thread pools, and if so, revert those changes to the defaults. If that does
not help, you should upgrade from 0.90.6 to 0.90.11.
Thanks for your response, Jörg; somehow I missed replying earlier.
For some strange reason, the max threads setting was reset when I did a
reboot, so I had to set it back to a high number.
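Since the setting can silently change across a reboot, it may help to check the limit the running process actually received rather than what the config file says. The sketch below assumes Linux, where /proc/<pid>/limits lists the effective limits of a process. The helper names are hypothetical, and you would pass the pid of the ES java process.

```python
def parse_max_processes(limits_text):
    """Extract (soft, hard) from the 'Max processes' row of /proc/<pid>/limits.

    'unlimited' is returned as None.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            # Row layout: Max processes <soft> <hard> processes
            soft, hard = line.split()[2:4]
            return tuple(None if v == "unlimited" else int(v)
                         for v in (soft, hard))
    raise ValueError("no 'Max processes' line found")


def read_max_processes(pid):
    """Read the effective nproc limits of a running process (Linux only)."""
    with open("/proc/%d/limits" % pid) as fh:
        return parse_max_processes(fh.read())


# Usage, with 1234 standing in for the real ES pid:
#   soft, hard = read_max_processes(1234)
```

Running this after each restart would show whether the higher value really took effect for the ES process.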