Engine failure, message [OutOfMemoryError[unable to create new native thread]]


(T Vinod Gupta) #1

Hi,
I had a stable ES cluster on AWS EC2 instances until a week ago. Now, for
reasons I can't work out, the cluster gets into a bad state every few
hours. The error says OOM, but I know that is not the cause: the instance
has enough heap space left. I'm running ES 0.90.6 and giving half the RAM
(8 GB) to the ES process. I see messages like the following in the logs on
all the machines in the cluster.

[2014-02-11 03:17:39,936][WARN ][cluster.action.shard ] [Star-Dancer]
[facebook_022014][1] sending failed shard for [facebook_022014][1],
node[zO9Pc1GNSuiVMA_Kn2b3UQ], [R], s[STARTED], indexUUID
[qN3CUSfVS-m2KlgQQtOqxg], reason [engine failure, message
[OutOfMemoryError[unable to create new native thread]]]

Any ideas on how to debug this or how to figure out what's causing it
would be really helpful.

Thanks

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAHau4ytfEHERAsVcGFc-R8yePmw_KaN%3DJDV-cEwm9VK2VLgp-Q%40mail.gmail.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jörg Prante) #2

Your user ran out of thread/process space. This is reported as OOM in Java.

You can check the "nproc" entry in /etc/security/limits.conf for the maximum
setting and compare it with the process table.
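As a rough illustration of that comparison, here is a minimal Python sketch, assuming a Linux host with /proc mounted. The helper names (parse_thread_count, parse_uid, count_user_threads, report) are made up for this example and are not part of Elasticsearch. It reads the nproc limit of the current user through the standard resource module and sums the "Threads:" field of every process owned by that user, so it should be run as the user that runs ES.

```python
import os
import resource


def parse_thread_count(status_text):
    """Return the 'Threads:' value from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split()[1])
    return 0


def parse_uid(status_text):
    """Return the real UID from the 'Uid:' field, or None if it is absent."""
    for line in status_text.splitlines():
        if line.startswith("Uid:"):
            return int(line.split()[1])
    return None


def count_user_threads(uid, proc_root="/proc"):
    """Sum the thread counts of every process owned by uid (Linux only)."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited while we were scanning
        if parse_uid(text) == uid:
            total += parse_thread_count(text)
    return total


def report():
    """Print the nproc limits of the current user next to the threads in use."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    used = count_user_threads(os.getuid())
    # A limit of -1 (resource.RLIM_INFINITY) means "unlimited".
    print("nproc soft=%s hard=%s, threads in use=%d" % (soft, hard, used))
```

Calling report() a few times while the cluster is under load should show whether the thread count is creeping toward the soft limit before the error appears.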

The OS settings regarding threads are usually OK and should not be
modified. Check whether you have modified the ES default thread pool
settings, and revert these changes to the defaults. If this does not help,
you should upgrade from 0.90.6 to 0.90.11.
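One quick way to look for such modifications is to scan the node's configuration file for thread pool keys. The sketch below is only an assumption-laden helper: the function name is hypothetical, it matches the "threadpool" prefix used by 0.90-era settings (and the "thread_pool" spelling of later releases), and it only sees flat dotted keys or the top line of a nested YAML block. Settings applied through the cluster settings API would not show up here.

```python
def find_thread_pool_overrides(config_text):
    """Return uncommented config lines that start a thread pool setting."""
    prefixes = ("threadpool.", "threadpool:", "thread_pool.", "thread_pool:")
    overrides = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith(prefixes):
            overrides.append(line)
    return overrides
```

Feeding it the contents of elasticsearch.yml and getting back an empty list suggests the file leaves the thread pools at their defaults.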

Jörg



(T Vinod Gupta) #3

Thanks for your response, Jörg; I somehow missed replying earlier.
For some strange reason, the max threads setting was reset when I
rebooted, so I had to set it back to a high number.
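A limit that disappears on reboot is often one that was raised only in a running shell rather than persisted. A small check like the one below can confirm what is actually written on disk. It is a sketch under assumptions: the function name is hypothetical, and it expects text in the four-column limits.conf layout (domain, type, item, value) as used by /etc/security/limits.conf and the drop-in files under /etc/security/limits.d/, which can override it.

```python
def find_nproc_entries(limits_text):
    """Return (domain, type, value) for each nproc line in limits.conf-style text."""
    entries = []
    for raw in limits_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        fields = line.split()
        if len(fields) == 4 and fields[2] == "nproc":
            entries.append((fields[0], fields[1], fields[3]))
    return entries


# Hypothetical example content; the user name and value are illustrative only.
example = """
# <domain>      <type>  <item>  <value>
elasticsearch   soft    nproc   4096
elasticsearch   hard    nproc   4096
*               soft    nofile  65535
"""
print(find_nproc_entries(example))
```

If the list comes back empty for every file that applies to the ES user, the higher limit is not persisted and will be lost on the next reboot.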


