Single node takes down entire cluster

Hello,

We have a problem where one of our Elasticsearch nodes runs out of memory
and crashes. When that happens, the whole (two-node) cluster stops
functioning. What can we do so that a failure on one node does not affect
the other? All shards are replicated on both nodes.

I realize we are probably cutting it too close on memory, which is most
likely why the first node goes down -- the index is 8.1 GB and each node
has 8 GB of RAM. Are there any specific memory requirements for
Elasticsearch? I've been searching quite a bit and have not been able to
find anything on system requirements for the application. I presume having
at least as much memory as the index size would be a good place to start,
but it would be best to know what proper practice is.

Thanks for any help you can provide!

Mike

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey,

the memory usage of Elasticsearch depends heavily on your usage pattern
(which is why it is hard to publish a sheet saying it runs at a given speed
with X CPUs and Y GB of RAM), ranging from indexing (document size,
documents per second) to the kinds of queries you run (simple filters vs.
geo queries, for example). With 8 GB of RAM you should give Elasticsearch a
4 GB heap and make sure no other services (a database, for example) run on
the machine. The other half of the memory will be used by the filesystem
cache, so you have less disk I/O.
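As a sketch, with the standard startup scripts the heap can be pinned via
the ES_HEAP_SIZE environment variable (the file locations below vary by
install, so treat the paths as assumptions):

```shell
# Sketch, assuming a package-style install; adjust paths for your setup.
# In /etc/default/elasticsearch (or exported before running bin/elasticsearch):
export ES_HEAP_SIZE=4g   # sets both -Xms and -Xmx to 4 GB

# Optionally, in config/elasticsearch.yml, lock the heap into RAM so the
# OS never swaps it out:
# bootstrap.mlockall: true
```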

One reason your cluster might go out of memory (wildly speculating here) is
that you are faceting and sorting a lot on high-cardinality fields (fields
with lots and lots of distinct values). These might be user-defined tags
(in a del.icio.us-like bookmark service, for example), or maybe you are
trying to facet on an analyzed field. You will have to examine your queries
a bit (and you should start monitoring your system so you know when it is
short of memory). You could also filter your field data a bit, by regex or
by term frequency, so faceting does not need so much memory. See
http://www.elasticsearch.org/guide/reference/index-modules/fielddata/
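As a sketch of that field data filtering (the index, type, and field names
here are made up, and the thresholds are purely illustrative -- tune them to
your data), the mapping could look something like:

```shell
# Hypothetical index/type/field names; thresholds are examples only.
# Only load tags appearing in at least 0.1% of a segment's docs (and only
# in segments with >= 500 docs), and only lowercase alphabetic tags, into
# field data for faceting/sorting:
curl -XPUT 'localhost:9200/bookmarks/bookmark/_mapping' -d '{
  "bookmark": {
    "properties": {
      "tag": {
        "type": "string",
        "index": "not_analyzed",
        "fielddata": {
          "filter": {
            "frequency": { "min": 0.001, "min_segment_size": 500 },
            "regex":     { "pattern": "^[a-z]+$" }
          }
        }
      }
    }
  }
}'
```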

Also, if your first node goes down with an OOM and you then fire the same
query at the second node, it is no wonder that it goes down as well.
Analyzing the GC logs (check the GC logging section in the config file) and
the slow query and slow index logs might help too, along with monitoring
the node stats:
http://www.elasticsearch.org/guide/reference/index-modules/slowlog/
http://www.elasticsearch.org/guide/reference/api/admin-cluster-nodes-stats/
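A minimal sketch of both (the slowlog thresholds are only example values,
not recommendations -- pick ones that match your latency expectations):

```shell
# In config/elasticsearch.yml -- log queries/fetches/indexing slower than
# these thresholds (example values only):
# index.search.slowlog.threshold.query.warn: 10s
# index.search.slowlog.threshold.query.info: 2s
# index.search.slowlog.threshold.fetch.warn: 1s
# index.indexing.slowlog.threshold.index.warn: 10s

# Periodically poll heap usage and GC counts via the nodes stats API:
curl 'localhost:9200/_nodes/stats?jvm=true&pretty=true'
```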

Hope this helps a bit. You should invest a bit more time in finding out
where the OOMs come from. Maybe they can be prevented pretty easily,
without more hardware.

--Alex


Thanks for your reply, Alex!

The information you provided on heap size is crucial. I'll adjust that in
staging today and see what happens. I know it's hard to be precise about
system requirements / recommendations, but that's true of pretty much every
software application in existence. I really wish the Elasticsearch site
gave a lot more information on where to start, with sane general
recommendations for a couple of basic scenarios (an index size of X, Y
queries, etc.). It's understandable that there would be some tuning and
refinement for each individual use case, but a better guide on how to
approach the matter would help. When we get our set-up dialled in, I'll be
happy to post some details for others' reference.

We'll definitely examine our queries to see if we can make some
optimizations there.

On the cluster crashing, I wasn't very clear: the other node doesn't appear
to be crashing due to memory errors (oddly). We haven't found many clues
(in log files, Bigdesk, etc.) as to why the second node goes down -- it
throws no memory errors at all. Can you think of any other reasons this
might happen, or maybe some good places to check?

Thanks again,
Mike


Hey,

if you apply the same load (queries and indexing) to the other node and it
does not crash, I strongly suspect a configuration difference: a different
JVM version, different memory settings for the user Elasticsearch runs as,
maybe even different hardware (which can easily happen with
virtualization). Tough to find out remotely...

Maybe you can configure the slow query/index log and monitor your garbage
collections in order to get more information before the problem occurs...
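To compare the two nodes and turn on GC logging, something like this might
help (the GC log path is just an example, and the exact startup script to
edit depends on your install):

```shell
# Compare JVM version/flags and OS details across both nodes in one call:
curl 'localhost:9200/_nodes?jvm=true&os=true&pretty=true'

# Enable GC logging with standard HotSpot flags, e.g. via JAVA_OPTS in
# bin/elasticsearch.in.sh (log path is illustrative):
export JAVA_OPTS="$JAVA_OPTS -verbose:gc -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps -Xloggc:/var/log/elasticsearch/gc.log"
```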

--Alex



Thanks again, Alex. We'll make those logging adjustments and keep digging.

Mike
