Our ES node suddenly got blocked for 15 minutes. That means it suddenly
stopped handling search requests and even status requests (curl -XGET http://localhost:9200/_status).
ES version: 0.18.5
The size of the index is around 200G.
One ES client.
20 shards, all on one single machine.
No mirrors.
OS: CentOS 5.
Here is the gist: https://gist.github.com/1467971
This info was obtained with jstack (jstack -F 27029, 27029 being the pid of ES). It looks
like all the threads got blocked (all 132)?!
The full jstack output is much bigger; I've copy-pasted only some parts of it.
Any ideas?
Tnx in advance,
Alex
Garbage collection? How much memory are you giving each JVM? If it's a
large amount and you haven't tuned your GC options, this is a likely cause.
I don't suppose you had something monitoring JMX over that time period. If
you did, you'd be able to tell whether this was the issue by seeing whether
Heap Space Used drops off.
Well, I started up Elasticsearch with Xms and Xmx set to 100G. That should've
been enough.
Monitoring through JMX sounds like a good idea, but how can you configure
JMX options in ES? The only documentation that I found was
http://www.elasticsearch.org/guide/reference/modules/jmx.html, but it's not
very clear to me what I should do.
You configured ES with 100g? Do you have a machine with 100gb of memory?
How much memory does your machine have? I'm surprised that it even
started... (the swap might be really big).
Yes, the machine has 128 GB of RAM and 48 cores, and until now we didn't have
any problems.
Anyhow, yesterday I restarted the machine, and everything went back to
normal.
Still, I'm interested in how to configure JMX for ES. Any tips?
That's some machine!
Yeah, with stutters like this it's very useful to know what's going on with
your heap (and other resources). You can watch the heap via JConsole, use a
monitoring tool like Traverse to fire emails when the heap gets too big, or
simply view historical graphs of all your JMX-exposed variables.
To enable JMX it looks like you need jmx.create_connector: true in your
elasticsearch.yml.
VisualVM also has an awesome Visual GC plugin that lets you see which of
the various sections of your heap are filling up.
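A minimal sketch of the setting mentioned above, as it would appear in the config file (0.18.x-era option; verify against the JMX module docs for your version):

```yaml
# elasticsearch.yml -- expose the node's MBeans over a JMX connector
jmx.create_connector: true
```

Once enabled, you can attach JConsole or VisualVM to the process locally; for remote access, the standard com.sun.management.jmxremote.* JVM system properties apply, as with any Java service.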
Also, I would add: make sure to enable mlockall in the configuration, so that
the OS will not swap the elasticsearch process. I never ran ES with 100gb of
memory; what's your typical memory usage? (Node stats can give you a lot of
information, also at the JVM level.)
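The mlockall suggestion translates to one line of elasticsearch.yml (a sketch, assuming the 0.18.x setting name):

```yaml
# elasticsearch.yml -- ask ES to mlockall() its memory so the OS cannot swap it
bootstrap.mlockall: true
```

Note that the user running ES also needs permission to lock that much memory (ulimit -l unlimited), and keeping ES_MIN_MEM equal to ES_MAX_MEM ensures the locked region covers the whole heap; otherwise the lock can fail at startup (check the logs).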
Hello all,
Following your suggestions I've tried:
- running with bootstrap.mlockall set to true
- enabling the JMX monitoring.
ES continues to hang. I can reproduce the problem again and again by
executing the following query (I'm trying to retrieve the 100 most common
terms for this field):
{
  "facets": {
    "term_count": {
      "global": true,
      "terms": { "field": "body", "size": 100 }
    }
  },
  "size": 0
}
To my surprise, I've discovered in the ES folder a set of huge files,
java_pid_xxx.hprof, generated each time my ES got 'blocked'. Now, these
files are generated when the process runs out of memory.
I'm running with 100G of heap memory allocated, and I expected that for large
memory-consuming operations the memory would be swapped. Anyhow, it seems
that each time I run the above query on my machine, I'm killing it.
Other information:
- the jhat heap histogram can be found here: https://gist.github.com/1497379
- through JConsole, each time I execute this query, I can see the heap
memory increase very fast! It seems that 100G is consumed in a few seconds.
- regarding the node stats and cluster stats ES provides: I cannot access
them after ES dies, but here is the info right after a restart, when
everything is okay:
os: {
refresh_interval: 1000
cpu: {
vendor: AMD
model: Opteron
mhz: 1900
total_cores: 48
total_sockets: 4
cores_per_socket: 12
cache_size: 512b
cache_size_in_bytes: 512
}
mem: {
total: 126gb
total_in_bytes: 135321870336
}
swap: {
total: 64gb
total_in_bytes: 68803354624
}
}
process: {
refresh_interval: 1000
id: 15926
max_file_descriptors: 128000
}
jvm: {
pid: 15926
version: 1.6.0_27
vm_name: Java HotSpot(TM) 64-Bit Server VM
vm_version: 20.2-b06
vm_vendor: Sun Microsystems Inc.
start_time: 1324275713418
mem: {
heap_init: 100gb
heap_init_in_bytes: 107374182400
heap_max: 99.9gb
heap_max_in_bytes: 107302223872
non_heap_init: 23.1mb
non_heap_init_in_bytes: 24313856
non_heap_max: 130mb
non_heap_max_in_bytes: 136314880
}
}
Any suggestions?
Tnx,
Alex
hprof files are generated by the -XX:+HeapDumpOnOutOfMemoryError parameter.
The files should be roughly the same size as the heap when the OOM occurs,
but could you give us the size of those files?
The jhat profile does not show a 100GB used heap, but do you have any OOM
errors in the log files? I don't know if you modified the launch scripts,
but these errors should be redirected via stderr. Do you have huge CPU
consumption on one or more CPUs during the "es blocked" situation?
It would be interesting to:
- redirect stderr and stdout to a file if it's not done already (with &> /path/to/file.log at the end of the java command line)
- activate verbose GC logging to a specific file
- launch VisualVM with the visualgc plugin
- take recurrent thread dumps (with kill -3) within a short time period (one thread dump every 10 or 15 seconds)
and then reproduce the problem as soon as possible, to avoid generating too
many logs.
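The verbose-GC and heap-dump pieces of the checklist above might look like this on the java command line in the ES launch script (these are standard HotSpot flags; the file paths are illustrative):

```
-verbose:gc
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:/var/log/es-gc.log        # GC activity to its own file
-XX:+HeapDumpOnOutOfMemoryError   # evidently already set, given the .hprof files
-XX:HeapDumpPath=/data/dumps      # put the (huge) dumps on a big disk
```

plus the &> /path/to/file.log redirection at the end of the command to capture stderr/stdout.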
If you have an OOM, you will see which generation fills up with the verbose
GC output and visualgc; if you have a problem with threads, you will see it
in the thread dumps (I use Samurai, http://yusuke.homeip.net/samurai/, to
analyze thread dumps, and MAT to analyze heap dumps/hprof files, though such
a huge hprof may be difficult to read, maybe with jhat). Samurai can read
verbose GC files too.
In any case, try to use a more recent JVM, please send the complete command
line of your ES, and maybe try another JDK (like JRockit, with either the
generational or non-generational GC).
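The "recurrent thread dumps" step is easy to script. Here is a minimal self-contained sketch in which a throwaway sleep process stands in for the Elasticsearch JVM: against a real JVM, kill -3 (SIGQUIT) makes the process print a full thread dump to its stdout and keep running, whereas it simply terminates the sleep stand-in.

```shell
ulimit -c 0           # avoid stray core files from the SIGQUIT'd stand-in
sleep 60 &            # stand-in for the Elasticsearch JVM (pid 27029 in the post)
pid=$!
dumps=0
for i in 1 2 3; do
  if kill -3 "$pid" 2>/dev/null; then
    dumps=$((dumps + 1))
    echo "requested thread dump $i from pid $pid"
  fi
  sleep 1             # use 10-15 seconds between dumps against a real node
done
kill "$pid" 2>/dev/null   # clean up the stand-in
wait 2>/dev/null
echo "requested $dumps dumps in total"
```

Pointing this at the real ES pid and redirecting the node's stdout to a file gives you a time series of dumps that Samurai can read.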
100GB is quite a huge amount of memory for current GCs to handle.
Rgds.
Tnx for the prompt answer.
I ran ES in the foreground so that I could see the eventual OOM, and indeed
it turned out to be an OOM:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid29451.hprof ...
Exception in thread "elasticsearch[search]-pool-3-thread-18" java.lang.OutOfMemoryError: Java heap space
    at org.elasticsearch.index.field.data.support.FieldDataLoader.load(FieldDataLoader.java:61)
    at org.elasticsearch.index.field.data.strings.StringFieldData.load(StringFieldData.java:84)
    at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:52)
    at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:34)
    at org.elasticsearch.index.field.data.FieldData.load(FieldData.java:110)
    at org.elasticsearch.index.cache.field.data.support.AbstractConcurrentMapFieldDataCache.cache(AbstractConcurrentMapFieldDataCache.java:119)
    at org.elasticsearch.search.facet.terms.strings.TermsStringOrdinalsFacetCollector.doSetNextReader(TermsStringOrdinalsFacetCollector.java:127)
    at org.elasticsearch.search.facet.AbstractFacetCollector.setNextReader(AbstractFacetCollector.java:71)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:576)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:199)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:383)
The hprof file is currently 70GB and still growing (more slowly now); it
will probably reach 100GB.
What I did right after this was to run the same query on my development
machine against a smaller dataset.
It seems that ResidentFieldDataCache (which extends
AbstractConcurrentMapFieldDataCache) doesn't get cleared? (I tried to put a
breakpoint on the clear method, and it never gets called.)
On the other hand, I also didn't configure my index with any caching
options.
I'm working now on getting the thread dumps and the verbose GC output.
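Since the resident field data cache is the default and effectively unbounded, one thing to try is bounding it via index settings. A hedged sketch, assuming the 0.18.x-era setting names (double-check them against the field data cache docs for your exact version):

```yaml
# elasticsearch.yml -- bound the field data cache instead of the default
# unbounded "resident" cache
index.cache.field.type: soft        # soft references let the GC reclaim entries under pressure
index.cache.field.max_size: 10000   # illustrative cap on cached entries
index.cache.field.expire: 10m       # illustrative time-based expiry
```

This would not make the body field's terms fit in memory, but it should let the cache be evicted instead of pinning the whole heap.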
Furthermore, if ES works on Java 7, I would try it, and simplify the command
line by first removing the Xms/Xmx and all the -XX parameters. It will
certainly not solve the OOM, but the new G1 GC may be more efficient on
large heaps.
I will let other people answer on the code specifics; I'm not a Java dev.
Tnx for the prompt answer.
I run es with in foreground so that I can see the eventual OOM. And indeed
it turned out to be an OOM:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid29451.hprof ...
Exception in thread "elasticsearch[search]-pool-3-thread-18" java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.index.field.data.support.FieldDataLoader.load(FieldDataLoader.java:61)
at org.elasticsearch.index.field.data.strings.StringFieldData.load(StringFieldData.java:84)
at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:52)
at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:34)
at org.elasticsearch.index.field.data.FieldData.load(FieldData.java:110)
at org.elasticsearch.index.cache.field.data.support.AbstractConcurrentMapFieldDataCache.cache(AbstractConcurrentMapFieldDataCache.java:119)
at org.elasticsearch.search.facet.terms.strings.TermsStringOrdinalsFacetCollector.doSetNextReader(TermsStringOrdinalsFacetCollector.java:127)
at org.elasticsearch.search.facet.AbstractFacetCollector.setNextReader(AbstractFacetCollector.java:71)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:576)
at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:199)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:383)
The heap dump file is currently 70GB and still growing (more slowly now). It will probably reach 100gb.
What I did right after this was to run the same query on a smaller dataset
on my development machine.
It seems that ResidentFieldDataCache (which extends
AbstractConcurrentMapFieldDataCache) doesn't get cleared? (I put a
breakpoint on the clear method, and it never gets called.)
On the other hand, I also didn't configure my index with any caching
options.
I'm working now on getting the thread dumps and the verbose GC output.
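For anyone following along, a minimal sketch of how those two diagnostics are typically captured on a HotSpot 1.6 JVM (the pid is the one mentioned earlier in the thread; the log path is just an example):

```
# Thread dump of the running ES process (add -F only if the normal dump hangs):
#   jstack 27029 > threads.txt
#
# Verbose GC logging: add these HotSpot flags to the ES JVM options:
#   -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/es-gc.log
```

The GC log will show whether the stalls line up with full collections on the big heap.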
It seems like you are trying to get terms on a field (body) that has many
distinct terms (guessing by its name), resulting in the excessive memory
usage and OOM. The terms facet is not designed to be used on fields with
many terms.
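A rough back-of-envelope sketch shows why this blows up: in ES 0.18, loading string field data means keeping one ordinal per document plus every distinct term on the heap, and an analyzed body field can easily have hundreds of millions of distinct terms. All numbers below are hypothetical, since the thread gives no document or term counts:

```python
# Back-of-envelope estimate of string field-data size for a terms facet.
# Every count and overhead figure here is an illustrative assumption.

def field_data_bytes(num_docs, distinct_terms, avg_term_bytes, per_term_overhead=40):
    ordinals = num_docs * 4  # one 32-bit ordinal per document (single-valued case)
    terms = distinct_terms * (avg_term_bytes + per_term_overhead)  # term bytes + object overhead
    return ordinals + terms

# e.g. 500M docs with 200M distinct terms in an analyzed "body" field, ~10-byte terms
est = field_data_bytes(500_000_000, 200_000_000, 10)
print(f"~{est / 2**30:.1f} GiB for a single field")  # ~11.2 GiB
```

With multi-valued (analyzed) fields the ordinal arrays multiply per value per document, so the real figure on a 200G index can be far larger, which fits the "100G consumed in seconds" behaviour described below.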
ES continues to hang. I can reproduce the problem again and again by
executing the following query:
{
  "facets": {
    "term_count": {
      "global": true,
      "terms": { "field": "body", "size": 100 }
    }
  },
  "size": 0
}
(I'm trying to retrieve the 100 most common terms for this field.)
To my surprise, I've discovered a set of huge files in the ES folder:
java_pid_xxx.hprof, generated each time my ES got 'blocked'.
Now, these files are generated when the process runs out of memory.
I'm running with 100G of heap memory allocated, and I expected that for
large memory-consuming operations the memory would be swapped.
Anyhow, it seems that each time I run the above query on my machine,
I kill it.
Through JConsole, each time I execute this query, I can see the
heap memory increase very fast! It seems that 100G are consumed in a
few seconds.
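As an aside for anyone hitting the same thing: those .hprof dumps can be opened with heap-analysis tools. A sketch (jhat ships with JDK 6; the -Xmx value is an example, and for dumps this size Eclipse MAT is the more realistic choice):

```
# jhat needs a heap comparable to the dump itself, so this only works on
# small dumps -- use Eclipse MAT for a 70GB file:
#   jhat -J-Xmx8g java_pid29451.hprof
#   (then browse http://localhost:7000/ to see what dominates the heap)
```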
Regarding the node stats and cluster stats ES provides: I cannot access
them after ES dies, but here is the ES info right after restart, when
everything is okay:
os: {
refresh_interval: 1000
cpu: {
vendor: AMD
model: Opteron
mhz: 1900
total_cores: 48
total_sockets: 4
cores_per_socket: 12
cache_size: 512b
cache_size_in_bytes: 512
}
mem: {
total: 126gb
total_in_bytes: 135321870336
}
swap: {
total: 64gb
total_in_bytes: 68803354624
}
}
process: {
refresh_interval: 1000
id: 15926
max_file_descriptors: 128000
}
jvm: {
pid: 15926
version: 1.6.0_27
vm_name: Java HotSpot(TM) 64-Bit Server VM
vm_version: 20.2-b06
vm_vendor: Sun Microsystems Inc.
start_time: 1324275713418
mem: {
heap_init: 100gb
heap_init_in_bytes: 107374182400
heap_max: 99.9gb
heap_max_in_bytes: 107302223872
non_heap_init: 23.1mb
non_heap_init_in_bytes: 24313856
non_heap_max: 130mb
non_heap_max_in_bytes: 136314880
}
}
Any suggestions?
Tnx,
Alex
On Tue, Dec 13, 2011 at 11:23 PM, Shay Banon kimchy@gmail.com wrote:
Also, I would add: make sure to enable mlockall in the
configuration so the OS will not swap the elasticsearch process.
I never ran ES with 100gb of memory. What's your typical memory usage? (node
stats can give you a lot of information, also at the JVM level).
Yeah, with stutters like this it's very useful to know what's going on
with your heap (and other resources). You can watch the heap via JConsole,
or use some monitoring tool like Traverse to fire emails when the heap gets
too big, or simply view historical graphs of all your JMX-exposed
variables.
To enable JMX it looks like you need jmx.create_connector: true in your
elasticsearch.yml.
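Putting that together with the standard JVM remote-JMX properties, a configuration sketch might look like this (the port is a made-up example, and the unauthenticated/no-SSL settings are only sane on a trusted network):

```yaml
# elasticsearch.yml
jmx.create_connector: true

# JVM system properties for a plain remote JMX connector (example values):
#   -Dcom.sun.management.jmxremote
#   -Dcom.sun.management.jmxremote.port=9999
#   -Dcom.sun.management.jmxremote.ssl=false
#   -Dcom.sun.management.jmxremote.authenticate=false
```

JConsole or VisualVM can then attach to host:9999 and chart Heap Memory Usage over time.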
VisualVM also has an awesome Visual GC plugin that lets you see which of
the various sections of your Heap are filling up.
Yes, the machine has 128 GB and 48 cores. And until now we didn't have any
problems.
Anyhow, yesterday I restarted the machine, and everything got back to
normal.
Still, I'm interested in how to configure JMX for ES. Any tips?
On Mon, Dec 12, 2011 at 11:43 PM, Shay Banon kimchy@gmail.com wrote:
You configured ES with 100g? Do you have a machine with 100gb of
memory? How much memory does your machine have? I am surprised that it even
started... (swap might be really big).