ES got blocked


(Sisu Alexandru) #1

Hello all,

Our ES suddenly got blocked for 15 minutes. That is, it suddenly stopped
handling search requests and also status requests (curl -XGET
http://localhost:9200/_status).

ES version: 0.18.5
The size of the index is around 200G.
One ES client.
20 shards. All on one single machine.
No mirrors.
OS: centos 5.

Here is the gist: https://gist.github.com/1467971
This info was obtained with jstack (jstack -F 27029, where 27029 is the pid
of ES). It looks like all the threads got blocked (all 132)?!

The jstack output is much bigger; I've copy-pasted only some parts of it.

Any ideas?

Tnx in advance,

Alex
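
With 132 threads in a dump, a tally of thread states is usually easier to read than the raw output. The following is a minimal Python sketch, not something posted in the thread. It assumes the dump was saved as text and that states appear either as `java.lang.Thread.State: X` (regular jstack) or as `(state = X)` (the shape assumed here for `jstack -F`). Both line shapes and the function name `count_thread_states` are assumptions for illustration.

```python
import re
from collections import Counter

# ASSUMPTION: thread states show up in one of these two line shapes.
#   regular jstack:  java.lang.Thread.State: BLOCKED (on object monitor)
#   jstack -F:       Thread 27112: (state = BLOCKED)
STATE_PATTERNS = (
    re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)"),
    re.compile(r"\(state = ([A-Z_]+)\)"),
)


def count_thread_states(dump_text):
    """Count how many threads are reported in each state."""
    counts = Counter()
    for line in dump_text.splitlines():
        for pattern in STATE_PATTERNS:
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # one state per line is enough
    return counts
```

Usage would be along the lines of `count_thread_states(open("dump.txt").read())`, where `dump.txt` is a hypothetical file holding the saved jstack output.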


(Paul Loy) #2

Garbage Collection? How much memory are you giving each JVM? If it's a
large amount and you haven't tuned your GC options on the JVM, this is a
likely cause.

I don't suppose you had something monitoring JMX over that time period? If
you did, you could check whether this was the issue by looking for the Heap
Space Used dropping off.

Paul.

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy


(Sisu Alexandru) #3

Well, I started Elasticsearch with Xms and Xmx set to 100G.
That should've been enough.

Monitoring through JMX sounds like a good idea, but how do you configure JMX
options in ES?
The only documentation I found was
http://www.elasticsearch.org/guide/reference/modules/jmx.html but it's not
very clear to me what I should do.

Tnx,

Alex



(Shay Banon) #4

You configured ES with 100g? Do you have a machine with 100gb of memory?
How much memory does your machine have? I am surprised that it even
started... (swap might be really big).



(Sisu Alexandru) #5

Yes, the machine has 128 GB and 48 cores, and until now we haven't had any
problems.
Anyhow, yesterday I restarted the machine and everything went back to
normal.
Still, I'm interested in how to configure JMX for ES. Any tips? :)



(Paul Loy) #6

That's some machine!

Yeah, with stutters like this it's very useful to know what's going on with
your heap (and other resources). You can watch the heap via JConsole, or use
a monitoring tool like Traverse to send emails when the heap gets too big,
or simply to view historical graphs of all your JMX-exposed variables.

To enable JMX it looks like you need jmx.create_connector: true in your
elasticsearch.yml.

VisualVM also has an awesome Visual GC plugin that lets you see which of
the various sections of your Heap are filling up.

Cheers,

Paul.



(Shay Banon) #7

Also, make sure to enable mlockall in the configuration so that the OS will
not swap the elasticsearch process. I have never run ES with 100gb of
memory; what's your typical memory usage? (Node stats can give you a lot of
information, including at the JVM level.)
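
To answer the "typical memory usage" question, heap figures can be pulled out of a node stats response. This is a hedged Python sketch, not the official client. The response layout (`nodes -> <id> -> jvm -> mem -> heap_used_in_bytes / heap_committed_in_bytes`) is an assumption about what node stats returns, and the stats URL is deliberately left as a parameter because the exact path for this ES version is not given in the thread.

```python
import json
import urllib.request


def fetch_json(url, timeout_s=5.0):
    """GET a URL and parse the JSON body.

    The timeout matters here: a blocked node may never answer.
    """
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


def heap_usage_by_node(stats):
    """Return {node_name: (heap_used_bytes, heap_committed_bytes)}.

    ASSUMPTION: `stats` looks like
    {"nodes": {node_id: {"name": ..., "jvm": {"mem": {
        "heap_used_in_bytes": ..., "heap_committed_in_bytes": ...}}}}}
    Missing fields come back as None instead of raising.
    """
    usage = {}
    for node_id, node in stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        name = node.get("name", node_id)
        usage[name] = (
            mem.get("heap_used_in_bytes"),
            mem.get("heap_committed_in_bytes"),
        )
    return usage
```

Sampling this every few seconds while the problem query runs would give a record of heap growth even without a JMX console attached.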



(Sisu Alexandru) #8

Hello all,

Following your suggestions I've tried:

  • running with bootstrap.mlockall set to true.
  • enabling JMX monitoring.

ES continues to hang. I can reproduce the problem again and again by
executing the following query:
{ "facets": { "term_count": { "global": true, "terms": {
"field": "body", "size": 100 } } }, "size": 0}

(I'm trying to retrieve the most common 100 terms for this field).
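
For reference, the request body above can be built programmatically, which makes its structure easier to see: `"global": true` computes the facet over all documents rather than the query's hits, and the top-level `"size": 0` suppresses the hits themselves. This is a minimal Python sketch. The function name `build_terms_facet_query` is hypothetical, and sending the body to the search endpoint is left out on purpose.

```python
import json


def build_terms_facet_query(field, size):
    """Build a global terms-facet request that returns no search hits."""
    return {
        "facets": {
            "term_count": {
                "global": True,  # facet over all documents, not just hits
                "terms": {"field": field, "size": size},
            }
        },
        "size": 0,  # no hits wanted, only the facet
    }


body = build_terms_facet_query("body", 100)
print(json.dumps(body, indent=2))
```

Note that a terms facet needs the values of the field in memory, so on a large free-text field such as `body` the cost depends on the number of distinct terms in the index, not on the 100 requested.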

To my surprise, I've discovered a set of huge files in the ES folder:
java_pid_xxx.hprof, one generated each time my ES got 'blocked'.
Now, these files are generated when the process runs out of memory.

I'm running with 100G of heap allocated, and I expected that memory would be
swapped for large memory-consuming operations.

Anyhow, it seems that each time I run the above query on my machine, I kill
it.

Other information:

  • the jhat heap histogram can be found here: https://gist.github.com/1497379
  • through JConsole, each time I execute this query I can see the heap
    usage increase very fast; it seems that 100G is consumed in a few
    seconds.
  • regarding the node stats and cluster stats ES provides: I cannot access
    them after ES dies, but here is the info of ES right after restart, when
    everything is okay:
    os: {
      refresh_interval: 1000
      cpu: {
        vendor: AMD
        model: Opteron
        mhz: 1900
        total_cores: 48
        total_sockets: 4
        cores_per_socket: 12
        cache_size: 512b
        cache_size_in_bytes: 512
      }
      mem: {
        total: 126gb
        total_in_bytes: 135321870336
      }
      swap: {
        total: 64gb
        total_in_bytes: 68803354624
      }
    }
    process: {
      refresh_interval: 1000
      id: 15926
      max_file_descriptors: 128000
    }
    jvm: {
      pid: 15926
      version: 1.6.0_27
      vm_name: Java HotSpot(TM) 64-Bit Server VM
      vm_version: 20.2-b06
      vm_vendor: Sun Microsystems Inc.
      start_time: 1324275713418
      mem: {
        heap_init: 100gb
        heap_init_in_bytes: 107374182400
        heap_max: 99.9gb
        heap_max_in_bytes: 107302223872
        non_heap_init: 23.1mb
        non_heap_init_in_bytes: 24313856
        non_heap_max: 130mb
        non_heap_max_in_bytes: 136314880
      }
    }
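
The byte counts in the node info above can be turned into a quick picture of how the machine's memory is split. This is a small Python sketch that uses only the figures reported there. The "gb" values are binary units (100gb is exactly 100 * 2^30 bytes), and the variable names are just for illustration.

```python
GIB = 1024 ** 3  # ES reports "gb" in binary units


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Figures copied from the node info output
ram = 135321870336        # os.mem.total_in_bytes
swap = 68803354624        # os.swap.total_in_bytes
heap_init = 107374182400  # jvm.mem.heap_init_in_bytes
heap_max = 107302223872   # jvm.mem.heap_max_in_bytes

print("RAM:       %.2f GiB" % to_gib(ram))
print("swap:      %.2f GiB" % to_gib(swap))
print("heap max:  %.2f GiB" % to_gib(heap_max))
print("heap/RAM:  %.0f%%" % (100.0 * heap_max / ram))
print("left over: %.2f GiB" % to_gib(ram - heap_max))
```

So the heap takes roughly four fifths of physical memory, leaving about 26 GiB for the OS and its file cache. The arithmetic alone does not say what split is appropriate for this workload.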

Any suggestions?

Tnx,
Alex



(Aurélien-2) #9

Hello.

hprof files are generated by the -XX:+HeapDumpOnOutOfMemoryError parameter. The files should be roughly the same size as the heap when the OOM occurs, but could you give us the size of those files?
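
To report the dump sizes, something like the following Python sketch would do. It assumes the dumps sit in one directory and follow the `java_pid*.hprof` naming pattern; the function name `hprof_sizes` and the directory argument are hypothetical.

```python
import glob
import os


def hprof_sizes(directory):
    """Return [(path, size_in_gib)] for heap dumps in `directory`, largest first.

    ASSUMPTION: dump files match the java_pid*.hprof naming pattern.
    """
    pattern = os.path.join(directory, "java_pid*.hprof")
    sizes = [(path, os.path.getsize(path) / 1024 ** 3) for path in glob.glob(pattern)]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

Comparing these sizes with the configured heap gives a first hint of how full the heap was when each dump was written.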

The jhat profile does not show a 100GB used heap, but do you have any OOM errors in the log files? I don't know if you modified the launch scripts, but these errors should go to stderr. Do you see heavy CPU consumption on one or more CPUs during the "es blocked" situation?

It would be interesting to:

  • redirect stderr and stdout to a file if it's not done already (with &> /path/to/file.log at the end of the java command)
  • activate verbose GC logging to a specific file
  • launch VisualVM with the Visual GC plugin
  • take recurrent thread dumps (with kill -3) over a short time period (one thread dump every 10 or 15 seconds)

and then reproduce the problem as soon as possible, to avoid generating too many logs.
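
Two of the steps above can be scripted. This is a hedged Python sketch, not part of the original advice. `take_thread_dumps` sends SIGQUIT (the signal behind `kill -3`) at a fixed interval, and the JVM writes each dump to its own stdout, so that stream must already be redirected. `gc_logging_flags` returns HotSpot GC logging options as I know them for Java 6-era JVMs; treat the exact flag set as an assumption to check against your JVM. Both function names are hypothetical.

```python
import os
import signal
import time


def take_thread_dumps(pid, count=6, interval_s=10.0,
                      send_signal=os.kill, sleep=time.sleep):
    """Send SIGQUIT to `pid` `count` times, `interval_s` seconds apart.

    `send_signal` and `sleep` are injectable so the loop can be tried
    without touching a real process. SIGQUIT is not available on Windows.
    """
    for i in range(count):
        send_signal(pid, signal.SIGQUIT)
        if i < count - 1:  # no need to wait after the last dump
            sleep(interval_s)


def gc_logging_flags(log_path):
    """JVM options for verbose GC logging to a file.

    ASSUMPTION: HotSpot flags for Java 6-era JVMs; verify for your JVM.
    """
    return [
        "-verbose:gc",
        "-XX:+PrintGCDetails",
        "-XX:+PrintGCDateStamps",
        "-Xloggc:" + log_path,
    ]
```

For example, `take_thread_dumps(15926, count=6, interval_s=10)` would cover about a minute around the problem query, where 15926 is the pid shown in the node info earlier in the thread.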

If you have an OOM, you will see which generation is full with verbose GC and Visual GC; if you have a problem with threads, you will see it in the thread dumps. I use samurai (http://yusuke.homeip.net/samurai/) to analyze thread dumps and MAT to analyze heap dump/hprof files (but a huge hprof will be difficult to read; maybe try jhat). Samurai can read verbose GC files too.

In any case, try a more recent JVM, please send the complete command line of your ES, and maybe try another JDK (like JRockit, with either generational or non-generational GC).

100GB is quite a large heap for current GCs; see http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection

Rgds.



(Sisu Alexandru) #11

Hi Aurelien

Tnx for the prompt answer.
I ran es in the foreground so that I could see any OOM. And indeed
it turned out to be an OOM:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid29451.hprof ...
Exception in thread "elasticsearch[search]-pool-3-thread-18" java.lang.OutOfMemoryError: Java heap space
    at org.elasticsearch.index.field.data.support.FieldDataLoader.load(FieldDataLoader.java:61)
    at org.elasticsearch.index.field.data.strings.StringFieldData.load(StringFieldData.java:84)
    at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:52)
    at org.elasticsearch.index.field.data.strings.StringFieldDataType.load(StringFieldDataType.java:34)
    at org.elasticsearch.index.field.data.FieldData.load(FieldData.java:110)
    at org.elasticsearch.index.cache.field.data.support.AbstractConcurrentMapFieldDataCache.cache(AbstractConcurrentMapFieldDataCache.java:119)
    at org.elasticsearch.search.facet.terms.strings.TermsStringOrdinalsFacetCollector.doSetNextReader(TermsStringOrdinalsFacetCollector.java:127)
    at org.elasticsearch.search.facet.AbstractFacetCollector.setNextReader(AbstractFacetCollector.java:71)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:576)
    at org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:199)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:383)

Command line params:

home/jdk1.6.0_27/bin/java -Xms256m -Xmx1g -Xss128k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError
-Delasticsearch -Des.path.home=/home/elasticsearch-0.18.5
-Des-foreground=yes -cp
:/home/elasticsearch-0.18.5/lib/:/home/elasticsearch-0.18.5/lib/sigar/
-Xms100G -Xmx100G org.elasticsearch.bootstrap.ElasticSearch
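
Note that the command line above sets -Xms/-Xmx twice (256m/1g first, 100G later). The following is only a minimal sketch for checking which values win, assuming the JVM honours the last occurrence of a repeated flag (which matches the 100gb heap_init reported by the node stats elsewhere in this thread). The helper name effective_heap_flags and the shortened command line are made up for illustration.

```python
import shlex


def effective_heap_flags(command_line):
    """Return the last -Xms and -Xmx values found on a JVM command line.

    Assumption: when a flag is repeated, the last occurrence takes effect.
    """
    effective = {}
    for token in shlex.split(command_line):
        for flag in ("-Xms", "-Xmx"):
            if token.startswith(flag):
                # A later occurrence overwrites an earlier one.
                effective[flag] = token[len(flag):]
    return effective


# Shortened version of the command line from the post (illustrative only).
cmd = (
    "java -Xms256m -Xmx1g -Xss128k -XX:+UseConcMarkSweepGC "
    "-Xms100G -Xmx100G org.elasticsearch.bootstrap.ElasticSearch"
)
print(effective_heap_flags(cmd))
```

Dropping the first -Xms/-Xmx pair from the launch script would make the intended heap size unambiguous.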

Size of the hprof dump file:

  • currently 70 GB and still growing (more slowly now). It will probably
    reach 100 GB.

What I did right after this was to run the same query on a smaller dataset
on my development machine.
It seems that ResidentFieldDataCache (which extends
AbstractConcurrentMapFieldDataCache) doesn't get cleared? (I put a
breakpoint on the clear method, and it never gets called.)
On the other hand, I didn't configure my index with any caching
options either.

I'm now working on getting the thread dumps and the verbose GC output.

On Mon, Dec 19, 2011 at 3:37 PM, Aurélien aurelien.dehay@gmail.com wrote:

Hello.

hprof files are generated by the -XX:+HeapDumpOnOutOfMemoryError
parameter. The files should be roughly the same size as the heap when the
OOM occurs, but could you give us the size of those files?

The jhat profile does not show a 100GB used heap, but do you have any OOM
errors in the log files? I don't know if you modified the launch scripts, but
these errors should go to stderr. Do you see heavy CPU
consumption on one or more CPUs during the "es blocked" situation?

It would be interesting to:

  • redirect stderr and stdout to a file if that's not done already (with
    &> /path/to/file.log at the end of the java command line)
  • enable verbose GC logging to a dedicated file
  • launch VisualVM with the Visual GC plugin
  • take repeated thread dumps (with kill -3 ) at short
    intervals (one thread dump every 10 or 15 seconds)

and then reproduce the problem asap to avoid generating too many logs.
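
The repeated thread dumps from the list above could be scripted roughly as follows. This is only a sketch, assuming a Unix-like system where the JVM writes a thread dump to its standard output when it receives SIGQUIT (the signal behind kill -3). The function name take_thread_dumps and its parameters are made up for illustration; the send and sleep arguments exist only so the loop can be exercised without signalling a real process.

```python
import os
import signal
import time


def take_thread_dumps(pid, count, interval_seconds, send=os.kill, sleep=time.sleep):
    """Send SIGQUIT (kill -3) to `pid` `count` times, pausing between sends.

    The dumps themselves appear in the target JVM's stdout, so that stream
    should already be redirected to a file.
    """
    for i in range(count):
        send(pid, signal.SIGQUIT)
        # No need to wait after the last dump.
        if i < count - 1:
            sleep(interval_seconds)
```

For example, take_thread_dumps(es_pid, count=6, interval_seconds=10) would request six dumps over roughly a minute, where es_pid is the process id of the es JVM.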

If you have an OOM, you will see which generation is full with the
verbose GC log and Visual GC; if you have a problem with threads, you will
see it in the thread dumps (I use samurai http://yusuke.homeip.net/samurai/
to analyze thread dumps) and MAT to analyze heap dump/hprof files (but a
huge hprof will be difficult to read, maybe with jhat). Samurai can
read verbose GC files too.

In any case, try a more recent JVM, please send the complete
command line of your ES, and maybe try another JDK (like JRockit, with
either its generational or non-generational GC).

100GB is quite a huge heap for current GCs, see
http://www.infoq.com/presentations/Understanding-Java-Garbage-Collection

Rgds.

From: "Sisu Alexandru" sisu.eugen@gmail.com
To: elasticsearch@googlegroups.com
Sent: Monday, December 19, 2011 15:11:30
Subject: Re: es got blocked

Hello all,
Following your suggestions, I've tried:

  • running with bootstrap.mlockall set to True.

  • enabling JMX monitoring.

es continues to hang. I can reproduce the problem again and
again by executing the following query:
{ "facets": { "term_count": { "global": true,
"terms": { "field": "body", "size": 100 } } },
"size": 0}
(I'm trying to retrieve the 100 most common terms for this field.)
To my surprise, I discovered a set of huge files in the es folder:
java_pid_xxx.hprof, one generated each time my es got 'blocked'.
These files are generated when the process runs out of
memory.
I'm running with 100G of heap allocated, and I expected that
for large memory-consuming operations the memory would be swapped.
Anyhow, it seems that on my machine, each time I run the
above query, I kill it.
Other information:

  • the jhat heap histogram can be found here:
    https://gist.github.com/1497379

  • through JConsole, each time I execute this query, I can see
    the heap usage increase very fast! It seems that 100G is
    consumed in a few seconds.

  • regarding the node stats and cluster stats es provides: I cannot
    access them after es dies, but here is the es info right after a
    restart, when everything is okay:
    os: {
      refresh_interval: 1000
      cpu: {
        vendor: AMD
        model: Opteron
        mhz: 1900
        total_cores: 48
        total_sockets: 4
        cores_per_socket: 12
        cache_size: 512b
        cache_size_in_bytes: 512
      }
      mem: {
        total: 126gb
        total_in_bytes: 135321870336
      }
      swap: {
        total: 64gb
        total_in_bytes: 68803354624
      }
    }
    process: {
      refresh_interval: 1000
      id: 15926
      max_file_descriptors: 128000
    }
    jvm: {
      pid: 15926
      version: 1.6.0_27
      vm_name: Java HotSpot(TM) 64-Bit Server VM
      vm_version: 20.2-b06
      vm_vendor: Sun Microsystems Inc.
      start_time: 1324275713418
      mem: {
        heap_init: 100gb
        heap_init_in_bytes: 107374182400
        heap_max: 99.9gb
        heap_max_in_bytes: 107302223872
        non_heap_init: 23.1mb
        non_heap_init_in_bytes: 24313856
        non_heap_max: 130mb
        non_heap_max_in_bytes: 136314880
      }
    }
    Any suggestions?
    Tnx,
    Alex
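
To put the node stats above in perspective, here is a small arithmetic sketch using only the byte counts from that output. It shows how much of the physical memory the configured heap takes and how much is left for everything else on the machine; the variable names are chosen for illustration, and "gb" is read as 1024^3 bytes, which matches the rounded values es prints.

```python
GIB = 1024 ** 3

# Values copied from the node stats in the post.
total_ram_bytes = 135321870336   # os.mem.total_in_bytes
swap_bytes = 68803354624         # os.swap.total_in_bytes
heap_max_bytes = 107302223872    # jvm.mem.heap_max_in_bytes

heap_share = heap_max_bytes / total_ram_bytes
headroom_gib = (total_ram_bytes - heap_max_bytes) / GIB

print(f"physical RAM: {total_ram_bytes / GIB:.1f} GiB")
print(f"swap:         {swap_bytes / GIB:.1f} GiB")
print(f"max heap:     {heap_max_bytes / GIB:.1f} GiB ({heap_share:.0%} of RAM)")
print(f"left for the OS and other processes: {headroom_gib:.1f} GiB")
```

With the heap taking about four fifths of RAM, only around 26 GiB remains outside the JVM heap, which is one reason the mlockall and swap remarks in this thread matter.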
On Tue, Dec 13, 2011 at 11:23 PM, Shay Banon kimchy@gmail.com wrote:

Also, I would add that you make sure to enable mlockall in the
configuration to make sure the OS will not swap the elasticsearch
process. I never ran ES with 100gb of memory, what's your typical
memory usage? (node stats can give you a lot of information, also on
the jvm level).

On Tue, Dec 13, 2011 at 12:17 PM, Paul Loy keteracel@gmail.com wrote:

That's some machine!

Yeah, with stutters like this it's very useful to know what's going on
with your heap (and other resources). You can watch the heap via JConsole
or use some monitoring tool like traverse to fire emails when the heap gets
too big or simply be able to view historical graphs of all your JMX exposed
variables.

To enable JMX it looks like you need jmx.create_connector: true in your
elasticsearch.yml.

VisualVM also has an awesome Visual GC plugin that lets you see which of
the various sections of your Heap are filling up.

Cheers,

Paul.




(Aurélien-2) #12

Hi.

Don't bother taking the thread dumps; it's a clear OOM, and thread dumps
won't be really helpful.

I've never used jhat for OOM analysis, and I don't know whether MAT
http://eclipse.org/mat/ will handle a 100GB hprof. It's worth a try on a
machine with a lot of memory and a well-tuned Eclipse.

Furthermore, if ES works on Java 7, I would try it, and
simplify the command line by removing the first -Xms/-Xmx pair and all the
-XX parameters. It will certainly not solve the OOM, but the new G1 GC
may be more efficient with large heap sizes.

I will let other people answer the code-specific questions; I'm not a Java dev.

Regards.



(Shay Banon) #13

It seems like you are running a terms facet on a field (body) that has a
very large number of distinct terms (guessing by its name), resulting in the
excessive memory usage and the OOM. The terms facet is not designed to be
used on fields with many terms.
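
As a toy illustration of why this gets expensive (this is not how es implements the terms facet, only a sketch of the general problem): to report the top N terms, a counter has to exist for every distinct term seen, no matter how small N is. The function name top_terms and the sample documents are made up for the example.

```python
from collections import Counter


def top_terms(documents, size):
    """Count whitespace-separated terms and return the `size` most common.

    Also returns how many distinct terms had to be tracked to get there.
    """
    counts = Counter()
    for body in documents:
        counts.update(body.lower().split())
    return counts.most_common(size), len(counts)


docs = [
    "the heap is full",
    "the heap dump is huge",
    "full gc on the heap",
]
top, distinct = top_terms(docs, 2)
print(top)
print("distinct terms tracked:", distinct)
```

Asking for only two terms still required tracking all eight distinct ones. On a free-text field such as body in a large index, the number of distinct terms can be enormous, and that is what fills the heap.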



(system) #14