100% processor usage


(Marcin Dojwa) #1

Hi,

I have the following problem. I have 2 nodes. I started importing data into
one of them (node A). After a few hours of importing, node B started using
100% of the CPU and barely responded to other search and insert requests. By
then 11GB of data had been imported. How can I investigate such problems?
Are there any logs that can help in such a situation? What should I do?

Thank you for your help.

Best regards.
Marcin Dojwa.

--


(David Pilato) #2

Have you looked at the Leap second problem?

http://elasticsearch-users.115913.n3.nabble.com/leapsecond-and-elasticsearch-td4019953.html

David

(In reply to Marcin Dojwa m.dojwa@livechatinc.com, 15 August 2012 at 17:10)


(Marcin Dojwa) #3

No, I haven't. I will check it, thanks.

(In reply to David Pilato david@pilato.fr, 2012/8/15)


(Marcin Dojwa) #4

But my question is still open: how do I investigate such problems? :slight_smile:

Best regards.

(In reply to Marcin Dojwa m.dojwa@livechatinc.com, 2012/8/15)


(Rafał Kuć) #5

Hello Marcin!

Were you also querying ElasticSearch, or was it only indexing data?

Either way, I would first enable GC logging in ElasticSearch, to make sure
that the problems you are experiencing are not related to garbage
collection.

There are some tools that can help you diagnose what is happening with
your nodes: look at the front ends mentioned on the ElasticSearch clients
page (http://www.elasticsearch.org/guide/appendix/clients.html). Of course,
you can also use the tools available on your operating system, such as
vmstat or dstat.
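As a small illustration of the vmstat suggestion, here is a minimal Python sketch (not part of any ElasticSearch tooling) that runs `vmstat 5 5` and averages the CPU columns. It assumes the Linux procps `vmstat` output layout, and it skips the first sample row because vmstat reports that row as averages since boot rather than for the current interval. The function names are illustrative.

```python
import subprocess


def parse_vmstat(text):
    """Parse procps `vmstat` output into one dict per sample row.

    Column names are taken from the second header line ("r b swpd ..."),
    so extra columns such as "st" on newer systems are handled as well.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = next(cols for cols in rows if cols[0] == "r")
    samples = []
    for cols in rows:
        # Data rows are all integers and have one value per column name.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(header, map(int, cols))))
    return samples


def average_cpu(samples):
    """Average the us/sy/id/wa columns over the interval samples.

    The first row is skipped because vmstat reports averages since boot there.
    """
    live = samples[1:] or samples
    return {key: sum(s[key] for s in live) / len(live) for key in ("us", "sy", "id", "wa")}


def sample_vmstat(interval=5, count=5):
    """Run `vmstat interval count` and return the parsed sample rows."""
    out = subprocess.run(
        ["vmstat", str(interval), str(count)],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_vmstat(out)
```

Calling `average_cpu(sample_vmstat())` on a node gives one summary per run. Applied to the es1 `vmstat 5 5` output posted later in this thread, it averages to 16.75% user and 83.25% idle.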

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch



(Marcin Dojwa) #6

Hi,

Thank you very much for your help; I will check this. Could you tell me how
to enable GC logging in ES? Thanks.

Best regards.

(In reply to Rafał Kuć r.kuc@solr.pl, 2012/8/15)


(Rafał Kuć) #7

Hello!

You should have the following section in your elasticsearch.yml file:

################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s

Just remove the comment characters (#), restart ElasticSearch, and garbage collections that take longer than these thresholds will be logged.
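If you prefer to script that edit, here is a minimal Python sketch that uncomments the `monitor.jvm.gc.*` lines from the section above and leaves every other line untouched. It assumes the settings appear in elasticsearch.yml in exactly that commented form; the path you pass to `patch_file` is whatever your installation uses, and ElasticSearch still has to be restarted afterwards.

```python
import re

# A commented-out GC monitoring setting, e.g. "#monitor.jvm.gc.ParNew.warn: 1000ms".
# The section banner ("#####...") does not match because "monitor" must follow the "#".
GC_SETTING = re.compile(r"^(\s*)#\s*(monitor\.jvm\.gc\.\S+:.*)$")


def enable_gc_monitoring(yml_text):
    """Return yml_text with the monitor.jvm.gc.* settings uncommented."""
    lines = []
    for line in yml_text.splitlines():
        match = GC_SETTING.match(line)
        lines.append(match.group(1) + match.group(2) if match else line)
    return "\n".join(lines) + "\n"


def patch_file(path):
    """Rewrite the elasticsearch.yml at `path` in place (restart ES afterwards)."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(enable_gc_monitoring(text))
```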

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch



(Marcin Dojwa) #8

Thank you, Rafał.

Best regards
Marcin Dojwa

(In reply to Rafał Kuć r.kuc@solr.pl, 2012/8/16)


(Marcin Dojwa) #9

Hi,

I still need help with this :slight_smile: Here is some more information:

  1. 2 nodes (es1 and es2)
  2. 1 index (Users) with 30 shards and 1 replica
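With 30 primary shards and 1 replica, the index should have 30 x (1 + 1) = 60 shard copies spread over the two nodes, which is what `active_shards` in `_cluster/health` should report. Below is a minimal Python sketch of that check. The default URL is the `http://localhost:9200/_cluster/health` endpoint used in this thread; the function names and the set of fields checked are illustrative choices. Note that a green, fully allocated cluster passes this check even while one node sits at 100% CPU, so it rules out shard allocation problems rather than explaining the load.

```python
import json
import urllib.request


def expected_shards(primaries, replicas):
    """Total shard copies: every primary plus `replicas` copies of it."""
    return primaries * (1 + replicas)


def check_health(health, primaries=30, replicas=1, data_nodes=2):
    """Return a list of human-readable problems found in a _cluster/health dict."""
    problems = []
    if health.get("status") != "green":
        problems.append("status is %s" % health.get("status"))
    if health.get("number_of_data_nodes") != data_nodes:
        problems.append("expected %d data nodes, got %s"
                        % (data_nodes, health.get("number_of_data_nodes")))
    want = expected_shards(primaries, replicas)
    if health.get("active_shards") != want:
        problems.append("expected %d active shards, got %s"
                        % (want, health.get("active_shards")))
    for key in ("relocating_shards", "initializing_shards", "unassigned_shards"):
        if health.get(key, 0):
            problems.append("%s = %s" % (key, health[key]))
    return problems


def fetch_health(url="http://localhost:9200/_cluster/health", timeout=10):
    """GET the cluster health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Typical use is `print(check_health(fetch_health()) or "health looks fine")`.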

I do the following:

  1. On es2 I run many bulk index operations one after another (each bulk
    has 1000 inserts), about 1000 bulk requests in total.
  2. At some point during the inserts, es1 goes up to about 300% CPU and the
    bulk insert hangs. Then I stop it.
  3. After 1 day of doing nothing with either ES node (I turn off everything
    that uses ES), es1 uses 100% CPU while es2 is OK.
    es1:
        PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
      12199 es    20   0 8847m 1.8g  11m S  102 11.7   3421:26 java
    es2:
        PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
       9816 es    20   0 8771m 2.2g  11m S    1 14.4 116:41.97 java
  4. http://localhost:9200/_cluster/health
    {
      "cluster_name": "production",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 2,
      "number_of_data_nodes": 2,
      "active_primary_shards": 30,
      "active_shards": 60,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0
    }
  5. es1: vmstat 5 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd     free   buff   cache   si   so   bi   bo    in    cs us sy  id wa
     1  0      0  4389668 294084 8416356    0    0    0    2     0     1  1  0  99  0
     1  0      0  4389420 294084 8416356    0    0    0    0  1066  2559 16  0  84  0
     1  0      0  4387312 294084 8416356    0    0    0   10  1083  2584 17  0  83  0
     1  0      0  4386584 294084 8416356    0    0    0    0  1087  2552 16  0  84  0
     1  0      0  4385096 294084 8416356    0    0    0    6  1102  2604 18  0  82  0
  6. es2: vmstat 5 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
     r  b   swpd     free   buff   cache   si   so   bi   bo    in    cs us sy  id wa
     0  0      0 11327116 253432 1299284    0    0    1   13    16    16  0  0 100  0
     0  0      0 11326984 253432 1299284    0    0    0    0   897  2154  0  0 100  0
     0  0      0 11326984 253432 1299284    0    0    0    7   914  2174  0  0 100  0
     0  0      0 11326984 253432 1299284    0    0    0    0  1057  2225  0  0 100  0
     0  0      0 11326984 253432 1299284    0    0    0    7   874  2109  0  0 100  0
  7. https://github.com/mobz/elasticsearch-head - this plugin shows that
    everything is OK.
  8. https://github.com/lukas-vlcek/bigdesk - this plugin also shows that
    everything is OK, with the following values:
    OS - CPU:
      Total: 100%
      User: 15%
      Sys: 0%
    OS - LOAD:
      2: 1.11
      1: 1.24
      0: 1.24
    Process - CPU:
      Total: 400%
      Process: 123%
  9. After restarting elasticsearch on es1 and letting it stabilize, es1 works
    fine, but the same thing happens on es2 (100% CPU usage). The only way to
    fix it is to stop all ES nodes and start them again.
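top's TIME+ column is the cumulative CPU time of the process, shown here as minutes:seconds (with hundredths when they fit). Converting the two values above makes the difference concrete: 3421:26 is 205,286 seconds, roughly 57 hours of CPU time for es1, against 116:41.97, just under 2 hours, for es2 (the two processes may have been running for different lengths of time). A minimal Python sketch of the conversion follows; it assumes only the `minutes:seconds[.hundredths]` form that appears in this output, and other top versions or very large values may use a format it does not handle.

```python
def parse_top_row(header, row):
    """Pair each column name from top's header line with the value below it."""
    return dict(zip(header.split(), row.split()))


def top_time_to_seconds(value):
    """Convert top's TIME+ value ("minutes:seconds[.hundredths]") to seconds."""
    minutes, seconds = value.split(":")
    return int(minutes) * 60 + float(seconds)


HEADER = "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND"
es1 = parse_top_row(HEADER, "12199 es 20 0 8847m 1.8g 11m S 102 11.7 3421:26 java")
es2 = parse_top_row(HEADER, "9816 es 20 0 8771m 2.2g 11m S 1 14.4 116:41.97 java")

for name, proc in (("es1", es1), ("es2", es2)):
    secs = top_time_to_seconds(proc["TIME+"])
    print("%s: %s%% CPU now, %.0f s (%.1f h) of CPU time so far"
          % (name, proc["%CPU"], secs, secs / 3600))
```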

Is there anything I can do to investigate this problem?

Thanks for your help.

Best regards.

(In reply to Marcin Dojwa m.dojwa@livechatinc.com, 2012/8/16)


(Marcin Dojwa) #10

I am using ES 0.19.8. I want to upgrade to 0.19.9 now, but I am not sure
whether that will help.

(In reply to Marcin Dojwa m.dojwa@livechatinc.com, 2012/9/27)


(Marcin Dojwa) #11

Oh, and there is nothing in the GC log.

(In reply to Marcin Dojwa m.dojwa@livechatinc.com, 2012/9/27)


(Marcin Dojwa) #12

Correction to point 1: each bulk has 10000 inserts, not 1000.
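For reference, a bulk request body is newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end of the body. A minimal Python sketch that groups documents into bodies of 10000, as described here, might look like the following. The index name `users` and the type `user` are hypothetical placeholders, and the `batch_size` parameter makes it easy to try smaller batches while narrowing down where the import stalls.

```python
import json


def bulk_bodies(docs, index, doc_type, batch_size=10000):
    """Yield newline-delimited bulk request bodies of at most batch_size docs.

    `docs` is an iterable of (doc_id, source_dict) pairs. Each document becomes
    an "index" action line followed by its source line, and every body ends
    with a newline.
    """
    lines, count = [], 0
    for doc_id, source in docs:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:  # final, partially filled batch
        yield "\n".join(lines) + "\n"
```

Each yielded string is one request body; how you send it (HTTP client, timeouts, retries) is left to your existing import code.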

Best regards.

(In reply to Marcin Dojwa m.dojwa@livechatinc.com, 2012/9/27)


(Marcin Dojwa) #13

One more thing: I am not sure whether it hangs on the bulk indexing or on
the 'delete by query' I run after the bulk indexing.
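One way to tell the two phases apart is to log when each one starts and ends: if the log shows `starting delete-by-query` with no matching `ended delete-by-query` line, that is the phase that hangs. Here is a minimal Python sketch of such a wrapper. `run_bulk_import` and `run_delete_by_query` in the usage comment are hypothetical names for your own import code, and the 60-second warning threshold is an arbitrary default.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("import-phases")


def timed_phase(name, fn, *args, warn_after=60.0, **kwargs):
    """Run fn(*args, **kwargs) and log when the phase starts and ends.

    A phase that never returns leaves a "starting" line without a matching
    "ended" line, which shows where the process is stuck. Phases slower than
    warn_after seconds are logged at WARNING level.
    """
    log.info("starting %s", name)
    start = time.monotonic()
    try:
        return fn(*args, **kwargs)
    finally:
        elapsed = time.monotonic() - start
        level = logging.WARNING if elapsed > warn_after else logging.INFO
        log.log(level, "ended %s after %.1fs", name, elapsed)


# Usage with hypothetical functions from your own import script:
# timed_phase("bulk-import", run_bulk_import)
# timed_phase("delete-by-query", run_delete_by_query)
```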

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Fix for 1.: each bulk has 10000 inserts

Best regards.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Oh, and there is nothing in the GC log.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

I am using ES 0.19.8. I want to upgrade to 0.19.9 now, but I am not sure
if that will help.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Hi,

I still need help with this :slight_smile: Here is the information I have:

  1. 2 nodes (es1 and es2)
  2. 1 index Users with 30 shards and 1 replica

I do the following:

  1. On es2 I run many bulk index operations one after another (each bulk
    has 1000 inserts), about 1000 such bulk inserts in total.
  2. At some point during the inserts, es1 uses about 300% of processor and
    the bulk insert hangs. Then I stop it.
  3. After 1 day of doing nothing with either ES node (I turn off everything
    that uses ES), es1 uses 100% of processor and es2 is OK.
    es1:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    12199 es 20 0 8847m 1.8g 11m S 102 11.7 3421:26 java
    es2:
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    9816 es 20 0 8771m 2.2g 11m S 1 14.4 116:41.97 java
  4. http://localhost:9200/_cluster/health
    {
      "cluster_name": "production",
      "status": "green",
      "timed_out": false,
      "number_of_nodes": 2,
      "number_of_data_nodes": 2,
      "active_primary_shards": 30,
      "active_shards": 60,
      "relocating_shards": 0,
      "initializing_shards": 0,
      "unassigned_shards": 0
    }
  5. es1: vmstat 5 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    1 0 0 4389668 294084 8416356 0 0 0 2 0 1 1 0 99 0
    1 0 0 4389420 294084 8416356 0 0 0 0 1066 2559 16 0 84 0
    1 0 0 4387312 294084 8416356 0 0 0 10 1083 2584 17 0 83 0
    1 0 0 4386584 294084 8416356 0 0 0 0 1087 2552 16 0 84 0
    1 0 0 4385096 294084 8416356 0 0 0 6 1102 2604 18 0 82 0
  6. es2: vmstat 5 5
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 11327116 253432 1299284 0 0 1 13 16 16 0 0 100 0
    0 0 0 11326984 253432 1299284 0 0 0 0 897 2154 0 0 100 0
    0 0 0 11326984 253432 1299284 0 0 0 7 914 2174 0 0 100 0
    0 0 0 11326984 253432 1299284 0 0 0 0 1057 2225 0 0 100 0
    0 0 0 11326984 253432 1299284 0 0 0 7 874 2109 0 0 100 0
  7. https://github.com/mobz/elasticsearch-head - this plugin shows that
    everything is OK.
  8. https://github.com/lukas-vlcek/bigdesk - this plugin also shows that
    everything is OK, with these values:
    OS - CPU:
    Total: 100%
    User: 15%
    Sys: 0%
    OS - LOAD:
    2: 1.11
    1: 1.24
    0: 1.24
    Process - CPU:
    Total: 400%
    Process: 123%
  9. After restarting elasticsearch on es1 and letting it stabilize, es1 works
    fine, but then the same thing happens on es2 (100% processor usage). The
    only way to fix that is to stop all ES nodes and start them again.

Is there anything I can do to investigate this problem?

Thanks for help.

Best regards.
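One way to dig into the top output above (the es1 java process at about 100% CPU while nothing uses the cluster) is to find which JVM thread is busy and read its stack. A minimal sketch, assuming a Linux host with procps top and the JDK's jstack on the PATH, and a HotSpot JVM whose jstack prints thread ids as nid=0x... in hex (newer JDKs may format this differently, so check your jstack output); tid_to_nid and stack_of_thread are made-up helper names:

```sh
# Convert a decimal thread id, as shown in the PID column of `top -H`,
# to the "nid=0x..." form that jstack prints for each Java thread.
tid_to_nid() {
  printf 'nid=0x%x\n' "$1"
}

# Print the stack of one thread of a running JVM.
# $1 = JVM process id (12199 for es1 above), $2 = decimal thread id.
stack_of_thread() {
  local pid nid
  pid=$1
  nid=$(tid_to_nid "$2")
  # The trailing space stops nid=0x2fa7 from also matching nid=0x2fa70.
  jstack "$pid" | grep -A 20 "$nid "
}

# Example (not run here):
#   top -H -b -n 1 -p 12199 | head -n 20   # busiest threads of the es1 JVM
#   stack_of_thread 12199 <thread id from the first column>
```

This gathers roughly the same kind of information that the _nodes/hot_threads API reports from inside ElasticSearch, but from the operating system's side.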

2012/8/16 Marcin Dojwa m.dojwa@livechatinc.com

Thank you Rafał.

Best regards
Marcin Dojwa

2012/8/16 Rafał Kuć r.kuc@solr.pl

Hello!

You should have the following section in your elasticsearch.yml file:

################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s

Just remove the comment characters, restart ElasticSearch, and the GC
activity will be logged.
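If you would rather not edit the file by hand, the same change can be scripted. A sketch, assuming the lines start exactly with #monitor.jvm.gc. as shown above; enable_gc_monitor_logging is a made-up name, and sed -i.bak leaves the original file next to it as elasticsearch.yml.bak:

```sh
# Uncomment the monitor.jvm.gc.* settings in an elasticsearch.yml file.
# sed -i.bak edits in place and keeps the original as <file>.bak
# (this form is accepted by both GNU and BSD sed).
enable_gc_monitor_logging() {
  sed -i.bak 's/^#\(monitor\.jvm\.gc\.\)/\1/' "$1"
}

# Example (the config path depends on your installation):
#   enable_gc_monitor_logging /path/to/elasticsearch/config/elasticsearch.yml
```

Other commented lines in the file are left untouched; the node still has to be restarted afterwards.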

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch


(Shay Banon) #14

Can you issue: curl localhost:9200/_nodes/hot_threads and gist the response?
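Because the busy threads can change from moment to moment, it may help to take several snapshots a few seconds apart rather than just one. A sketch in POSIX sh, assuming curl is installed; ES_URL, the default count and pause, and the file naming are example choices, not anything ElasticSearch requires:

```sh
# Save COUNT hot_threads snapshots, PAUSE seconds apart, each to its own
# timestamped file, and print the file names.
ES_URL=${ES_URL:-http://localhost:9200}

snapshot_hot_threads() {
  local count pause i file
  count=${1:-4}
  pause=${2:-5}
  i=1
  while [ "$i" -le "$count" ]; do
    file="hot_threads-$(date '+%Y%m%d-%H%M%S')-$i.txt"
    curl -s "$ES_URL/_nodes/hot_threads" > "$file" || return 1
    echo "$file"
    if [ "$i" -lt "$count" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
}

# Example: four snapshots, five seconds apart, while the node is busy.
#   snapshot_hot_threads 4 5
```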

On Sep 27, 2012, at 1:19 PM, Marcin Dojwa m.dojwa@livechatinc.com wrote:

One more thing: I am not sure whether it hangs on the bulk indexing or on the 'delete by query' that runs after the bulk indexing.



(Marcin Dojwa) #15

Sure, I will try to reproduce this case on Monday. I updated ES to 0.19.9
so I will check if this happens with this version too.

Best regards.
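Before re-running the test, it is worth confirming that both nodes really run 0.19.9, since a half-upgraded cluster would muddy the result. A sketch that assumes the root URL of a 0.19.x node returns JSON with a version object holding a number field; node_version is a made-up helper and es1/es2 are placeholder host names:

```sh
# Print the version number reported by a node's root endpoint.
# Assumes a response containing  "version" : { "number" : "0.19.9", ... }
node_version() {
  curl -s "$1/" | tr -d '\n' |
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (es1 and es2 are placeholder host names):
#   for node in http://es1:9200 http://es2:9200; do
#     echo "$node $(node_version "$node")"
#   done
```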

2012/9/28 Shay Banon kimchy@gmail.com

Can you issue: curl localhost:9200/_nodes/hot_threads and gist the
response?



(Marcin Dojwa) #16

Hi,

I attached 4 different (I think) snapshots of curl
localhost:9200/_nodes/hot_threads here: https://gist.github.com/f2826104eb5b288d3fb0

Best regards.
Marcin Dojwa

2012/9/29 Marcin Dojwa m.dojwa@livechatinc.com

Sure, I will try to reproduce this case on Monday. I updated ES to 0.19.9
so I will check if this happens with this version too.

Best regards.



(Shay Banon) #17

It seems like it's doing the delete by query with nested documents. Can you share your mapping, some sample data that you index, and the delete_by_query that you are using?
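For reference, as far as I know a delete by query in 0.19.x is sent as DELETE /<index>/<type>/_query with the query itself as the request body (later versions wrap it in a top-level query object). A sketch of that request shape; the index users, the type chat and the field owner_id are hypothetical placeholders, not the mapping from this thread:

```sh
# Shape of a 0.19.x-style delete-by-query request: DELETE /<index>/<type>/_query
# with the query itself as the body. Index, type and field are hypothetical;
# $1 is inserted into the JSON as-is, so it must not contain quotes.
ES_URL=${ES_URL:-http://localhost:9200}

delete_chats_of_owner() {
  curl -s -XDELETE "$ES_URL/users/chat/_query" \
    -d "{\"term\": {\"owner_id\": \"$1\"}}"
}

# Example (not run here):
#   delete_chats_of_owner owner7
```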

On Oct 2, 2012, at 4:16 AM, Marcin Dojwa m.dojwa@livechatinc.com wrote:

Hi,

I attached 4 different (I think) snapshots of curl localhost:9200/_nodes/hot_threads here: https://gist.github.com/f2826104eb5b288d3fb0

Best regards.
Marcin Dojwa



(Marcin Dojwa) #18

Hi,

The exact mapping is here: https://gist.github.com/bbde44039b20a57eebf0
I guess this is about the chat document type, because it is the only one
with a nested mapping. I cannot give you the exact data that might
cause this situation because it is our clients' private data. Are you
able to try to reproduce the problem with just the mapping?
If not, I will try to prepare fake example data.
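Fake documents with the same structure are often enough for a reproduction. A sketch that prints bulk API input (an action line followed by a source line per document) for chat documents with nested messages. The index users, the type chat and all field names are placeholders to replace with the ones from the real mapping, which should be applied to the test index before loading the data:

```sh
# Print N fake "chat" documents in bulk API format: one action line and one
# source line per document, each source holding PER_CHAT nested "messages".
# Index, type and every field name are placeholders for the real mapping.
fake_chat_bulk() {
  local n per_chat i j msgs
  n=${1:-1000}
  per_chat=${2:-5}
  i=1
  while [ "$i" -le "$n" ]; do
    msgs=""
    j=1
    while [ "$j" -le "$per_chat" ]; do
      # Append one message object, with a comma before all but the first.
      msgs="$msgs${msgs:+,}{\"author\":\"user$j\",\"text\":\"fake message $j of chat $i\"}"
      j=$((j + 1))
    done
    printf '{"index":{"_index":"users","_type":"chat","_id":"%d"}}\n' "$i"
    printf '{"owner_id":"owner%d","messages":[%s]}\n' "$((i % 10))" "$msgs"
    i=$((i + 1))
  done
}

# Example: one 10000-document bulk request, like the ones in this thread.
#   fake_chat_bulk 10000 > chats.bulk
#   curl -s -XPOST localhost:9200/_bulk --data-binary @chats.bulk
```

--data-binary is used so that curl keeps the newlines the bulk format depends on.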

Best regards.

2012/10/2 Shay Banon kimchy@gmail.com

It seems like it's doing the delete by query with nested documents. Can you
share your mapping, some sample data that you index, and the delete_by_query
that you are using?

On Oct 2, 2012, at 4:16 AM, Marcin Dojwa m.dojwa@livechatinc.com wrote:

Hi,

I attached 4 different (I think) snapshots of curl
localhost:9200/_nodes/hot_threads here:
https://gist.github.com/f2826104eb5b288d3fb0

Best regards.
Marcin Dojwa

2012/9/29 Marcin Dojwa m.dojwa@livechatinc.com

Sure, I will try to reproduce this case on Monday. I updated ES to 0.19.9
so I will check if this happens with this version too.

Best regards.

2012/9/28 Shay Banon kimchy@gmail.com

Can you issue: curl localhost:9200/_nodes/hot_threads and gist the
response?

On Sep 27, 2012, at 1:19 PM, Marcin Dojwa m.dojwa@livechatinc.com
wrote:

One one thing, I am not sure if it hangs on bulk indexing or on 'delete
by query' after bulk indexing.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Fix for 1.: each bulk has 10000 inserts

Best regards.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Oh, and there is nothing in gc log.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

I am using ES 0.19.8, I want to upgrade to 0.19.9 now but I am not
sure if that will help.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Hi,

I still need help with this :slight_smile: I have the following information:

  1. 2 nodes (es1 and es2)
  2. 1 index (Users) with 30 shards and 1 replica

I do the following:

  1. On es2 I run many bulk index operations one after another (each bulk
     has 1000 inserts), about 1000 bulk requests in total.
  2. At some point during the inserts, es1 uses about 300% CPU and the
     bulk insert hangs. Then I stop it.
  3. After 1 day of doing nothing with either ES node (I turn off
     everything that uses ES), es1 uses 100% CPU and es2 is OK.
     es1 (top):
         PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
       12199 es    20   0 8847m 1.8g  11m S  102 11.7   3421:26 java
     es2 (top):
         PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
        9816 es    20   0 8771m 2.2g  11m S    1 14.4 116:41.97 java
  4. http://localhost:9200/_cluster/health
     {
       "cluster_name": "production",
       "status": "green",
       "timed_out": false,
       "number_of_nodes": 2,
       "number_of_data_nodes": 2,
       "active_primary_shards": 30,
       "active_shards": 60,
       "relocating_shards": 0,
       "initializing_shards": 0,
       "unassigned_shards": 0
     }
  5. es1: vmstat 5 5
     procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
      r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa
      1  0      0 4389668 294084 8416356    0    0     0     2    0    1  1  0 99  0
      1  0      0 4389420 294084 8416356    0    0     0     0 1066 2559 16  0 84  0
      1  0      0 4387312 294084 8416356    0    0     0    10 1083 2584 17  0 83  0
      1  0      0 4386584 294084 8416356    0    0     0     0 1087 2552 16  0 84  0
      1  0      0 4385096 294084 8416356    0    0     0     6 1102 2604 18  0 82  0
  6. es2: vmstat 5 5
     procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
      r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
      0  0      0 11327116 253432 1299284    0    0     1    13   16   16  0  0 100  0
      0  0      0 11326984 253432 1299284    0    0     0     0  897 2154  0  0 100  0
      0  0      0 11326984 253432 1299284    0    0     0     7  914 2174  0  0 100  0
      0  0      0 11326984 253432 1299284    0    0     0     0 1057 2225  0  0 100  0
      0  0      0 11326984 253432 1299284    0    0     0     7  874 2109  0  0 100  0
  7. https://github.com/mobz/elasticsearch-head - this plugin shows that
     everything is OK.
  8. https://github.com/lukas-vlcek/bigdesk - this plugin also shows that
     everything is OK, and it shows:
     OS - CPU:
       Total: 100%
       User: 15%
       Sys: 0%
     OS - LOAD:
       2: 1.11
       1: 1.24
       0: 1.24
     Process - CPU:
       Total: 400%
       Process: 123%
  9. After restarting elasticsearch on es1 and letting it stabilize, es1
     works fine, but then the same thing happens on es2 (100% CPU usage). The
     only way to fix it is to stop all ES nodes and start them again.
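To compare the two nodes' vmstat output numerically, here is a small Python sketch (vmstat_cpu_averages is a hypothetical helper, not a standard tool) that averages the us/sy/id/wa columns. It skips the first data row, because vmstat's first report is an average since boot. Applied to the es1 samples it gives about 16.75% user time with 0% I/O wait, while es2 shows 0% user time, which suggests a CPU-bound process on es1 rather than a disk I/O problem.

```python
def vmstat_cpu_averages(output):
    """Average the us/sy/id/wa columns of `vmstat <delay> <count>` output.

    The first data row reports averages since boot, so it is skipped.
    Returns an empty dict if there are no interval samples.
    """
    header = None
    rows = []
    for line in output.strip().splitlines():
        fields = line.split()
        if "us" in fields and "id" in fields:
            header = fields  # the "r b swpd free ..." column header line
        elif header and fields and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    samples = rows[1:]  # drop the since-boot row
    if not samples:
        return {}
    return {col: sum(r[col] for r in samples) / len(samples) for col in ("us", "sy", "id", "wa")}
```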

Is there anything I can do to investigate this problem?

Thanks for help.

Best regards.

2012/8/16 Marcin Dojwa m.dojwa@livechatinc.com

Thank you Rafał.

Best regards
Marcin Dojwa

2012/8/16 Rafał Kuć r.kuc@solr.pl

Hello!

You should have the following section in your elasticsearch.yml
file:

################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s

Just remove the comment characters, restart ElasticSearch, and GC activity
will be logged.
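Removing the comment characters can also be scripted. A minimal Python sketch, assuming you read elasticsearch.yml into a string and write the result back yourself; enable_gc_logging is a hypothetical helper that only uncomments lines starting with #monitor.jvm.gc. and leaves every other line untouched. The node still has to be restarted afterwards.

```python
def enable_gc_logging(yml_text):
    """Uncomment the monitor.jvm.gc.* lines in an elasticsearch.yml string.

    Leading indentation on those lines is dropped, since they are
    top-level keys; all other lines are returned unchanged.
    """
    out = []
    for line in yml_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#monitor.jvm.gc."):
            out.append(stripped[1:])  # remove the leading '#'
        else:
            out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if yml_text.endswith("\n") else "")
```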

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Hi,

Thank you very much for your help, I will check this. Could you
tell me how to enable GC logging in ES? Thanks.

Best regards.

2012/8/15 Rafał Kuć r.kuc@solr.pl
Hello Marcin!

Were you querying ElasticSearch, or was it only indexing data?

Anyway, first of all I would enable GC logging in ElasticSearch, to be
sure that the problems you are experiencing are not related to garbage
collection.

There are some tools that can help you diagnose what is happening with
your nodes: look at the front ends mentioned on the ElasticSearch clients
page (http://www.elasticsearch.org/guide/appendix/clients.html).
Of course, you can also use the tools available on your operating system,
like vmstat or dstat.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

But my question is still open: how can I investigate such problems?
:slight_smile:

Best regards.

2012/8/15 Marcin Dojwa m.dojwa@livechatinc.com
No, I will check this, thanks.

2012/8/15 David Pilato david@pilato.fr
Have you looked at the Leap second problem?

http://elasticsearch-users.115913.n3.nabble.com/leapsecond-and-elasticsearch-td4019953.html

David

On 15 August 2012 at 17:10, Marcin Dojwa <m.dojwa@livechatinc.com> wrote:

Hi,

I have the following problem. I have 2 nodes. I started importing
data into one of them (node A). After a few hours of importing, node B
started using 100% CPU and barely responded to other search and insert
requests. It had imported 11GB of data. How can I investigate such
problems? Are there any logs that can help in such a situation? What should
I do?

Thank you for help.

Best regards.
Marcin Dojwa.



(Martijn Van Groningen) #19

Hi Marcin,

Are you using one or more aliases when executing the delete by query?

Martijn

On 3 October 2012 10:50, Marcin Dojwa m.dojwa@livechatinc.com wrote:

Hi,

The exact mapping is here: https://gist.github.com/bbde44039b20a57eebf0
I guess this is about the chat document type, because it is the only one
with a nested mapping. I cannot give you the exact data that could be
causing this, because it is our clients' private data. Could you try to
reproduce the problem with only this mapping?
If not, I will try to prepare some fake example data.

Best regards.

--
Kind regards,

Martijn van Groningen

--


(Marcin Dojwa) #20

More than one: I have 1 index and about 50,000 aliases pointing to it.
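With that many aliases it may help to check how they are distributed across indices. A minimal Python sketch, assuming the body returned by GET /_aliases maps each index name to an object with an "aliases" key; aliases_per_index is a hypothetical helper, and the alias names in the example are made up.

```python
import json


def aliases_per_index(aliases_json):
    """Count aliases per index in a GET /_aliases response body (assumed shape)."""
    data = json.loads(aliases_json)
    return {index: len(info.get("aliases", {})) for index, info in data.items()}
```

For example, a body like {"users": {"aliases": {"client_1": {}, "client_2": {}}}} gives {"users": 2}.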

On Wednesday, 3 October 2012, Martijn v Groningen wrote:

Hi Marcin,

Are you using one or more aliases when executing the delete by query?

Martijn
