100% processor usage

Marcin_Dojwa · August 15, 2012, 3:10pm

Hi,

I have the following problem. I have 2 nodes. I started importing data into
one of them (node A). After a few hours of importing, node B started using
100% of the processor and hardly answered other search and insert requests.
By then, 11 GB of data had been imported. How can I investigate such problems?
Are there any logs that can help in such a situation? What should I do?

Thank you for your help.

Best regards.
Marcin Dojwa.

--

dadoonet · August 15, 2012, 3:18pm

Have you looked at the Leap second problem?

http://elasticsearch-users.115913.n3.nabble.com/leapsecond-and-elasticsearch-td4019953.html

David

--

Marcin_Dojwa · August 15, 2012, 3:33pm

No, I will check this, thanks.

--

Marcin_Dojwa · August 15, 2012, 3:33pm

But my question is still open: how do I investigate such problems?

Best regards.

--

Rafal_Kuc_3 · August 15, 2012, 9:32pm

Hello Marcin!

Were you also querying ElasticSearch, or was it only indexing data?

Anyway, first of all I would enable GC logging in ElasticSearch, to be
sure that the problems you are experiencing are not related to garbage
collection.

There are some tools that can help you diagnose what is happening
with your nodes: look at the front ends mentioned on the ElasticSearch
clients page (http://www.elasticsearch.org/guide/appendix/clients.html).
Of course, you can also use the tools available on your operating
system, like vmstat or dstat.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

--

Marcin_Dojwa · August 16, 2012, 5:22am

Hi,

Thank you very much for your help, I will check this. Could you tell me how
to enable GC logging in ES? Thanks.

Best regards.

--

Rafal_Kuc_3 · August 16, 2012, 7:27am

Hello!

You should have the following section in your elasticsearch.yml file:

################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s

Just remove the comment characters, restart ElasticSearch, and garbage collections slower than these thresholds will be logged.
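
As a minimal sketch of the "remove the comment character" step, the snippet below uncomments only lines that start with `#monitor.jvm.gc.` and leaves every other line alone. The function name `uncomment_gc_settings` is hypothetical, and the sample text simply repeats the settings quoted above; it does not touch a real elasticsearch.yml file.

```python
# Settings copied from the elasticsearch.yml section quoted in this thread.
GC_SECTION = """\
#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms
#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s
"""


def uncomment_gc_settings(config_text):
    """Return config_text with the GC logging settings uncommented.

    Only lines beginning with '#monitor.jvm.gc.' are changed, so ordinary
    comments in the file stay as they are.
    """
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#monitor.jvm.gc."):
            out.append(stripped[1:])  # drop the leading '#'
        else:
            out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    print(uncomment_gc_settings(GC_SECTION))
```

The node still has to be restarted afterwards for the settings to take effect, as described above.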

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

--

Marcin_Dojwa · August 16, 2012, 7:37am

Thank you Rafał.

Best regards
Marcin Dojwa

--

Marcin_Dojwa · September 27, 2012, 9:23am

Hi,

I still need help with this. I have the following information:

2 nodes (es1 and es2)
1 index, Users, with 30 shards and 1 replica

I do the following:

1. On es2 I run many consecutive bulk index operations (each bulk has
1000 inserts), about 1000 bulks in total.
2. At some point during the inserts, es1 uses about 300% CPU and the bulk
insert hangs. Then I stop it.
3. After 1 day of doing nothing with either ES node (I turned off everything
that uses ES), es1 still uses 100% CPU and es2 is OK.
es1:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
12199 es   20  0 8847m 1.8g 11m S  102 11.7   3421:26 java

es2:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
 9816 es   20  0 8771m 2.2g 11m S    1 14.4 116:41.97 java
4. http://localhost:9200/_cluster/health
{
  cluster_name: "production",
  status: "green",
  timed_out: false,
  number_of_nodes: 2,
  number_of_data_nodes: 2,
  active_primary_shards: 30,
  active_shards: 60,
  relocating_shards: 0,
  initializing_shards: 0,
  unassigned_shards: 0
}
5. es1: vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd     free   buff   cache si so bi bo   in   cs us sy  id wa
 1  0    0  4389668 294084 8416356  0  0  0  2    0    1  1  0  99  0
 1  0    0  4389420 294084 8416356  0  0  0  0 1066 2559 16  0  84  0
 1  0    0  4387312 294084 8416356  0  0  0 10 1083 2584 17  0  83  0
 1  0    0  4386584 294084 8416356  0  0  0  0 1087 2552 16  0  84  0
 1  0    0  4385096 294084 8416356  0  0  0  6 1102 2604 18  0  82  0

es2: vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd     free   buff   cache si so bi bo   in   cs us sy  id wa
 0  0    0 11327116 253432 1299284  0  0  1 13   16   16  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  0  897 2154  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  7  914 2174  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  0 1057 2225  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  7  874 2109  0  0 100  0
6. elasticsearch-head (GitHub: mobz/elasticsearch-head): this plugin shows
that everything is OK.
7. bigdesk (GitHub: lukas-vlcek/bigdesk): this plugin shows that everything
is OK, and it reports the following:
OS - CPU:
Total: 100%
User: 15%
Sys: 0%

OS - LOAD:
2: 1.11
1: 1.24
0: 1.24

Process - CPU:
Total: 400%
Process: 123%

After restarting Elasticsearch on es1 and letting it stabilize, es1 works
fine, but the same situation then appears on es2 (100% processor usage). The
only way to fix it is to stop all ES nodes and start them again.

Is there anything I can do to investigate this problem?

Thanks for help.

Best regards.
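
The top, vmstat and bigdesk figures in this post use different scales, so a rough cross-check may help. The sketch below parses the es1 vmstat rows quoted above and converts the average user CPU into "cores busy". It assumes the machine has 4 cores, which is only inferred from bigdesk's "Process - CPU: Total: 400%"; the function names are hypothetical. The first vmstat row is skipped because vmstat's first report is an average since boot.

```python
# es1 rows copied from the "vmstat 5 5" output in this post (header + 5 samples).
VMSTAT_ES1 = """\
r b swpd free buff cache si so bi bo in cs us sy id wa
1 0 0 4389668 294084 8416356 0 0 0 2 0 1 1 0 99 0
1 0 0 4389420 294084 8416356 0 0 0 0 1066 2559 16 0 84 0
1 0 0 4387312 294084 8416356 0 0 0 10 1083 2584 17 0 83 0
1 0 0 4386584 294084 8416356 0 0 0 0 1087 2552 16 0 84 0
1 0 0 4385096 294084 8416356 0 0 0 6 1102 2604 18 0 82 0
"""


def parse_vmstat(text):
    """Turn the column header plus sample rows into a list of dicts of ints."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, map(int, row))) for row in rows]


def mean_user_cpu(samples, skip_first=True):
    """Average the 'us' column; the first vmstat row is a since-boot average."""
    used = samples[1:] if skip_first else samples
    return sum(s["us"] for s in used) / len(used)


def cores_busy(cpu_percent_of_machine, cores):
    """Convert a whole-machine percentage into an equivalent number of cores."""
    return cpu_percent_of_machine / 100 * cores


if __name__ == "__main__":
    ASSUMED_CORES = 4  # assumption, inferred from bigdesk's "Total: 400%"
    user = mean_user_cpu(parse_vmstat(VMSTAT_ES1))
    print("mean user CPU: %.2f%% of the machine" % user)
    print("roughly %.2f cores busy" % cores_busy(user, ASSUMED_CORES))
```

Under that 4-core assumption, the vmstat average and the 102% that top reports for the java process both point to something on the order of one core being busy while the cluster is idle, rather than the whole machine being saturated. The samples were taken at different moments, so the numbers are not expected to match exactly.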

--

Marcin_Dojwa · September 27, 2012, 9:42am

I am using ES 0.19.8. I want to upgrade to 0.19.9 now, but I am not sure
whether that will help.

--

Marcin_Dojwa · September 27, 2012, 10:13am

Oh, and there is nothing in the GC log.
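
With an empty GC log, one common next step for a JVM process is to find which thread is burning the CPU. `top -H -p <pid>` lists per-thread CPU with decimal thread IDs, and a thread dump from `jstack <pid>` identifies each thread by the same ID in hexadecimal as `nid=0x...`. The sketch below matches the two. The sample dump, the thread names, the stack frame and the thread ID 12230 are all made up for illustration; they are not taken from the nodes in this thread.

```python
import re

# Illustrative thread dump in the usual jstack layout (invented content).
SAMPLE_DUMP = '''\
"worker-1" daemon prio=10 tid=0x00007f2c4c001000 nid=0x2fc6 runnable [0x00007f2c3b1f0000]
   java.lang.Thread.State: RUNNABLE
	at example.Hypothetical.spin(Hypothetical.java:42)

"worker-2" daemon prio=10 tid=0x00007f2c4c002000 nid=0x2fc7 waiting on condition [0x00007f2c3b0ef000]
   java.lang.Thread.State: WAITING (parking)
'''


def tid_to_nid(tid):
    """Convert a decimal thread ID from `top -H` to jstack's hex nid form."""
    return hex(tid)


def split_thread_dump(dump):
    """Map each nid found in the dump to its full thread block."""
    blocks = {}
    for block in re.split(r"\n\s*\n", dump.strip()):
        match = re.search(r"\bnid=(0x[0-9a-f]+)", block)
        if match:
            blocks[match.group(1)] = block
    return blocks


def hot_thread_stacks(dump, tids):
    """Return {tid: stack block or None} for the thread IDs seen in top."""
    blocks = split_thread_dump(dump)
    return {tid: blocks.get(tid_to_nid(tid)) for tid in tids}


if __name__ == "__main__":
    # 12230 stands in for whatever thread ID `top -H` shows at the top.
    for tid, stack in hot_thread_stacks(SAMPLE_DUMP, [12230]).items():
        print(tid, tid_to_nid(tid))
        print(stack)
```

Taking two or three dumps a few seconds apart and checking whether the same thread stays in the same frames usually shows whether it is stuck in a loop or just busy.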

--

Marcin_Dojwa · September 27, 2012, 11:00am

Correction to point 1: each bulk has 10000 inserts, not 1000.

Best regards.
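
For scale, the corrected numbers can be worked through with a little arithmetic. The sketch below assumes documents are spread evenly across the 30 primary shards, which is an idealization, and the function name `bulk_fanout` is hypothetical. With 1 replica on a 2-node cluster, each node holds a copy of every shard (30 primaries + 30 replicas = 60 active shards, matching the cluster health output), so every document sent to es2 is also indexed on es1.

```python
def bulk_fanout(bulks, docs_per_bulk, primary_shards, replicas):
    """Estimate the indexing work implied by a bulk import.

    Assumes documents are distributed evenly over the primary shards.
    """
    total_docs = bulks * docs_per_bulk
    copies = 1 + replicas  # each document is indexed on the primary and on every replica
    return {
        "total_docs": total_docs,
        "docs_per_shard_per_bulk": docs_per_bulk / primary_shards,
        "index_operations": total_docs * copies,
        "shard_copies": primary_shards * copies,
    }


if __name__ == "__main__":
    # Figures from this thread: about 1000 bulks of 10000 inserts, 30 shards, 1 replica.
    for key, value in bulk_fanout(1000, 10000, 30, 1).items():
        print(key, value)
```

So the import amounts to about 10 million documents and about 20 million index operations across the two nodes, with each bulk touching every shard. That is why the node that does not receive the bulk requests still does a comparable amount of indexing work.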

--

Marcin_Dojwa · September 27, 2012, 11:19am

One more thing: I am not sure whether it hangs on bulk indexing or on the
'delete by query' that runs after bulk indexing.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Fix for 1.: each bulk has 10000 inserts

Best regards.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Oh, and there is nothing in gc log.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

I am using ES 0.19.8, I want to upgrade to 0.19.9 now but I am not sure
if that will help.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Hi,

I still need help in this I have the following information:

2 nodes (es1 and es2)

1 index Users with 30 shards and 1 replica

I do the following:

On es2 I run many bulk index operations consecutively (each bulk has
1000 inserts), about 1000 such bulk inserts in total.

At some point during inserting, es1 uses about 300% of processor and the
bulk insert hangs. Then I stop it.

After 1 day of doing nothing with any of the ES nodes (I turn off everything
that uses ES), es1 uses 100% of processor and es2 is OK.
es1:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
12199 es   20  0 8847m 1.8g 11m S  102 11.7   3421:26 java

es2:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
 9816 es   20  0 8771m 2.2g 11m S    1 14.4 116:41.97 java

http://localhost:9200/_cluster/health
{
  cluster_name: "production",
  status: "green",
  timed_out: false,
  number_of_nodes: 2,
  number_of_data_nodes: 2,
  active_primary_shards: 30,
  active_shards: 60,
  relocating_shards: 0,
  initializing_shards: 0,
  unassigned_shards: 0
}
5. es1: vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd     free   buff   cache si so bi bo   in   cs us sy  id wa
 1  0    0  4389668 294084 8416356  0  0  0  2    0    1  1  0  99  0
 1  0    0  4389420 294084 8416356  0  0  0  0 1066 2559 16  0  84  0
 1  0    0  4387312 294084 8416356  0  0  0 10 1083 2584 17  0  83  0
 1  0    0  4386584 294084 8416356  0  0  0  0 1087 2552 16  0  84  0
 1  0    0  4385096 294084 8416356  0  0  0  6 1102 2604 18  0  82  0

es2: vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd     free   buff   cache si so bi bo   in   cs us sy  id wa
 0  0    0 11327116 253432 1299284  0  0  1 13   16   16  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  0  897 2154  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  7  914 2174  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  0 1057 2225  0  0 100  0
 0  0    0 11326984 253432 1299284  0  0  0  7  874 2109  0  0 100  0

GitHub - mobz/elasticsearch-head: A web front end for an elastic search cluster - this plugin shows that
everything is OK

GitHub - lukas-vlcek/bigdesk: Live charts and statistics for Elasticsearch cluster. - this plugin shows that
everything is OK and it shows that:
OS - CPU:
Total: 100%
User: 15%
Sys: 0%

OS - LOAD:
2: 1.11
1: 1.24
0: 1.24

Process - CPU:
Total: 400%
Process: 123%

After restarting Elasticsearch on es1 and letting it stabilize, es1 works fine,
but the same situation then happens on es2 (100% processor usage). The only way
to fix that is stopping all ES nodes and starting them again.

Is there anything I can do to investigate this problem?

Thanks for help.

Best regards.

2012/8/16 Marcin Dojwa m.dojwa@livechatinc.com

Thank you Rafał.

Best regards
Marcin Dojwa

2012/8/16 Rafał Kuć r.kuc@solr.pl

Hello!

You should have the following section in your elasticsearch.yml file:

################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s

Just remove the comment characters, restart Elasticsearch, and you will
have GC activity above these thresholds logged.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

Hi,

Thank you very much for your help, I will check this. Could you tell
me how to enable GC logging in ES ? Thanks.

Best regards.

2012/8/15 Rafał Kuć r.kuc@solr.pl
Hello Marcin!

Did you query Elasticsearch or was it only indexing data ?

kimchy · September 28, 2012, 1:01pm

Can you issue: curl localhost:9200/_nodes/hot_threads and gist the response?
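
For anyone who wants to script the request above rather than run curl by hand, here is a minimal Python sketch. The helper names (hot_threads_url, fetch_hot_threads) are made up for illustration, and the optional query parameters shown in the comment (threads, interval) are taken from the current hot threads documentation; whether a 0.19.x node accepts them is an assumption worth checking.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(host="localhost", port=9200, **params):
    """Build the URL of the nodes hot threads endpoint (hypothetical helper)."""
    base = f"http://{host}:{port}/_nodes/hot_threads"
    # Sort the parameters so the generated URL is stable.
    query = urlencode(sorted(params.items()))
    return f"{base}?{query}" if query else base


def fetch_hot_threads(url, timeout=10):
    """Return the plain-text hot threads report from a running node."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example (needs a reachable node, so it is left as a comment):
# print(fetch_hot_threads(hot_threads_url(threads=3, interval="500ms")))
```

Taking a few reports some seconds apart makes it easier to see whether the same thread stays busy in the same place.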

Marcin_Dojwa · September 29, 2012, 3:44pm

Sure, I will try to reproduce this case on Monday. I updated ES to 0.19.9
so I will check if this happens with this version too.

Best regards.

Marcin_Dojwa · October 2, 2012, 8:16am

Hi,

I attached 4 different (I think) snapshots of curl
localhost:9200/_nodes/hot_threads here:

https://gist.github.com/anonymous/f2826104eb5b288d3fb0

1.java

::: [es1][IiY2mhoZRFWN1ojbL-5Ejw][inet[/10.29.212.95:9300]]{rack_id=es1_rack}
   
   100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[es1][index][T#3]'
     4/10 snapshots sharing following 27 elements
       org.apache.lucene.search.FilteredQuery$2.advance(FilteredQuery.java:201)
       org.elasticsearch.index.search.nested.IncludeAllChildrenQuery$IncludeAllChildrenScorer.advance(IncludeAllChildrenQuery.java:195)
       org.apache.lucene.search.FilteredQuery$2.advanceToNextCommonDoc(FilteredQuery.java:181)
       org.apache.lucene.search.FilteredQuery$2.nextDoc(FilteredQuery.java:193)
       org.apache.lucene.index.BufferedDeletesStream.applyQueryDeletes(BufferedDeletesStream.java:410)
       org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:260)

This file has been truncated.

2.java

::: [es1][IiY2mhoZRFWN1ojbL-5Ejw][inet[/10.29.212.95:9300]]{rack_id=es1_rack}
   
   100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[es1][index][T#3]'
     10/10 snapshots sharing following 25 elements
       org.apache.lucene.search.FilteredQuery$2.advanceToNextCommonDoc(FilteredQuery.java:181)
       org.apache.lucene.search.FilteredQuery$2.nextDoc(FilteredQuery.java:193)
       org.apache.lucene.index.BufferedDeletesStream.applyQueryDeletes(BufferedDeletesStream.java:410)
       org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:260)
       org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3615)
       org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)

This file has been truncated.

3.java

::: [es1][IiY2mhoZRFWN1ojbL-5Ejw][inet[/10.29.212.95:9300]]{rack_id=es1_rack}
   
   100.3% (501.4ms out of 500ms) cpu usage by thread 'elasticsearch[es1][index][T#3]'
     10/10 snapshots sharing following 25 elements
       org.apache.lucene.search.FilteredQuery$2.advanceToNextCommonDoc(FilteredQuery.java:181)
       org.apache.lucene.search.FilteredQuery$2.nextDoc(FilteredQuery.java:193)
       org.apache.lucene.index.BufferedDeletesStream.applyQueryDeletes(BufferedDeletesStream.java:410)
       org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:260)
       org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3615)
       org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3552)

This file has been truncated.

There are more than three files.
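
When comparing several such reports, it can help to reduce each one to the busy thread and its top stack frame. The sketch below is one possible way to do that in Python; the summarize function and its regular expressions are illustrative assumptions based only on the report layout visible in the excerpts above, not an official parser.

```python
import re

# Matches lines such as:
#   100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[es1][index][T#3]'
HEADER = re.compile(
    r"^\s*(?P<pct>\d+(?:\.\d+)?)% \(.*?\) cpu usage by thread '(?P<thread>[^']+)'"
)
# Stack frames look like "package.Class.method(File.java:123)".
FRAME = re.compile(r"[\w.$<>]+\(.*\)")


def summarize(report):
    """Return one dict per reported thread: CPU percentage, name, top frame."""
    threads = []
    for line in report.splitlines():
        match = HEADER.match(line)
        if match:
            threads.append({
                "cpu_percent": float(match["pct"]),
                "thread": match["thread"],
                "top_frame": None,
            })
        elif threads and threads[-1]["top_frame"] is None:
            frame = line.strip()
            if FRAME.fullmatch(frame):
                threads[-1]["top_frame"] = frame
    return threads


SAMPLE = """::: [es1][IiY2mhoZRFWN1ojbL-5Ejw][inet[/10.29.212.95:9300]]{rack_id=es1_rack}

   100.3% (501.3ms out of 500ms) cpu usage by thread 'elasticsearch[es1][index][T#3]'
     10/10 snapshots sharing following 25 elements
       org.apache.lucene.search.FilteredQuery$2.advanceToNextCommonDoc(FilteredQuery.java:181)
       org.apache.lucene.search.FilteredQuery$2.nextDoc(FilteredQuery.java:193)
"""

for entry in summarize(SAMPLE):
    print(entry["cpu_percent"], entry["thread"], entry["top_frame"])
```

If the same thread and the same top frame show up across reports taken minutes apart, that points at one long-running operation rather than general load.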

Best regards.
Marcin Dojwa

kimchy · October 2, 2012, 6:35pm

It seems like it's doing the delete by query with nested documents. Can you share your mapping, sample data that you index, and the delete_by_query that you are using?
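
The reading above appears to rest on a few frame names visible in the posted traces: BufferedDeletesStream.applyQueryDeletes, IncludeAllChildrenQuery, and IndexWriter.doFlush. As a rough illustration, the Python sketch below checks a list of frames for those names. The MARKERS table and the wording of its hints are assumptions made for this example, not an official mapping from frames to causes.

```python
# Substrings of stack frames mapped to a hedged, human-readable hint.
# These labels are an interpretation of the traces in this thread.
MARKERS = {
    "BufferedDeletesStream.applyQueryDeletes": "buffered delete-by-query being applied",
    "IncludeAllChildrenQuery": "query touching nested documents",
    "IndexWriter.doFlush": "index writer flush in progress",
}


def explain(frames):
    """Return the hints whose marker appears in any frame, in MARKERS order."""
    return [
        hint
        for marker, hint in MARKERS.items()
        if any(marker in frame for frame in frames)
    ]


# Frames copied from the first posted snapshot (1.java).
FRAMES = [
    "org.apache.lucene.search.FilteredQuery$2.advance(FilteredQuery.java:201)",
    "org.elasticsearch.index.search.nested.IncludeAllChildrenQuery$IncludeAllChildrenScorer.advance(IncludeAllChildrenQuery.java:195)",
    "org.apache.lucene.search.FilteredQuery$2.advanceToNextCommonDoc(FilteredQuery.java:181)",
    "org.apache.lucene.search.FilteredQuery$2.nextDoc(FilteredQuery.java:193)",
    "org.apache.lucene.index.BufferedDeletesStream.applyQueryDeletes(BufferedDeletesStream.java:410)",
    "org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:260)",
]

for hint in explain(FRAMES):
    print(hint)
```

A match only says which code path the thread was in when sampled; it does not by itself explain why that path never finishes.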

Marcin_Dojwa · October 3, 2012, 8:50am

Hi,

The exact mapping is here: ES 100% processor usage · GitHub
I guess that this is about the chat document type, because it is the only one
with a nested mapping. I cannot give you the exact data that could possibly
cause such a situation, because this is our clients' private data. Are you
able to try to reproduce the problem with only this information about the mapping?
If not, I will try to prepare fake example data.

Best regards.

2012/10/2 Shay Banon kimchy@gmail.com

It seems like its doing the delete by query with nested documents. Can you
share your mapping, and sample data that you index, and the delete_by_query
that you are using?

On Oct 2, 2012, at 4:16 AM, Marcin Dojwa m.dojwa@livechatinc.com wrote:

Hi,

I attached 4 different (I think) snapshots of curl
localhost:9200/_nodes/hot_threads here:
ES 100% processor usage · GitHub

Best regards.
Marcin Dojwa

2012/9/29 Marcin Dojwa m.dojwa@livechatinc.com

Sure, I will try to reproduce this case on Monday. I updated ES to 0.19.9
so I will check if this happens with this version too.

Best regards.

2012/9/28 Shay Banon kimchy@gmail.com

Can you issue: curl localhost:9200/_nodes/hot_threads and gist the
response?
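
A minimal sketch of taking several hot_threads snapshots in a row, assuming a node reachable at localhost:9200 as in the command above. Only the /_nodes/hot_threads endpoint comes from this thread; the helper names (hot_threads_url, fetch_text, collect_snapshots), the sample count and the interval are hypothetical choices for illustration.

```python
import time
import urllib.request


def hot_threads_url(host="localhost", port=9200):
    """Build the URL of the hot threads endpoint quoted in the thread."""
    return "http://{}:{}/_nodes/hot_threads".format(host, port)


def fetch_text(url, timeout=10.0):
    """Fetch a URL and return the response body decoded as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_snapshots(fetch, url, count=4, interval_seconds=5.0, sleep=time.sleep):
    """Call fetch(url) `count` times, pausing between calls, and return the bodies.

    `fetch` and `sleep` are parameters so the loop can be exercised without
    a running cluster (assumption: a few spaced samples are enough to tell
    a persistent hot thread from a momentary spike).
    """
    snapshots = []
    for i in range(count):
        snapshots.append(fetch(url))
        if i < count - 1:
            sleep(interval_seconds)
    return snapshots
```

Against a live node this would be called as collect_snapshots(fetch_text, hot_threads_url()); each returned string can then be pasted into a gist. Threads that show up in every snapshot are the ones worth reading first.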

On Sep 27, 2012, at 1:19 PM, Marcin Dojwa m.dojwa@livechatinc.com
wrote:

One more thing: I am not sure whether it hangs on bulk indexing or on the
'delete by query' after bulk indexing.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Fix for 1.: each bulk has 10000 inserts

Best regards.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Oh, and there is nothing in the GC log.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

I am using ES 0.19.8, I want to upgrade to 0.19.9 now but I am not
sure if that will help.

2012/9/27 Marcin Dojwa m.dojwa@livechatinc.com

Hi,

I still need help with this. I have the following information:

2 nodes (es1 and es2)

1 index Users with 30 shards and 1 replica

I do the following:

1. On es2 I run many bulk index operations consecutively (each bulk
has 1000 inserts), and I do about 1000 such bulk inserts.

2. At some point during inserting, es1 uses about 300% of processor and
the bulk insert hangs. Then I stop it.

3. After 1 day of doing nothing with any of the ES nodes (I turned off
everything that uses ES), es1 uses 100% of processor and es2 is OK.
es1:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
12199 es   20  0 8847m 1.8g 11m S  102 11.7   3421:26 java

es2:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
 9816 es   20  0 8771m 2.2g 11m S    1 14.4 116:41.97 java

4. http://localhost:9200/_cluster/health
{
  cluster_name: "production",
  status: "green",
  timed_out: false,
  number_of_nodes: 2,
  number_of_data_nodes: 2,
  active_primary_shards: 30,
  active_shards: 60,
  relocating_shards: 0,
  initializing_shards: 0,
  unassigned_shards: 0
}

5. es1: vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 1  0      0  4389668 294084 8416356    0    0     0     2    0    1  1  0  99  0
 1  0      0  4389420 294084 8416356    0    0     0     0 1066 2559 16  0  84  0
 1  0      0  4387312 294084 8416356    0    0     0    10 1083 2584 17  0  83  0
 1  0      0  4386584 294084 8416356    0    0     0     0 1087 2552 16  0  84  0
 1  0      0  4385096 294084 8416356    0    0     0     6 1102 2604 18  0  82  0

es2: vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd     free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
 0  0      0 11327116 253432 1299284    0    0     1    13   16   16  0  0 100  0
 0  0      0 11326984 253432 1299284    0    0     0     0  897 2154  0  0 100  0
 0  0      0 11326984 253432 1299284    0    0     0     7  914 2174  0  0 100  0
 0  0      0 11326984 253432 1299284    0    0     0     0 1057 2225  0  0 100  0
 0  0      0 11326984 253432 1299284    0    0     0     7  874 2109  0  0 100  0

6. mobz/elasticsearch-head (GitHub): this plugin shows that everything
is OK.

7. lukas-vlcek/bigdesk (GitHub): this plugin shows that everything is OK,
and it reports:
OS - CPU:
Total: 100%
User: 15%
Sys: 0%

OS - LOAD:
2: 1.11
1: 1.24
0: 1.24

Process - CPU:
Total: 400%
Process: 123%

8. After restarting Elasticsearch on es1 and letting it stabilize, es1
works fine, but the same situation happens on es2 (100% processor usage).
The only way to fix that is stopping all ES nodes and starting them again.

Is there anything I can do to investigate this problem?
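
One way to put numbers on the vmstat output quoted above is to parse it and average the CPU columns. The sketch below is only an illustration: parse_vmstat and mean_cpu are hypothetical helper names, and the sample text is the es1 output from this message. The first data row is skipped because vmstat's first report gives averages since boot rather than a 5-second sample.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # column header line: r b swpd free ...
            header = fields
            continue
        # Data rows have one integer per header column; banner lines do not.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


def mean_cpu(rows, column="us", skip_first=True):
    """Average one CPU column, optionally skipping the since-boot first row."""
    samples = rows[1:] if skip_first else rows
    if not samples:
        raise ValueError("no samples to average")
    return sum(row[column] for row in samples) / len(samples)


# es1 output as posted in this thread.
ES1_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free   buff   cache si so bi bo   in   cs us sy id wa
 1  0    0 4389668 294084 8416356  0  0  0  2    0    1  1  0 99  0
 1  0    0 4389420 294084 8416356  0  0  0  0 1066 2559 16  0 84  0
 1  0    0 4387312 294084 8416356  0  0  0 10 1083 2584 17  0 83  0
 1  0    0 4386584 294084 8416356  0  0  0  0 1087 2552 16  0 84  0
 1  0    0 4385096 294084 8416356  0  0  0  6 1102 2604 18  0 82  0
"""

rows = parse_vmstat(ES1_VMSTAT)
print("user CPU %:", mean_cpu(rows, "us"))  # 16.75
print("idle CPU %:", mean_cpu(rows, "id"))  # 83.25
```

For es1 this gives about 17% user time with no I/O wait and no swapping, so whatever is busy is burning CPU in user space rather than waiting on disk.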

Thanks for help.

Best regards.

2012/8/16 Marcin Dojwa m.dojwa@livechatinc.com

Thank you Rafał.

Best regards
Marcin Dojwa

2012/8/16 Rafał Kuć r.kuc@solr.pl

Hello!

You should have the following section in your elasticsearch.yml
file:

################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s

Just remove the comment characters, restart Elasticsearch, and you
will have that logged.
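
If you would rather script that step than edit the file by hand, something like the sketch below could do it. It is an illustration only: enable_gc_logging is a hypothetical helper, it works on the file contents as a string, and it assumes the settings appear as "#monitor.jvm.gc.<collector>.<level>: <threshold>" lines like the ones quoted above. Back up elasticsearch.yml before writing anything back.

```python
import re

# A commented-out GC logging setting, e.g. "#monitor.jvm.gc.ParNew.warn: 1000ms".
GC_SETTING = re.compile(r"^(\s*)#\s*(monitor\.jvm\.gc\.\S+:.*)$")


def enable_gc_logging(yml_text):
    """Uncomment every monitor.jvm.gc.* line; leave all other lines untouched."""
    out = []
    for line in yml_text.splitlines():
        match = GC_SETTING.match(line)
        out.append(match.group(1) + match.group(2) if match else line)
    return "\n".join(out) + ("\n" if yml_text.endswith("\n") else "")


# Sample input mirroring the section quoted above.
SAMPLE = """\
################################## GC Logging ################################

#monitor.jvm.gc.ParNew.warn: 1000ms
#monitor.jvm.gc.ParNew.info: 700ms
#monitor.jvm.gc.ParNew.debug: 400ms

#monitor.jvm.gc.ConcurrentMarkSweep.warn: 10s
#monitor.jvm.gc.ConcurrentMarkSweep.info: 5s
#monitor.jvm.gc.ConcurrentMarkSweep.debug: 2s
"""

print(enable_gc_logging(SAMPLE))
```

The banner line stays commented because it does not start with "#monitor.jvm.gc."; only the six threshold settings are changed.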

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

Hi,

Thank you very much for your help, I will check this. Could you
tell me how to enable GC logging in ES ? Thanks.

Best regards.

2012/8/15 Rafał Kuć r.kuc@solr.pl
Hello Marcin!

Did you query Elasticsearch or was it only indexing data?

Anyway, first of all I would enable GC logging in Elasticsearch, to be
sure that the problems you are experiencing are not garbage collection
related.

There are some tools that can help you diagnose what is happening
with your nodes; look at the front ends listed on the Elasticsearch
clients page. Of course, you can also use the ones available on your
operating system, like vmstat or dstat.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Elasticsearch

--

mvg · October 3, 2012, 3:05pm

Hi Marcin,

Are you using one or more aliases when executing the delete by query?

Martijn

--
Kind regards,

Martijn van Groningen

--

Marcin_Dojwa · October 3, 2012, 8:28pm

More than one: I have 1 index and about 50000 aliases pointing to it.

On Wednesday, 3 October 2012, Martijn v Groningen wrote:

Hi Marcin,

Are you using one or more aliases when executing the delete by query?

Martijn

--