Elasticsearch can't automatically recover after high heap utilization

Hi guys, everything ok?

I want to talk about a problem we are facing with our ES cluster.

Today we have four machines in our cluster; each machine has 16GB of RAM
(8GB for the heap, 8GB for the OS).
We have a total of 73,975,578 documents, 998 shards and 127 indices.
To index our docs we use the bulk API; each bulk request contains up to
300 items. We put our docs in a queue so we can make the requests in the
background. The log below shows a sample of how many documents were sent
to ES for indexing:

[2014-12-03 11:19:32 -0200] execute Event Create with 77 items in app 20
[2014-12-03 11:19:32 -0200] execute User Create with 1 items in app 67
[2014-12-03 11:19:40 -0200] execute User Create with 1 items in app 61
[2014-12-03 11:19:49 -0200] execute User Create with 1 items in app 62
[2014-12-03 11:19:50 -0200] execute User Create with 1 items in app 27
[2014-12-03 11:19:50 -0200] execute User Create with 2 items in app 20
[2014-12-03 11:19:54 -0200] execute User Create with 5 items in app 61
[2014-12-03 11:19:58 -0200] execute User Update with 61 items in app 20
[2014-12-03 11:20:02 -0200] execute User Create with 2 items in app 61
[2014-12-03 11:20:02 -0200] execute User Create with 1 items in app 27
[2014-12-03 11:20:10 -0200] execute User Create with 2 items in app 20
[2014-12-03 11:20:19 -0200] execute User Create with 5 items in app 61
[2014-12-03 11:20:20 -0200] execute User Create with 3 items in app 20
[2014-12-03 11:20:20 -0200] execute User Create with 1 items in app 24
[2014-12-03 11:20:25 -0200] execute User Create with 1 items in app 61
[2014-12-03 11:20:28 -0200] execute User Create with 1 items in app 20
[2014-12-03 11:20:37 -0200] execute Event Create with 91 items in app 20
[2014-12-03 11:20:42 -0200] execute User Create with 1 items in app 76
[2014-12-03 11:20:42 -0200] execute Event Create with 300 items in app 61
[2014-12-03 11:20:50 -0200] execute User Create with 4 items in app 61
[2014-12-03 11:20:51 -0200] execute User Create with 1 items in app 62
[2014-12-03 11:20:51 -0200] execute User Create with 2 items in app 20
[2014-12-03 11:20:55 -0200] execute User Create with 3 items in app 61

Sometimes a bulk request contains just one item. Another interesting
point: we send that data frequently; in other words, the stress we put on
ES is pretty high.
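For reference, the bulk body we build from the queue looks roughly like this (a minimal Python sketch; the index and type names are placeholders, not our real ones):

```python
import json

def build_bulk_payload(docs, index, doc_type):
    """Build an NDJSON body for the ES _bulk endpoint:
    one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        doc = dict(doc)  # don't mutate the caller's dict
        action = {"index": {"_index": index, "_type": doc_type}}
        if "_id" in doc:
            action["index"]["_id"] = doc.pop("_id")
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

# Flushing a queued batch (here just two docs; ours go up to 300):
payload = build_bulk_payload(
    [{"_id": 1, "name": "alice"}, {"_id": 2, "name": "bob"}],
    index="app-20", doc_type="user",
)
```

The resulting payload is then POSTed to the `/_bulk` endpoint in the background.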

The big problem is that once the ES heap approaches 75% utilization, the
GC can no longer bring it back down to a normal level.

These log entries show the GC at work:

[2014-12-02 21:28:04,766][WARN ][monitor.jvm ] [es-node-2]
[gc][old][43249][56] duration [48s], collections [2]/[48.2s], total
[48s]/[17.9m], memory [8.2gb]->[8.3gb]/[8.3gb], all_pools {[young]
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
[14.1mb]->[18.9mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:28:33,120][WARN ][monitor.jvm ] [es-node-2]
[gc][old][43250][57] duration [28.3s], collections [1]/[28.3s], total
[28.3s]/[18.4m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
[18.9mb]->[17.5mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:29:21,222][WARN ][monitor.jvm ] [es-node-2]
[gc][old][43251][59] duration [47.9s], collections [2]/[48.1s], total
[47.9s]/[19.2m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
[17.5mb]->[21.2mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:30:08,916][WARN ][monitor.jvm ] [es-node-2]
[gc][old][43252][61] duration [47.5s], collections [2]/[47.6s], total
[47.5s]/[20m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
[21.2mb]->[20.8mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:30:56,208][WARN ][monitor.jvm ] [es-node-2]
[gc][old][43253][63] duration [47.1s], collections [2]/[47.2s], total
[47.1s]/[20.7m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
[20.8mb]->[24.8mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
[2014-12-02 21:32:07,013][WARN ][transport ] [es-node-2]
Received response for a request that has timed out, sent [165744ms] ago,
timed out [8ms] ago, action [discovery/zen/fd/ping], node
[[es-node-1][sXwCdIhSRZKq7xZ6TAQiBg][localhost][inet[xxx.xxx.xxx.xxx/xxx.xxx.xxx.xxx:9300]]],
id [3002106]
[2014-12-02 21:36:41,880][WARN ][monitor.jvm ] [es-node-2]
[gc][old][43254][78] duration [5.7m], collections [15]/[5.7m], total
[5.7m]/[26.5m], memory [8.3gb]->[8.3gb]/[8.3gb], all_pools {[young]
[199.6mb]->[199.6mb]/[199.6mb]}{[survivor]
[24.8mb]->[24.4mb]/[24.9mb]}{[old] [8gb]->[8gb]/[8gb]}
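To graph these warnings over time, the duration and heap figures can be pulled out of each line with a small parser (a sketch matched against the log format above; the group names are my own labels for the bracketed fields):

```python
import re

GC_RE = re.compile(
    r"\[gc\]\[old\]\[(?P<tick>\d+)\]\[(?P<count>\d+)\] "
    r"duration \[(?P<duration>[^\]]+)\], collections \[(?P<collections>\d+)\]"
    r".*memory \[(?P<before>[^\]]+)\]->\[(?P<after>[^\]]+)\]/\[(?P<max>[^\]]+)\]"
)

def parse_gc_warning(line):
    """Extract old-gen GC stats from a monitor.jvm WARN line (None if no match)."""
    m = GC_RE.search(line)
    return m.groupdict() if m else None

line = ("[2014-12-02 21:28:04,766][WARN ][monitor.jvm ] [es-node-2] "
        "[gc][old][43249][56] duration [48s], collections [2]/[48.2s], total "
        "[48s]/[17.9m], memory [8.2gb]->[8.3gb]/[8.3gb], all_pools ...")
stats = parse_gc_warning(line)  # stats["duration"] == "48s"
```

Feeding every WARN line through this makes it easy to spot the pattern above: old gen pinned at 8gb before and after each collection, i.e. nothing reclaimable.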

Another part we use a lot is ES search. The lines below are log entries
generated by searches ("buscou pagina 1 de N (10 por pagina) do app X" is
Portuguese for "fetched page 1 of N (10 per page) for app X"):

[2014-12-03 11:43:22 -0200] buscou pagina 1 de 111235 (10 por pagina) do
app 61
[2014-12-03 11:44:12 -0200] buscou pagina 1 de 30628 (10 por pagina) do app
5
[2014-12-03 11:44:13 -0200] buscou pagina 1 de 30628 (10 por pagina) do app
5
[2014-12-03 11:44:24 -0200] buscou pagina 1 de 63013 (10 por pagina) do app
20
[2014-12-03 11:44:24 -0200] buscou pagina 1 de 63013 (10 por pagina) do app
20
[2014-12-03 11:44:24 -0200] buscou pagina 1 de 63013 (10 por pagina) do app
20
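The page counts in those lines come straight from the hit totals: with 10 results per page, "pagina 1 de 111235" implies the query matched over a million documents. A sketch of the arithmetic (the hit totals here are hypothetical, chosen only to be consistent with the logged page counts):

```python
import math

def page_count(total_hits, per_page=10):
    """Number of pages for a result set: ceil(total_hits / per_page)."""
    return math.ceil(total_hits / per_page)

def from_offset(page, per_page=10):
    """Value of the ES 'from' parameter for a 1-based page number."""
    return (page - 1) * per_page

print(page_count(1_112_341))  # 111235 pages, like app 61 above
print(from_offset(1))         # 0 -- page 1 starts at offset 0
```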

These links are screenshots of some cluster information:

https://www.dropbox.com/s/om1lmux9oe6oeuh/Screen%20Shot%202014-12-03%20at%202.02.22%20PM.png?dl=0
https://www.dropbox.com/s/2vz9q30dmcmmam2/Screen%20Shot%202014-12-03%20at%202.02.45%20PM.png?dl=0
https://www.dropbox.com/s/qdybd4nzi04onfh/Screen%20Shot%202014-12-03%20at%202.03.17%20PM.png?dl=0
https://www.dropbox.com/s/tzvcwx513w2qik7/Screen%20Shot%202014-12-03%20at%202.03.43%20PM.png?dl=0
https://www.dropbox.com/s/6tue4vyblxtgfp2/Screen%20Shot%202014-12-03%20at%202.04.13%20PM.png?dl=0
https://www.dropbox.com/s/9ns8lnoz5z7akmk/Screen%20Shot%202014-12-03%20at%202.04.39%20PM.png?dl=0
https://www.dropbox.com/s/ruj7teo9tlj111r/Screen%20Shot%202014-12-03%20at%202.05.09%20PM.png?dl=0
https://www.dropbox.com/s/8mbzmc0fesu6oq1/Screen%20Shot%202014-12-03%20at%202.05.36%20PM.png?dl=0
https://www.dropbox.com/s/dd9w6otd6b71cw7/Screen%20Shot%202014-12-03%20at%202.06.07%20PM.png?dl=0
https://www.dropbox.com/s/rkrr9e92uirvh03/Screen%20Shot%202014-12-03%20at%202.07.08%20PM.png?dl=0
We have already optimized some parts. The configuration of our machines is:

threadpool.index.type: fixed
threadpool.index.size: 30
threadpool.index.queue_size: 1000
threadpool.bulk.type: fixed
threadpool.bulk.size: 30
threadpool.bulk.queue_size: 1000
threadpool.search.type: fixed
threadpool.search.size: 100
threadpool.search.queue_size: 200
threadpool.get.type: fixed
threadpool.get.size: 100
threadpool.get.queue_size: 200
index.merge.policy.max_merged_segment: 2g
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merge_at_once: 5
index.cache.field.type: soft
index.cache.field.expire: 1m
index.refresh_interval: 60s
bootstrap.mlockall: true
indices.memory.index_buffer_size: '15%'
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ['xxx.xxx.xxx.xxx', 'xxx.xxx.xxx.xxx',
'xxx.xxx.xxx.xxx']

Our initial indexing runs without any problem; the problems start to
happen after some days of usage. Sometimes the cluster stays all right
for 4 or 5 days and then starts showing heap utilization problems.
Are we missing any configuration or optimization?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/987994d7-fd5a-440f-ab4a-225da155605a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

What ES version, what Java version?
How much actual data?

On 5 December 2014 at 04:31, Sergio Henrique sergiohenriquetp@gmail.com
wrote:


Hi Mark,

The ES version is 1.3.2 and the Java version is OpenJDK 1.7.0_55.
The total data is about 30GB.

Maybe our delete rate is too high, and that is contributing to the
memory issues? Today the delete rate is 16% on each node.
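For reference, that 16% is deleted docs as a fraction of all docs held in the node's segments, taken from docs.count and docs.deleted in the stats API (a sketch of the calculation; the sample numbers are made up):

```python
def delete_rate(docs_count, docs_deleted):
    """Fraction of segment documents that are delete tombstones,
    using docs.count and docs.deleted as reported by /_stats."""
    total = docs_count + docs_deleted
    return docs_deleted / total if total else 0.0

print(round(delete_rate(84, 16), 2))  # 0.16 -> a 16% delete rate
```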

On Thu, Dec 4, 2014 at 7:50 PM, Mark Walkom markwalkom@gmail.com wrote:
