My cluster - running Elasticsearch 1.1.2 with Oracle Java 1.7.0_55 - dies several days a
week: one of the nodes gets "disconnected".
This time one of them logged "observer timed out. notifying listener."
several times.
I then have to restart them all - this time several nodes each thought they
were the master.
They are physical machines on the same LAN.
The machines (4) index between 70k and 130k documents (lines from Logstash)
per minute, and I write into several different indexes (not just logstash-$date).
After a queue has built up, they will easily index 260-290k/min
until the queue is emptied.
So they don't seem to be short on resources, but they somehow "get tired and
die" - very often.
Any ideas on how I should proceed in debugging this issue? It fits the pattern
that EVERY time a node dies, I have garbage collection log entries. This
time I had these on one node:
[2014-07-12 17:12:25,132][INFO ][monitor.jvm ] [p-elasticlog02] [gc][young][194961][25004] duration [767ms], collections [1]/[1s], total [767ms]/[34.3m], memory [27.7gb]->[26.7gb]/[31.7gb], all_pools {[young] [1gb]->[5.3mb]/[1.1gb]}{[survivor] [149.7mb]->[124.1mb]/[149.7mb]}{[old] [26.5gb]->[26.6gb]/[30.4gb]}
[2014-07-12 17:12:44,929][INFO ][monitor.jvm ] [p-elasticlog02] [gc][young][194980][25007] duration [804ms], collections [1]/[1.1s], total [804ms]/[34.3m], memory [27.8gb]->[26.9gb]/[31.7gb], all_pools {[young] [1gb]->[11.2mb]/[1.1gb]}{[survivor] [149.7mb]->[146.1mb]/[149.7mb]}{[old] [26.6gb]->[26.7gb]/[30.4gb]}
[2014-07-12 17:14:57,032][INFO ][monitor.jvm ] [p-elasticlog02] [gc][young][195109][25035] duration [837ms], collections [1]/[1s], total [837ms]/[34.4m], memory [28.9gb]->[28.1gb]/[31.7gb], all_pools {[young] [1gb]->[141.1mb]/[1.1gb]}{[survivor] [149.7mb]->[145.7mb]/[149.7mb]}{[old] [27.7gb]->[27.8gb]/[30.4gb]}
[2014-07-12 17:16:17,016][INFO ][monitor.jvm ] [p-elasticlog02] [gc][young][195187][25053] duration [756ms], collections [1]/[1.4s], total [756ms]/[34.5m], memory [29.5gb]->[28.7gb]/[31.7gb], all_pools {[young] [926.8mb]->[27.6mb]/[1.1gb]}{[survivor] [149.7mb]->[138.6mb]/[149.7mb]}{[old] [28.4gb]->[28.5gb]/[30.4gb]}
[2014-07-12 17:18:57,313][WARN ][monitor.jvm ] [p-elasticlog02] [gc][young][195303][25075] duration [1.1s], collections [1]/[40.6s], total [1.1s]/[34.6m], memory [30.4gb]->[29.2gb]/[31.7gb], all_pools {[young] [1.1gb]->[12.3mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [29.1gb]->[29.2gb]/[30.4gb]}
[2014-07-12 17:18:57,314][WARN ][monitor.jvm ] [p-elasticlog02] [gc][old][195303][53] duration [39.1s], collections [2]/[40.6s], total [39.1s]/[1.5m], memory [30.4gb]->[29.2gb]/[31.7gb], all_pools {[young] [1.1gb]->[12.3mb]/[1.1gb]}{[survivor] [149.7mb]->[0b]/[149.7mb]}{[old] [29.1gb]->[29.2gb]/[30.4gb]}
I collect a lot of counters from Elasticsearch (using the elasticsearch
collector in Diamond, a Graphite collector written in Python), so I have
data on the ES nodes over time.
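(For illustration, the kind of numbers I'm graphing can also be pulled straight from the nodes stats API - this is just a rough sketch, the hostname is one of my nodes and it assumes the python "requests" package is installed:)

# Rough spot check of per-node heap usage and old-gen GC counters, to
# line up with the "observer timed out" events in the logs.
# The hostname is just one of my nodes; assumes the "requests" package.
import time
import requests

URL = "http://p-elasticlog02.example.idk:9200/_nodes/stats/jvm"

while True:
    for node in requests.get(URL).json()["nodes"].values():
        jvm = node["jvm"]
        heap_pct = 100.0 * jvm["mem"]["heap_used_in_bytes"] / jvm["mem"]["heap_max_in_bytes"]
        old = jvm["gc"]["collectors"]["old"]
        print("%s heap=%.0f%% old_gc_count=%d old_gc_time=%.1fs" % (
            node["name"], heap_pct,
            old["collection_count"],
            old["collection_time_in_millis"] / 1000.0))
    time.sleep(10)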
My config is this:
index.warmer.enabled: false
cluster.name: elasticsearch
node.name: "p-elasticlog02"
node.master: true
node.data: true
action.disable_delete_all_indices: true
indices.memory.index_buffer_size: 50%
indices.fielddata.cache.size: 30%
index.refresh_interval: 5s
index.index_concurrency: 16
threadpool.search.type: fixed
threadpool.search.size: 400
threadpool.search.queue_size: 900
threadpool.bulk.type: fixed
threadpool.bulk.size: 500
threadpool.bulk.queue_size: 900
threadpool.index.type: fixed
threadpool.index.size: 300
threadpool.index.queue_size: -1
path.data: /var/lib/elasticsearch/
bootstrap.mlockall: true
network.publish_host: $hostip
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["p-elasticlog01.example.idk",
"p-elasticlog02.example.idk", "p-elasticlog03.example.idk",
"p-elasticlog04.example.idk", "p-elasticlog05.example.idk"]
I have 24 cores in each machine (doing pretty much nothing), so I was
considering trying to switch to G1GC, for example, as I've read it should be
better in some respects. Any input?
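If I do try G1, the rough plan was to change the GC flags in elasticsearch.in.sh along these lines (assuming the stock script, which sets the CMS/ParNew flags; the pause target is just a guess, not a tested value):

# In elasticsearch.in.sh: comment out the default CMS/ParNew flags and
# enable G1 instead (assumes the stock 1.x script; pause target is only
# a starting point).
#JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
#JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"
#JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
#JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JAVA_OPTS="$JAVA_OPTS -XX:+UseG1GC"
JAVA_OPTS="$JAVA_OPTS -XX:MaxGCPauseMillis=200"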