Hi all
We're running a three-node elasticsearch cluster (two data nodes, one
dataless) and using it to store data from logstash.
Every week or two, we see messages like the following in the elasticsearch logs:
[24.8gb]->[24.5gb]/[24.8gb], all_pools {[young] [865.3mb]->[586mb]/[865.3mb]}{[survivor] [102.5mb]->[0b]/[108.1mb]}{[old] [23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:26:15,066][WARN ][monitor.jvm ] [es-prod-2] [gc][old][1189982][81480] duration [14.9s], collections [1]/[15.7s], total [14.9s]/[16.1h], memory [24.5gb]->[24.5gb]/[24.8gb], all_pools {[young] [586mb]->[592.5mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:26:30,715][WARN ][monitor.jvm ] [es-prod-2] [gc][old][1189983][81481] duration [14.6s], collections [1]/[15.6s], total [14.6s]/[16.1h], memory [24.5gb]->[24.5gb]/[24.8gb], all_pools {[young] [592.5mb]->[589.1mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:26:46,705][WARN ][monitor.jvm ] [es-prod-2] [gc][old][1189984][81482] duration [15.2s], collections [1]/[15.9s], total [15.2s]/[16.1h], memory [24.5gb]->[24.3gb]/[24.8gb], all_pools {[young] [589.1mb]->[445.2mb]/[865.3mb]}{[survivor] [0b]->[0b]/[108.1mb]}{[old] [23.9gb]->[23.9gb]/[23.9gb]}
[2014-11-17 15:27:03,630][WARN ][monitor.jvm ] [es-prod-2] [gc][old][1189986][81483] duration [15.8s], collections [1]/[15.9s], total [15.8s]/[16.1h], memory [24.8gb]->[24.3gb]/[24.8gb], all_pools {[young] [865.3mb]->[461.7mb]/[865.3mb]}{[survivor] [91.8mb]->[0b]/[108.1mb]}{[old] [23.9gb]->[23.9gb]/[23.9gb]}
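For reference, per-node heap usage and roles can be checked with something like this (the host name is one of ours; the _cat columns are assumed to be supported by our ES 1.x version):

# Quick per-node view of roles and heap pressure:
curl "http://es-prod-2:9200/_cat/nodes?v&h=name,node.role,heap.percent,heap.max"

# Full GC and memory-pool details:
curl "http://es-prod-2:9200/_nodes/stats/jvm?pretty"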
When this occurs, search performance becomes very slow. Even a simple $ curl http://es-prod-2:9200
can take around ten seconds.
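(Roughly measured by timing a bare request, something like:)

time curl -s -o /dev/null "http://es-prod-2:9200"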
The daily indexes created by logstash vary between 5M and 80M documents,
and between 1.5GiB and 25GiB on disk. The data nodes have ES_HEAP_SIZE=25G
(we saw OOM errors with 15G, and going over ~30GiB is not recommended,
I believe, since the JVM loses compressed object pointers).
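For reference, the heap is set via the ES_HEAP_SIZE environment variable; the defaults-file paths below are an assumption about how the package is installed on our hosts, and the curl just confirms what the JVM actually got:

# Set in the service defaults file (path varies by packaging; ours is an assumption):
#   /etc/default/elasticsearch  or  /etc/sysconfig/elasticsearch
ES_HEAP_SIZE=25G

# Confirm the heap the JVM actually received:
curl "http://es-prod-2:9200/_nodes/stats/jvm?pretty" | grep heap_max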
I suspect this occurs when users try to query over a large number of
indexes in Kibana.
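By way of illustration, a broad dashboard query over many daily indices amounts to something like the following (logstash's default daily index naming is assumed here, and the body is a simplified stand-in for what Kibana actually sends):

# A wide time range in Kibana becomes a search across many daily indices:
curl "http://es-prod-2:9200/logstash-2014.11.*/_search?pretty" -d '{
  "size": 0,
  "query": { "query_string": { "query": "*" } }
}'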
My questions are:
1: How should I tune our cluster to handle these queries? Is our dataset
simply too big?
2: When this happens, I restart the bad node as follows:

curl -XPUT "http://$HOST:$PORT/_cluster/settings?pretty" -d '{
  "transient": {
    "cluster.routing.allocation.enable": "none"
  }
}'
(restart the Elasticsearch process on the bad node)

curl -XPUT "http://$HOST:$PORT/_cluster/settings?pretty" -d '{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'
It's then an hour or two before the cluster is green again, as the shards
are assigned and then initialized. Is this the best way to restart a bad
node?
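While waiting, progress can be followed with basic health and shard checks, something like this (the _cat endpoint is assumed to be available in our version):

# Overall cluster status:
curl "http://$HOST:$PORT/_cluster/health?pretty"

# Shards that are not yet started (initializing/relocating/unassigned):
curl "http://$HOST:$PORT/_cat/shards?v" | grep -v STARTED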
3: Can I prevent users from making such intensive requests from Kibana
(via either a Kibana setting or an ES setting)?
Thanks
Wilfred