We have an Elasticsearch cluster running in production with the configuration below. The cluster has a master/data node topology, with 8 master nodes serving traffic. The issue we have been facing for the past few days is that the garbage collector is running more frequently and taking longer than usual. Total garbage collection across all master nodes takes a maximum of around 250s to complete. Search requests have increased a bit of late, and we are still using niofs as the index store type (http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/index-modules-store.html). This is affecting our latencies, and ultimately requests get dropped once the cluster can no longer handle the traffic. We have a queue in front of incoming requests, and its size sits constantly at ~10. Most of our requests are gets, bulk updates, or searches.
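The GC timings above are the kind of figures reported in the jvm.gc section of the nodes stats API; a minimal sketch of how we read them (the clear/jvm boolean flags are the 0.90-style parameters described in the docs):

curl -XGET 'http://localhost:9200/_nodes/stats?clear=true&jvm=true&pretty=true'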
Configuration:
"number_of_nodes": 17,
"number_of_data_nodes": 9,
"active_primary_shards": 24730,
"active_shards": 74196,
Elasticsearch version: 0.90.11
Java: 1.7u55
Garbage collector: G1
We have one index per user, with one shard plus one replica per index. The current number of indices is 24730, and we expect it to grow to as many as 45000. We created around 10k indices in the last month, and we just upgraded from Java 1.7u51 to 1.7u55. Our master nodes have 64GB of RAM, and Elasticsearch is using half of it.
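For context, each per-user index is created with a single shard and a single replica, roughly like the sketch below (the index name is made up for illustration), and the heap is pinned to half of the RAM through the standard environment variable before the node starts:

# hypothetical per-user index creation (the index name is illustrative)
curl -XPUT 'http://localhost:9200/user-12345' -d '{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}'

# half of the 64GB of RAM goes to the JVM heap
export ES_HEAP_SIZE=32g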
So how do search requests affect heap size and garbage collection? We do not have any cross-index search requests. What could be the reasons for GC taking so much time over the past few days (it started even before the Java upgrade)? Can we switch to mmapfs so that less heap gets used and GC can therefore run faster?
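If mmapfs is worth trying, we assume the change would be a single setting in elasticsearch.yml on each data node (or set per index at creation time), followed by a restart; a sketch:

index.store.type: mmapfs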