Can I configure ES caches to expire automatically to prevent GC from running?

Hello, we are using ES for a logging system.
When ES receives many search requests, bulk indexing throughput drops heavily.
A lot of GC runs at that time.
Our theory is this:

  1. the ES nodes handle the search requests,
  2. they load data into the caches,
  3. this triggers GC,
  4. so indexing throughput goes down (see the node stats check below).
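
One way to check this theory (a sketch, assuming the cluster is reachable on localhost:9200) is to watch the per-node GC counters in the nodes stats API while the search load is on, e.g. jvm.gc.collectors.old.collection_count and collection_time:

# JVM heap and GC statistics for every node
curl -s 'http://localhost:9200/_nodes/stats/jvm?human&pretty'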

Our monitoring team is looking for a configuration that expires the caches automatically before GC has to run.
These are the links I found for cache configuration:

https://www.elastic.co/guide/en/elasticsearch/reference/5.1/indices-clearcache.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/shard-request-cache.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-cache.html

Are there any other relevant configurations? If so, I would appreciate links to them.
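
For reference, these are the cache settings behind those pages as I understand them; the values below are only examples, not recommendations (they go in elasticsearch.yml and need a node restart):

indices.queries.cache.size: 10%       # node query cache (default: 10% of heap)
indices.requests.cache.size: 1%       # shard request cache (default: 1% of heap)
indices.requests.cache.expire: 10m    # TTL for shard request cache entries
indices.fielddata.cache.size: 20%     # fielddata cache (unbounded by default)

Caches can also be cleared on demand with the clear cache API from the first link, e.g. POST /_cache/clear.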

Our system:

  • Master - 3 nodes, 10G heap (no data)
  • Data - 2 nodes per server, 9 servers (18 nodes), 30G heap per node
  • ES version: 5.2.2

How many shards do you have in total?
How many indices? What is the size per shard?
What are the typical queries you are running?
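
For example, the _cat APIs give those numbers at a glance (host and port are placeholders):

# one line per index: primary/replica counts, doc count and store size
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'
# one line per shard, with its size and the node it lives on
curl -s 'http://localhost:9200/_cat/shards?v'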

In addition to what David asked for, it would be useful to know what indexing throughput you are seeing and what your target is. As document size and complexity affect this, it would also be great if you could give us an indication of the average document size.

If you can provide the full output of the cluster stats API, it will give some information about what the heap is used for.
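
For example (host and port are placeholders):

# full cluster stats with human-readable sizes
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'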

Hoping to help.

total shards : 9097 (primary 4566)
size per shard : 5690.9 MB
indices : 123

We mostly use term queries.

These are our cluster stats under normal load. Sadly, we don't have stats from the times when the GCs were running.

{
"_nodes": {
"total": 24,
"successful": 24,
"failed": 0
},
"cluster_name": "logger-14",
"timestamp": 1525416070781,
"status": "green",
"indices": {
"count": 123,
"shards": {
"total": 9097,
"primaries": 4566,
"replication": 0.9923346473937801,
"index": {
"shards": {
"min": 2,
"max": 80,
"avg": 73.95934959349593
},
"primaries": {
"min": 1,
"max": 40,
"avg": 37.1219512195122
},
"replication": {
"min": 0,
"max": 1,
"avg": 0.943089430894309
}
}
},
"docs": {
"count": 109095920638,
"deleted": 7
},
"store": {
"size": "43.7tb",
"size_in_bytes": 48055724890759,
"throttle_time": "0s",
"throttle_time_in_millis": 0
},
"fielddata": {
"memory_size": "263.8gb",
"memory_size_in_bytes": 283312294260,
"evictions": 1680529
},
"query_cache": {
"memory_size": "414.9mb",
"memory_size_in_bytes": 435069480,
"total_count": 25785331,
"hit_count": 1352804,
"miss_count": 24432527,
"cache_size": 3274,
"cache_count": 914118,
"evictions": 910844
},
"completion": {
"size": "0b",
"size_in_bytes": 0
},
"segments": {
"count": 253007,
"memory": "129.1gb",
"memory_in_bytes": 13678872246,
"terms_memory": "117.3gb",
"terms_memory_in_bytes": 125966381048,
"stored_fields_memory": "10.3gb",
"stored_fields_memory_in_bytes": 11089416600,
"term_vectors_memory": "0b",
"term_vectors_memory_in_bytes": 0,
"norms_memory": "15.7mb",
"norms_memory_in_bytes": 16559680,
"points_memory": "1.2gb",
"points_memory_in_bytes": 1342464690,
"doc_values_memory": "251.8mb",
"doc_values_memory_in_bytes": 264050228,
"index_writer_memory": "226.3mb",
"index_writer_memory_in_bytes": 237392652,
"version_map_memory": "15.1mb",
"version_map_memory_in_bytes": 15906164,
"fixed_bit_set": "0b",
"fixed_bit_set_memory_in_bytes": 0,
"max_unsafe_auto_id_timestamp": 1525306201279,
"file_sizes": {}
}
},
"nodes": {
"count": {
"total": 24,
"data": 18,
"coordinating_only": 0,
"master": 3,
"ingest": 24
},
"versions": [
"5.2.2"
],
"os": {
"available_processors": 816,
"allocated_processors": 744,
"names": [
{
"name": "Linux",
"count": 24
}
],
"mem": {
"total": "2.9tb",
"total_in_bytes": 3245305462784,
"free": "208.4gb",
"free_in_byted": 223864872960,
"used": "2.7tb",
"used_in_bytes": 3021440589824,
"free_percent": 7,
"used_percent": 93
}
},
"process": {
"cpu": {
"percent": 207
},
"open_file_descriptors": {
"min": 1167,
"amx": 2850,
"avg": 2360
}
},
"jvm": {
"max_uptime": "317.1d",
"max_uptime_in_millis": 27400448492,
"versions": [
{
"versions": " 1.8.0_121",
"vm_name": "Java HotSpot(TM) 64-Bit Server VM",
"vm_version": " 25.121-b13",
"vm_vendor": " Oracle Corporation",
"count": 24
}
],
"mem": {
"heap_used": "498.5gb",
"heap_used_in_bytes": 535294478536,
"heap_max": "655.3g",
"heap_max_in_bytes": 703696994304
},
"threads": 5929
},
"fs": {
"total": "49.3tb",
"total_in_bytes": 54285195980800,
"free": "27.4tb",
"free_in_bytes": 30178616180736,
"available": "24.9tb",
"available_in_bytes": 27420759195648,
"spins": "true"
},
"plugins": [],
"network_types": {
"transport_types": {
"netty4": 24
},
"http_types": {
"netty4": 24
}
}
}
}

Looking at the stats, it seems like you are using about 7GB heap per node for segments, and 15GB for fielddata. That is a total of about 22GB per data node (73% of heap).
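
For readers following along, the rough arithmetic is:

129.1 GB segments memory  / 18 data nodes ≈ 7.2 GB per node
263.8 GB fielddata memory / 18 data nodes ≈ 14.7 GB per node
7.2 GB + 14.7 GB ≈ 21.9 GB ≈ 73% of a 30 GB heap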

Your average shard size is only around 5GB, so it could be worthwhile using the shrink index API to reduce the shard count. I would also recommend running a force merge (with max_num_segments set to 1) on indices that are no longer written to, in order to reduce the number of segments (you have over 253k) and the related overhead. This is, however, a very I/O-intensive operation, so it should be done during off-peak hours.
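
As a sketch of those two steps (the index name, node name and target shard count are placeholders, not taken from your cluster; in 5.x the source index must have a copy of every shard on one node and be blocked for writes before it can be shrunk):

# 1. Prepare the source index: co-locate a copy of every shard on one node and block writes
curl -XPUT 'http://localhost:9200/logs-2018.05.01/_settings' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require._name": "shrink-node-name",
    "index.blocks.write": true
  }
}'

# 2. Shrink into a new index with fewer primary shards
curl -XPOST 'http://localhost:9200/logs-2018.05.01/_shrink/logs-2018.05.01-shrunk' -H 'Content-Type: application/json' -d '{
  "settings": { "index.number_of_shards": 1 }
}'

# 3. Force merge an index that is no longer written to down to a single segment
curl -XPOST 'http://localhost:9200/logs-2018.05.01-shrunk/_forcemerge?max_num_segments=1'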

You could also look at your mappings and see if you can reduce the amount of fielddata memory you use.
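
As an example of the kind of mapping change that can help (index, type and field names are hypothetical): in 5.x, aggregating or sorting on an analyzed text field needs fielddata on the heap, while a keyword field uses doc values on disk instead:

# Hypothetical mapping: aggregate on the "message.raw" keyword sub-field (doc values, on disk)
# rather than enabling fielddata on the analyzed "message" text field (heap)
curl -XPUT 'http://localhost:9200/logs-new' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "log": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}'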

