Can I configure ES caches to expire automatically to prevent GC from running?

Hello, we are using ES for a logging system.
When ES receives many search requests, bulk indexing throughput drops heavily.
A lot of GC runs at that time.
Our theory is this:

  1. the ES nodes handle the search requests,
  2. they load data into the caches,
  3. this triggers GC,
  4. so indexing throughput goes down (see the node stats check below).
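
One way to check this theory (a sketch, assuming the cluster is reachable on localhost:9200) is to watch the per-node GC counters in the nodes stats API while the search load is on, e.g. jvm.gc.collectors.old.collection_count and collection_time:

# JVM heap and GC statistics for every node
curl -s 'http://localhost:9200/_nodes/stats/jvm?human&pretty'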

Our monitoring team is looking for a configuration that expires the caches automatically before GC has to run.
These are the links I found for cache configuration:

https://www.elastic.co/guide/en/elasticsearch/reference/5.1/indices-clearcache.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/shard-request-cache.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.1/query-cache.html

Are there any other relevant configurations? If so, I would appreciate links to them.
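
For reference, these are the cache settings behind those pages as I understand them; the values below are only examples, not recommendations (they go in elasticsearch.yml and need a node restart):

indices.queries.cache.size: 10%       # node query cache (default: 10% of heap)
indices.requests.cache.size: 1%       # shard request cache (default: 1% of heap)
indices.requests.cache.expire: 10m    # TTL for shard request cache entries
indices.fielddata.cache.size: 20%     # fielddata cache (unbounded by default)

Caches can also be cleared on demand with the clear cache API from the first link, e.g. POST /_cache/clear.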

Our system:

  • Master - 3 nodes, 10G heap (no data)
  • Data - 2 nodes per server, 9 servers (18 nodes), 30G heap per node
  • ES version: 5.2.2

How many shards do you have in total?
How many indices? What is the size per shard?
What are the typical queries you are running?
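
For example, the _cat APIs give those numbers at a glance (host and port are placeholders):

# one line per index: primary/replica counts, doc count and store size
curl -s 'http://localhost:9200/_cat/indices?v&h=index,pri,rep,docs.count,store.size'
# one line per shard, with its size and the node it lives on
curl -s 'http://localhost:9200/_cat/shards?v'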

In addition to what David asked for, it would be useful to know what indexing throughput you are seeing and what your target is. As document size and complexity affect this, it would also be great if you could give us an indication of the average document size.

If you can provide the full output of the cluster stats API, it will give some information about what the heap is used for.
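
For example (host and port are placeholders):

# full cluster stats with human-readable sizes
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'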

Hoping to help.

total shards : 9097 (primary 4566)
size per shard : 5690.9 MB
indices : 123

We mostly use term queries.

These are our cluster stats under normal load. Sadly, we don't have stats from the times when the GCs were running.

{
"_nodes": {
"total": 24,
"successful": 24,
"failed": 0
},
"cluster_name": "logger-14",
"timestamp": 1525416070781,
"status": "green",
"indices": {
"count": 123,
"shards": {
"total": 9097,
"primaries": 4566,
"replication": 0.9923346473937801,
"index": {
"shards": {
"min": 2,
"max": 80,
"avg": 73.95934959349593
},
"primaries": {
"min": 1,
"max": 40,
"avg": 37.1219512195122
},
"replication": {
"min": 0,
"max": 1,
"avg": 0.943089430894309
}
}
},
"docs": {
"count": 109095920638,
"deleted": 7
},
"store": {
"size": "43.7tb",
"size_in_bytes": 48055724890759,
"throttle_time": "0s",
"throttle_time_in_millis": 0
},
"fielddata": {
"memory_size": "263.8gb",
"memory_size_in_bytes": 283312294260,
"evictions": 1680529
},
"query_cache": {
"memory_size": "414.9mb",
"memory_size_in_bytes": 435069480,
"total_count": 25785331,
"hit_count": 1352804,
"miss_count": 24432527,
"cache_size": 3274,
"cache_count": 914118,
"evictions": 910844
},
"completion": {
"size": "0b",
"size_in_bytes": 0
},
"segments": {
"count": 253007,
"memory": "129.1gb",
"memory_in_bytes": 13678872246,
"terms_memory": "117.3gb",
"terms_memory_in_bytes": 125966381048,
"stored_fields_memory": "10.3gb",
"stored_fields_memory_in_bytes": 11089416600,
"term_vectors_memory": "0b",
"term_vectors_memory_in_bytes": 0,
"norms_memory": "15.7mb",
"norms_memory_in_bytes": 16559680,
"points_memory": "1.2gb",
"points_memory_in_bytes": 1342464690,
"doc_values_memory": "251.8mb",
"doc_values_memory_in_bytes": 264050228,
"index_writer_memory": "226.3mb",
"index_writer_memory_in_bytes": 237392652,
"version_map_memory": "15.1mb",
"version_map_memory_in_bytes": 15906164,
"fixed_bit_set": "0b",
"fixed_bit_set_memory_in_bytes": 0,
"max_unsafe_auto_id_timestamp": 1525306201279,
"file_sizes": {}
}
},
"nodes": {
"count": {
"total": 24,
"data": 18,
"coordinating_only": 0,
"master": 3,
"ingest": 24
},
"versions": [
"5.2.2"
],
"os": {
"available_processors": 816,
"allocated_processors": 744,
"names": [
{
"name": "Linux",
"count": 24
}
],
"mem": {
"total": "2.9tb",
"total_in_bytes": 3245305462784,
"free": "208.4gb",
"free_in_byted": 223864872960,
"used": "2.7tb",
"used_in_bytes": 3021440589824,
"free_percent": 7,
"used_percent": 93
}
},
"process": {
"cpu": {
"percent": 207
},
"open_file_descriptors": {
"min": 1167,
"amx": 2850,
"avg": 2360
}
},
"jvm": {
"max_uptime": "317.1d",
"max_uptime_in_millis": 27400448492,
"versions": [
{
"versions": " 1.8.0_121",
"vm_name": "Java HotSpot(TM) 64-Bit Server VM",
"vm_version": " 25.121-b13",
"vm_vendor": " Oracle Corporation",
"count": 24
}
],
"mem": {
"heap_used": "498.5gb",
"heap_used_in_bytes": 535294478536,
"heap_max": "655.3g",
"heap_max_in_bytes": 703696994304
},
"threads": 5929
},
"fs": {
"total": "49.3tb",
"total_in_bytes": 54285195980800,
"free": "27.4tb",
"free_in_bytes": 30178616180736,
"available": "24.9tb",
"available_in_bytes": 27420759195648,
"spins": "true"
},
"plugins": [],
"network_types": {
"transport_types": {
"netty4": 24
},
"http_types": {
"netty4": 24
}
}
}
}

Looking at the stats, it seems like you are using about 7GB heap per node for segments, and 15GB for fielddata. That is a total of about 22GB per data node (73% of heap).
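
For readers following along, the rough arithmetic is:

129.1 GB segments memory  / 18 data nodes ≈ 7.2 GB per node
263.8 GB fielddata memory / 18 data nodes ≈ 14.7 GB per node
7.2 GB + 14.7 GB ≈ 21.9 GB ≈ 73% of a 30 GB heap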

Your average shard size is only around 5GB, so it could be worthwhile using the shrink index API to reduce the shard count. I would also recommend running a force merge (with max_num_segments set to 1) on indices that are no longer written to, in order to reduce the number of segments (you have over 253k) and the related overhead. This is, however, a very I/O-intensive operation, so it should be done during off-peak hours.
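
As a sketch of those two steps (the index name, node name and target shard count are placeholders, not taken from your cluster; in 5.x the source index must have a copy of every shard on one node and be blocked for writes before it can be shrunk):

# 1. Prepare the source index: co-locate a copy of every shard on one node and block writes
curl -XPUT 'http://localhost:9200/logs-2018.05.01/_settings' -H 'Content-Type: application/json' -d '{
  "settings": {
    "index.routing.allocation.require._name": "shrink-node-name",
    "index.blocks.write": true
  }
}'

# 2. Shrink into a new index with fewer primary shards
curl -XPOST 'http://localhost:9200/logs-2018.05.01/_shrink/logs-2018.05.01-shrunk' -H 'Content-Type: application/json' -d '{
  "settings": { "index.number_of_shards": 1 }
}'

# 3. Force merge an index that is no longer written to down to a single segment
curl -XPOST 'http://localhost:9200/logs-2018.05.01-shrunk/_forcemerge?max_num_segments=1'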

You could also look at your mappings and see if you can reduce the amount of fielddata memory you use.
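
As an example of the kind of mapping change that can help (index, type and field names are hypothetical): in 5.x, aggregating or sorting on an analyzed text field needs fielddata on the heap, while a keyword field uses doc values on disk instead:

# Hypothetical mapping: aggregate on the "message.raw" keyword sub-field (doc values, on disk)
# rather than enabling fielddata on the analyzed "message" text field (heap)
curl -XPUT 'http://localhost:9200/logs-new' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "log": {
      "properties": {
        "message": {
          "type": "text",
          "fields": {
            "raw": { "type": "keyword" }
          }
        }
      }
    }
  }
}'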

