Garbage collection and stop-the-world pauses

Hi,

We are using AWS Elasticsearch and we are having problems in our cluster whenever the JVMMemoryPressure metric reaches 75% on each of our nodes. At that moment a stop-the-world garbage collection seems to happen, and the nodes become unavailable for a few minutes.

We are not sure why this is happening. We have tried increasing the size of the cluster, but that has only delayed the failures.

Right now we have 5 data nodes and 3 master nodes on c5.large instances, and we have very little data but a lot of requests.

Cluster stats:

{
  "_nodes": {
    "total": 8,
    "successful": 8,
    "failed": 0
  },
  "cluster_name": "647845116050:images-live",
  "timestamp": 1578658943560,
  "status": "green",
  "indices": {
    "count": 2,
    "shards": {
      "total": 12,
      "primaries": 6,
      "replication": 1.0,
      "index": {
        "shards": {
          "min": 2,
          "max": 10,
          "avg": 6.0
        },
        "primaries": {
          "min": 1,
          "max": 5,
          "avg": 3.0
        },
        "replication": {
          "min": 1.0,
          "max": 1.0,
          "avg": 1.0
        }
      }
    },
    "docs": {
      "count": 2667,
      "deleted": 0
    },
    "store": {
      "size": "2.5mb",
      "size_in_bytes": 2724052,
      "throttle_time": "0s",
      "throttle_time_in_millis": 0
    },
    "fielddata": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "evictions": 0
    },
    "query_cache": {
      "memory_size": "0b",
      "memory_size_in_bytes": 0,
      "total_count": 0,
      "hit_count": 0,
      "miss_count": 0,
      "cache_size": 0,
      "cache_count": 0,
      "evictions": 0
    },
    "completion": {
      "size": "0b",
      "size_in_bytes": 0
    },
    "segments": {
      "count": 62,
      "memory": "297.6kb",
      "memory_in_bytes": 304809,
      "terms_memory": "209kb",
      "terms_memory_in_bytes": 214101,
      "stored_fields_memory": "18.8kb",
      "stored_fields_memory_in_bytes": 19312,
      "term_vectors_memory": "0b",
      "term_vectors_memory_in_bytes": 0,
      "norms_memory": "18.7kb",
      "norms_memory_in_bytes": 19200,
      "points_memory": "180b",
      "points_memory_in_bytes": 180,
      "doc_values_memory": "50.7kb",
      "doc_values_memory_in_bytes": 52016,
      "index_writer_memory": "0b",
      "index_writer_memory_in_bytes": 0,
      "version_map_memory": "0b",
      "version_map_memory_in_bytes": 0,
      "fixed_bit_set": "0b",
      "fixed_bit_set_memory_in_bytes": 0,
      "max_unsafe_auto_id_timestamp": -1,
      "file_sizes": {}
    }
  },
  "nodes": {
    "count": {
      "total": 8,
      "data": 5,
      "coordinating_only": 0,
      "master": 3,
      "ingest": 5
    },
    "versions": [
      "5.5.2"
    ],
    "os": {
      "available_processors": 16,
      "allocated_processors": 16,
      "names": [
        {
          "count": 8
        }
      ],
      "mem": {
        "total": "28.9gb",
        "total_in_bytes": 31124226048,
        "free": "1.3gb",
        "free_in_bytes": 1493733376,
        "used": "27.5gb",
        "used_in_bytes": 29630492672,
        "free_percent": 5,
        "used_percent": 95
      }
    },
    "process": {
      "cpu": {
        "percent": 5
      },
      "open_file_descriptors": {
        "min": 791,
        "max": 4069,
        "avg": 1686
      }
    },
    "jvm": {
      "max_uptime": "23.2d",
      "max_uptime_in_millis": 2012217486,
      "mem": {
        "heap_used": "4.9gb",
        "heap_used_in_bytes": 5342301528,
        "heap_max": "15.8gb",
        "heap_max_in_bytes": 17040408576
      },
      "threads": 943
    },
    "fs": {
      "total": "69.5gb",
      "total_in_bytes": 74700369920,
      "free": "62.8gb",
      "free_in_bytes": 67496964096,
      "available": "60.2gb",
      "available_in_bytes": 64678391808
    },
    "network_types": {
      "transport_types": {
        "netty4": 8
      },
      "http_types": {
        "filter-jetty": 8
      }
    }
  }
}

Any idea why this happens, and how to fix it?

You might want to share a bit more about your cluster: usage patterns, number of shards, the types of queries you run, and anything that happens around the time the problem appears. That would help explain why your cluster is eating up all that memory and what is going on when it does.
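
To see where the heap is actually going, the nodes stats API reports per-node heap usage and garbage collection counters. A rough sketch (the domain endpoint is a placeholder, and this assumes the _nodes APIs are reachable on your AWS domain):

# heap_used_percent plus young/old GC collection counts and times, per node
curl -s "https://<your-domain-endpoint>/_nodes/stats/jvm?filter_path=nodes.*.name,nodes.*.jvm.mem.heap_used_percent,nodes.*.jvm.gc.collectors"

If the old collector's collection_count and collection_time_in_millis jump at the same moment the nodes stop answering, the unavailability is most likely long old-generation GC pauses.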

For example: It seems you only have 2.5MB of data in your system, which is not a lot. Is this true?

It seems, however, that you are using an unsupported plugin named filter-jetty for your HTTP server. You may want to try running with any unofficial plugins disabled and see if the cluster then behaves as expected.
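
A quick way to check which plugins and modules each node has loaded, assuming the _cat and _nodes APIs are reachable on the domain (sketch):

# one line per node and plugin
curl -s "https://<your-domain-endpoint>/_cat/plugins?v"
# plugin and module lists per node
curl -s "https://<your-domain-endpoint>/_nodes?filter_path=nodes.*.name,nodes.*.plugins,nodes.*.modules"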

Thanks for the reply

We have one index with several fields that can be filtered on in queries. The index is set up with 1 replica and 5 shards, the default values in version 5.5, which is the version we use.

And yes, we do not have much data: only about 2.6k documents, and updates are very infrequent, maybe 100 PUTs per week.
We run a lot of queries against these documents, about 1 million per day, with heavy traffic in the mornings and very little at night.
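
For reference, the shard and replica settings can be confirmed straight from the index settings (a sketch; <index> and the endpoint are placeholders):

curl -s "https://<your-domain-endpoint>/<index>/_settings?filter_path=*.settings.index.number_of_shards,*.settings.index.number_of_replicas"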

An example of these queries:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "field": "text_value"
          }
        },
        {
          "match": {
            "field2": boolean
          }
        },
        {
          "match": {
            "field3": "text_value"
          }
        },
        {
          "range": {
            "fieldDate": {
              "format": "yyyy-MM-dd",
              "lte": "2016-09-09"
            }
          }
        }
      ]
    }
  },
  "sort": {
    "fieldDate": {
      "order": "desc"
    }
  }
}

Queries always look like this. Their latencies are pretty good, and we have no performance problems until the instances reach 75% JVMMemoryPressure and stop responding for a few minutes.
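
For reference, since the results are sorted by fieldDate and the relevance score is not used, the exact-match and range clauses could also be moved into the bool query's filter context, which skips scoring and allows those clauses to be cached. A sketch with the same placeholder fields (whether this changes the memory pressure behaviour is a separate question):

{
  "query": {
    "bool": {
      "must": [
        { "match": { "field": "text_value" } }
      ],
      "filter": [
        { "match": { "field2": boolean } },
        { "match": { "field3": "text_value" } },
        {
          "range": {
            "fieldDate": {
              "format": "yyyy-MM-dd",
              "lte": "2016-09-09"
            }
          }
        }
      ]
    }
  },
  "sort": {
    "fieldDate": {
      "order": "desc"
    }
  }
}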

I don't know why the filter-jetty plugin is there; maybe it comes from some default configuration. I'll try to investigate.

If any more information is needed, please let me know.
