Elasticsearch 7.8 worse heap management

Hello,

We have an Elasticsearch cluster of 94 servers and last Wednesday we updated it from 7.6.2 to 7.8.0.

On Friday one of our teams reported that a nightly process had slowed down dramatically. We verified it by looking at the metrics:

What was the change introduced in 7.7 or 7.8? Is there a way to deactivate it?

Thanks in advance.

I agree. 7.7 was supposed to reduce heap usage, but it has not; in fact it clearly uses more.

When I upgraded from 7.6 to 7.8, the cluster started using more JVM heap.

This is one of the cluster stats fields that I monitor every hour, and it is the same on all my clusters: after the upgrade to 7.7 they started using more memory.

Are you using the bundled JDK or setting JAVA_HOME explicitly?

https://github.com/elastic/elasticsearch/blob/7.6/buildSrc/version.properties#L5 is where we define the bundled one and while 7.6 used JDK 13, 7.7 and 7.8 switched to 14. That also switched the default garbage collector to G1 from CMS. Not sure this is the main or only reason, but that could have some impact (though I think that we've only seen some negative impact on small heaps which we are looking into at the moment).
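
If it helps, one quick way to confirm which JDK and collector each node actually ends up with is the nodes info API. A sketch, assuming the cluster answers on localhost:9200:

curl -s "localhost:9200/_nodes/jvm?filter_path=nodes.*.jvm.version,nodes.*.jvm.gc_collectors,nodes.*.jvm.input_arguments&pretty"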

PS: Fixed the location of the bundled JDK definition.

Using the bundled JDK.

java -version
openjdk version "14" 2020-03-17
OpenJDK Runtime Environment AdoptOpenJDK (build 14+36)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 14+36, mixed mode, sharing)

But I guess my config is old?

This is my jvm.options. Is this correct?

## GC configuration

-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

I would need to check the source, but IMO we're shipping the settings for both. So if you set your own JAVA_HOME and use an older JDK version, you can still use CMS with the right settings.
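
For context, the packaged jvm.options in 7.x uses version-conditional prefixes so both sets of GC flags can ship side by side. A sketch of what that section looks like (flags as in the 7.8 defaults; each prefix restricts the line to the given JDK range):

## applies only on JDK 8 through 13, where CMS still exists
8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=75
8-13:-XX:+UseCMSInitiatingOccupancyOnly
## applies only on JDK 14 and later, where CMS has been removed
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30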

JDK 14 has removed CMS.
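
You can see that directly with the bundled JDK. A quick check (a sketch, using the bundled JDK path mentioned later in this thread):

# on JDK 14 the CMS flag is obsolete: the JVM should print a warning that the
# option is being ignored and then start with the default collector (G1)
/usr/share/elasticsearch/jdk/bin/java -XX:+UseConcMarkSweepGC -version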

Oh, I see the new jvm.options has more options for GC, but I didn't change it since the RPM install just kept the old config.

jvm.options.rpmnew

## G1GC Configuration
# NOTE: G1 GC is only supported on JDK version 10 or later
# to use G1GC, uncomment the next two lines and update the version on the
# following three lines to your version of the JDK
# 10-13:-XX:-UseConcMarkSweepGC
# 10-13:-XX:-UseCMSInitiatingOccupancyOnly
14-:-XX:+UseG1GC
14-:-XX:G1ReservePercent=25
14-:-XX:InitiatingHeapOccupancyPercent=30

So what are the options your process is using now? If they're not the ones we provide, that might be an issue.

And we'd need to look into fixing that in the upgrade process.
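
One way to double-check what the running process actually picked up (a sketch; run it as the same user that owns the Elasticsearch process, and the pgrep pattern is just an example):

# print the flags the live JVM is using, via the bundled JDK's jcmd
/usr/share/elasticsearch/jdk/bin/jcmd $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch) VM.flags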

I just switched to the new one. Let's wait a few hours to compare JVM usage; I will tell you more by tomorrow.

Hi xeraa,

I'm using the bundled JDK:

ps -ax | grep elastic
20561 ?        SLsl 5336:54 /usr/share/elasticsearch/jdk/bin/java -Xshare:auto -Des.networkaddress.cache.ttl=60 -Des.networkaddress.cache.negative.ttl=10 -XX:+AlwaysPreTouch -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -XX:+ShowCodeDetailsInExceptionMessages -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dio.netty.allocator.numDirectArenas=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Djava.locale.providers=SPI,COMPAT -Xms30g -Xmx30g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0 -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -XX:HeapDumpPath=/var/lib/elasticsearch -XX:MaxDirectMemorySize=16106127360 -Des.path.home=/usr/share/elasticsearch -Des.path.conf=/etc/elasticsearch -Des.distribution.flavor=default -Des.distribution.type=deb -Des.bundled_jdk=true -cp /usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -p /var/run/elasticsearch/elasticsearch.pid --quiet 
/usr/share/elasticsearch/jdk/bin/java -version
    openjdk version "14.0.1" 2020-04-14
    OpenJDK Runtime Environment AdoptOpenJDK (build 14.0.1+7)
    OpenJDK 64-Bit Server VM AdoptOpenJDK (build 14.0.1+7, mixed mode, sharing)

Current jvm.options file:

/etc/elasticsearch # cat jvm.options
## JVM configuration

################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
## for more information
##
################################################################

# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

-Xms30g
-Xmx30g

################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################

## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

## optimizations

# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch

## basic

# force the server VM
-server

# explicitly set the stack size
-Xss1m

# set to headless, just in case
-Djava.awt.headless=true

# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8

# use our provided JNA always versus the system one
-Djna.nosys=true

# turn off a JDK optimization that throws away stack traces for common
# exceptions because stack traces are important for debugging
-XX:-OmitStackTraceInFastThrow

# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0

# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true

## heap dumps

# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
# -XX:+HeapDumpOnOutOfMemoryError

# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
-XX:HeapDumpPath=/var/lib/elasticsearch

## GC logging

#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime

# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}

# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M

Hi Carlos,
I didn't mean to hijack your thread, but I think we both had the same issue.

It might not be the only thing here, but the GC options are potentially an issue.

Carlos, I don't think your options are what you want. https://github.com/elastic/elasticsearch/blob/7.8/distribution/src/config/jvm.options#L46-L48 as a reference.

I'm curious if

  1. This changes anything (and we need to open an issue to fix the upgrade process, or shout louder about it).
  2. It solves it entirely or is just part of a bigger puzzle.

BTW the docs for JVM options mention that any customizations should go into an environment variable or into /etc/elasticsearch/jvm.options.d/. IMO if you change a config file, your package manager will notice that and ask whether it should overwrite it or not (which would also overwrite your changes to the heap). Maybe that's how you got stuck on an old version of that file?
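
For example, the heap settings could live in their own file under jvm.options.d so that package upgrades never touch them. A sketch (the file name is arbitrary, as long as it ends in .options):

# /etc/elasticsearch/jvm.options.d/heap.options
-Xms30g
-Xmx30g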

Hi xeraa, you might be right, but in my case I have zero clue how the JVM works.

I have updated all cluster masters with the proper config and restarted them. I dump cluster_stats into an index every hour to track heap usage, so I will know in a few hours whether it reduces total cluster heap memory or not.

[2020-07-13T10:49:44,079][INFO ][o.e.n.Node ] [elkm01] JVM home [/usr/share/elasticsearch/jdk]
[2020-07-13T10:49:44,079][INFO ][o.e.n.Node ] [elkm01] JVM arguments [-Xshare:auto, -Des.networkaddress.cache.ttl=60, -Des.networkaddress.cache.negative.ttl=10, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -XX:-OmitStackTraceInFastThrow, -XX:+ShowCodeDetailsInExceptionMessages, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dio.netty.allocator.numDirectArenas=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Djava.locale.providers=SPI,COMPAT, -Xms20g, -Xmx20g, -XX:+UseG1GC, -XX:G1ReservePercent=25, -XX:InitiatingHeapOccupancyPercent=30, -Djava.io.tmpdir=/tmp/elasticsearch-15823551871146500716, -XX:+HeapDumpOnOutOfMemoryError, -XX:HeapDumpPath=/var/lib/elasticsearch, -XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log, -Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m, -XX:MaxDirectMemorySize=10737418240, -Des.path.home=/usr/share/elasticsearch, -Des.path.conf=/etc/elasticsearch, -Des.distribution.flavor=default, -Des.distribution.type=rpm, -Des.bundled_jdk=true]

No worries, it could be the same issue.

Hi Xeraa,

You're right, I was stuck on an old version because I had edited it. I updated to the latest settings and now I need to wait to see the results.

About using jvm.options.d: as the docs say, it's a better way to add or modify options, but what do I need to do if I want to remove one?
I removed -XX:+HeapDumpOnOutOfMemoryError because I don't want heap dumps to be generated. I suppose I could set -XX:HeapDumpPath=/dev/null, but it doesn't feel clean.
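
One possible approach, assuming HotSpot's usual behavior that the last occurrence of a boolean flag wins (and that jvm.options.d entries are appended after jvm.options), would be to negate the flag instead of deleting it. A sketch with a hypothetical file name:

# /etc/elasticsearch/jvm.options.d/no-heapdump.options
# assumes this later -XX:-... entry overrides the earlier -XX:+... from jvm.options
-XX:-HeapDumpOnOutOfMemoryError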

Hi Xeraa,

After changing the jvm.options settings for the new Java GC, heap usage has improved. It hasn't dropped drastically, but it has been better than before over the last 24 hours.

Hi Xeraa,

After changing the JVM options to the recommended settings that come with the Elasticsearch package, the max heap usage improved, but search performance hasn't: it's still much worse than in 7.6.2.

Here are some metrics:

Could you share more information about the performance degradation? In particular it would be helpful to see the hot threads when running those queries and a query profile.
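
For reference, both can be captured with something like the following (a sketch; the index name and time range are taken from the query shared further down in this thread):

GET /_nodes/hot_threads

GET /varys/_search
{
  "profile": true,
  "query": {
    "range": { "ts": { "gte": "2020-07-12T00:00:00", "lt": "2020-07-12T01:00:00" } }
  }
}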

Hello @Ignacio_Vera ,

To give you some context and summarize, I will explain our situation in detail.

This is our current cluster overview:

After updating from 7.6.2 to 7.8.0 we noticed a huge performance degradation in elasticsearch, even noticeable by the users in kibana.

For us it was easy to measure: we only need to look at the heaviest process we run against Elasticsearch, a nightly data extraction done by our Data Analytics team.

Before the update this process was completed in 3h 51min:

After the update it takes around 6h 5m:

Based on the metrics, our first suspicion was worse heap management, because we see a great difference in max usage and usage variation.

@Xeraa pointed out to us that this Elasticsearch version changes the bundled JDK from 13 to 14 and that we were using an old jvm.options configuration. After applying the new configuration we recovered the max heap usage we saw before, but not the usage variation.

And here we are: the nightly data extraction is a process that we have been executing exactly the same way every night for a year. We didn't notice any degradation in 6.8.0 or in 7.6.2; the issue appeared with the update to 7.8.0.

The extraction usually returns around 58,325,154 hits and is done with queries like this:

GET /varys/_search 
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "ts": {
              "gte": "2020-07-12T00:00:00",
              "lt": "2020-07-12T01:00:00"
            }
          }
        }
      ],
      "should": [
        {
          "match": {
            "type": "pageview"
          }
        },
        {
          "match": {
            "type": "user_event"
          }
        },
        {
          "match": {
            "type": "session_info"
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}

I've run a profile from Kibana because the original process doesn't let us enable it; here it's limited to 10,000 hits, while the original process isn't. The result, with the hits removed, is:

EDIT: I trimmed it because this message only allows 13,000 characters.

{
  "took" : 310,
  "timed_out" : false,
  "_shards" : {
    "total" : 120,
    "successful" : 120,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 4.8003182,
    "hits" : [...]
  },
  "profile" : {
    "shards" : [
      {
        "id" : "[-jHLObIoRYyFZTwhrBLNWA][varys-000446][10]",
        "searches" : [
          {
            "query" : [
              {
                "type" : "BooleanQuery",
                "description" : "(+ts:[1594512000000 TO 1594547999999] type:pageview type:user_event type:session_info)~1",
                "time_in_nanos" : 14264690,
                "breakdown" : {
                  "set_min_competitive_score_count" : 0,
                  "match_count" : 12,
                  "shallow_advance_count" : 0,
                  "set_min_competitive_score" : 0,
                  "next_doc" : 431132,
                  "match" : 6065,
                  "next_doc_count" : 12,
                  "score_count" : 12,
                  "compute_max_score_count" : 0,
                  "compute_max_score" : 0,
                  "advance" : 12220986,
                  "advance_count" : 34,
                  "score" : 12446,
                  "build_scorer_count" : 68,
                  "create_weight" : 247447,
                  "shallow_advance" : 0,
                  "create_weight_count" : 1,
                  "build_scorer" : 1346475
                },
                "children" : [
                  {
                    "type" : "IndexOrDocValuesQuery",
                    "description" : "ts:[1594512000000 TO 1594547999999]",
                    "time_in_nanos" : 794246,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 0,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 3704,
                      "match" : 0,
                      "next_doc_count" : 12,
                      "score_count" : 12,
                      "compute_max_score_count" : 0,
                      "compute_max_score" : 0,
                      "advance" : 5087,
                      "advance_count" : 56,
                      "score" : 1088,
                      "build_scorer_count" : 102,
                      "create_weight" : 433,
                      "shallow_advance" : 0,
                      "create_weight_count" : 1,
                      "build_scorer" : 783751
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "type:pageview",
                    "time_in_nanos" : 372810,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 136,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 12,
                      "compute_max_score_count" : 136,
                      "compute_max_score" : 15084,
                      "advance" : 111806,
                      "advance_count" : 68,
                      "score" : 1353,
                      "build_scorer_count" : 102,
                      "create_weight" : 111162,
                      "shallow_advance" : 8404,
                      "create_weight_count" : 1,
                      "build_scorer" : 124546
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "type:user_event",
                    "time_in_nanos" : 10098149,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 126,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 0,
                      "compute_max_score_count" : 126,
                      "compute_max_score" : 11107,
                      "advance" : 9958321,
                      "advance_count" : 66,
                      "score" : 0,
                      "build_scorer_count" : 98,
                      "create_weight" : 60922,
                      "shallow_advance" : 6379,
                      "create_weight_count" : 1,
                      "build_scorer" : 61003
                    }
                  },
                  {
                    "type" : "TermQuery",
                    "description" : "type:session_info",
                    "time_in_nanos" : 2580168,
                    "breakdown" : {
                      "set_min_competitive_score_count" : 0,
                      "match_count" : 0,
                      "shallow_advance_count" : 132,
                      "set_min_competitive_score" : 0,
                      "next_doc" : 0,
                      "match" : 0,
                      "next_doc_count" : 0,
                      "score_count" : 0,
                      "compute_max_score_count" : 132,
                      "compute_max_score" : 12018,
                      "advance" : 2441590,
                      "advance_count" : 68,
                      "score" : 0,
                      "build_scorer_count" : 102,
                      "create_weight" : 55690,
                      "shallow_advance" : 5968,
                      "create_weight_count" : 1,
                      "build_scorer" : 64467
                    }
                  }
                ]
              }
            ],
            "rewrite_time" : 9313,
            "collector" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 33375
              }
            ]
          }
        ],
        "aggregations" : [ ]
      }
    ]
  }
}

As you suggested, I've also requested /_nodes/hot_threads while the process was running. I got the following response (with the IPs redacted):

::: {fesn18}{PFBPEqxWQt6B8w7Tgocb5w}{G-dN6qhcTWeZLzgG2ZA_gw}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67456950272, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.861Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {fesn61}{KOrNRROwQhqi9HwhaYUXug}{tETgc45NTdyFznN-s0--9A}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67457081344, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.861Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {fesn28}{xRbiCMKUTFe84KNaq9p7eQ}{MLjzVmvCQvOqW-xccEA27g}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67456978944, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.863Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {fesn71}{mLS9UDsjQtSEwNgtVRjXhg}{mPG5R6UhRjykmRycy2c2Cw}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67457142784, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.863Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

    3.9% (19.4ms out of 500ms) cpu usage by thread 'elasticsearch[fesn71][write][T#6]'
     8/10 snapshots sharing following 2 elements
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@14.0.1/java.lang.Thread.run(Thread.java:832)

::: {fesn52}{KYqnV8x1R5mf-AJwD1wXJg}{NnI7_zJoTFaViDK5ShaGgw}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67456999424, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.862Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {fesn36}{a_BLI3CETt-t21_FV6XxFw}{uLz-XsN1SS-XAxwiEU8irg}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67456991232, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.862Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {master1}{LW5sVKysQVay2aiBo09txA}{jiT3nOL0QfuDC9XQkFl5AQ}{X.X.X.X}{X.X.X.X:9300}{lmr}{ml.machine_memory=67326771200, ml.max_open_jobs=20, xpack.installed=true, box_type=master, transform.node=false}
   Hot threads at 2020-07-16T12:57:23.864Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

::: {fesn31}{9H4TJDvLQqa1RCNcQgXlDQ}{5FhpCYM-RvikgO9ZdVJW5Q}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67456978944, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.863Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

   14.3% (71.6ms out of 500ms) cpu usage by thread 'elasticsearch[fesn31][management][T#3]'
     10/10 snapshots sharing following 2 elements
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)
       java.base@14.0.1/java.lang.Thread.run(Thread.java:832)

::: {fesn33}{3eS__3kIS6CMoy536JiUWw}{J51wh8aVR6C84IyyxNzHGw}{X.X.X.X}{X.X.X.X:9300}{dlrt}{ml.machine_memory=67456978944, ml.max_open_jobs=20, xpack.installed=true, box_type=defaultnodetype, transform.node=true}
   Hot threads at 2020-07-16T12:57:23.863Z, interval=500ms, busiestThreads=3, ignoreIdleThreads=true:

    7.3% (36.5ms out of 500ms) cpu usage by thread 'elasticsearch[fesn33][write][T#2]'
     8/10 snapshots sharing following 2 elements
       java.base@14.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)

I want to point out that the performance degradation is global; I'm talking about this nightly process because it's where it's easiest to see.

Thanks in advance,
Carlos

Thanks for the detailed explanation. Unfortunately those hot threads do not contain any useful information; the API should be run while the cluster is under pressure.

Another interesting metric to check is major page faults; have you seen an increase between versions?
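
If you want to track them per node, one option is sysstat's pidstat. A sketch (the pgrep pattern is just an example, and 5 is the sampling interval in seconds):

# majflt/s = major page faults per second for the Elasticsearch JVM
pidstat -r -p $(pgrep -f org.elasticsearch.bootstrap.Elasticsearch) 5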