JVM heap memory leak in Elasticsearch 2.4.5

Hi all, I use Elasticsearch 2.4.5.
I've attached an image of the Elasticsearch heap memory.

Eventually I have to restart my server, and then the heap starts growing again.

I have 3 nodes in my cluster, but one of them looks different from the others:

Memory keeps leaking and I don't know what to do.

One interesting thing is the script cache:

curl -XGET 'http://ip:9200/_nodes/stats'

  "script": {
    "compilations": 68717,
    "cache_evictions": 68617
  }

  "script": {
    "compilations": 3945,
    "cache_evictions": 3845
  }

  "script": {
    "compilations": 70015,
    "cache_evictions": 69915
  }

What's wrong?

What kind of script is this? I have seen this before with groovy scripts that were changed over and over again. If this is the case, could you try moving the parts of the script that are changing into parameters so the actual script text would always remain the same?

You should really upgrade; 2.4 has been end-of-life for nearly a year now - Elastic product end of life dates | Elastic

We're working on it. But until we upgrade, I want to fix this problem :slight_smile:

I think Igor was referring to this issue earlier. Have a look at it and see if this applies to how you are using scripts.

Thanks, will try.

Can I manually clear the script cache without a server restart?

And why is it not growing on just one of the nodes?

  "script": {
    "compilations": 98476,
    "cache_evictions": 98376
  }
  "script": {
    "compilations": 3945,
    "cache_evictions": 3845
  }
  "script": {
    "compilations": 99784,
    "cache_evictions": 99684
  }

We have 2 types of queries with Groovy scripts:

$script = new \Elastica\Script('ctx._source.prefix_name = prefix_name;ctx._source.match_name = match_name;', [], 'groovy');
$script->setParam('prefix_name', strtolower(Transliterator::ruToEn($data['name'])));
$script->setParam('match_name', $data['name']);
$client->updateDocument($itemId, $script, self::INDEX_NAME, self::INDEX_TYPE);

And:

$query->addSort(['_script' => [
    'script' => "if (_source.containsKey('other_properties')) {
        for (item in _source.other_properties) {
            if (item.v_label == '".addslashes($row['v_label'])."' && item.k == ".self::VENDOR_PROPERTY_ID.") {
                return 10;
            }
        }
    }
    return 1;",
    'type'  => 'number',
    'order' => 'desc',
]]);

What do you mean by:

try moving the parts of the script that are changing into parameters so the actual script text would always remain the same?

Not putting dynamic values into the script text, and instead passing them through the params block?
Like in the first query example.

If it is what I think it is, the problem is not in the cache, the problem is in compiled scripts that were already evicted from the cache and in the process they leaked memory. Cleaning the cache is not going to fix that.

Exactly! This way the script will never change (only the parameters will), so it will compile only once and be kept in the cache forever. It should increase performance as well, since script compilation is a pretty heavy process.
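Applied to the second query above, that means moving `$row['v_label']` and the vendor id out of the concatenated Groovy source and into a `params` block, so the script text is a constant string. A minimal sketch in plain PHP (untested against a live cluster; the placeholder values and the array keys assume the ES 2.x script-sort DSL):

```php
<?php
// Placeholders standing in for $row['v_label'] and self::VENDOR_PROPERTY_ID.
$row = ['v_label' => 'Some Vendor'];
$vendorPropertyId = 42;

// The Groovy source never changes, so it compiles once and stays cached.
// Groovy exposes params as plain variables (label, vendor_id).
$sort = ['_script' => [
    'script' => "if (_source.containsKey('other_properties')) {
        for (item in _source.other_properties) {
            if (item.v_label == label && item.k == vendor_id) {
                return 10;
            }
        }
    }
    return 1;",
    'params' => [
        'label'     => $row['v_label'],
        'vendor_id' => $vendorPropertyId,
    ],
    'type'  => 'number',
    'order' => 'desc',
]];

// $query->addSort($sort);  // same addSort() call as before
```

Note that no `addslashes()` is needed anymore: the value travels as a parameter, not as part of the script source.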

Thanks, we will try to fix it.

I have one instance in the cluster where cache_evictions for scripts is not growing.

"script": {
    "compilations": 3945,
    "cache_evictions": 3845
  }

But memory still leaks.
What do I need to check in that case?

First I would check the other stats; if they don't grow, analyze a heap dump.

Thanks, it helped! cache_evictions stopped growing.
But unfortunately memory is still growing.

Can you please help me?
All the stats are growing :slight_smile:
Which one should I look at first?

Nodes stats after restart:

Nodes stats after 16 hours work:

I don't see anything particularly wrong in the stats that you sent me. It is normal for heap to grow to a certain degree. The problem is when it grows above 80% and stays there. I see only 28% used on one of the nodes, which doesn't indicate an issue.

I think that really helped.

This is the Elasticsearch heap for the last 7 days.

Thanks a lot!

But I still don't know why one node looks different from the others.

This is the Elasticsearch heap for the last 7 days.

It's possible that nodes have different load. For example if you are running a lot of update operations then the update script will be only executed on the primary shard. If you are only connecting to one node with your client and retrieving a lot of data the load on that node might be higher as well. There could be many reasons for the difference in the behavior. We need to see how shards are allocated between nodes and what roles the nodes play to say for sure.
