Elasticsearch using more RAM

Hi,

I am using a machine with 26 GB of RAM, and I have allocated 8 GB as the heap size.

I have some data in MongoDB, and I am using the bulk API to push the data from MongoDB into Elasticsearch.

When I do this operation, I see that Elasticsearch uses up most of the RAM. When I check the cluster stats, it shows the following:
```
"nodes": {
  "count": {
    "total": 1,
    "data": 1,
    "coordinating_only": 0,
    "master": 1,
    "ingest": 1
  },
  "versions": [
    "6.2.4"
  ],
  "os": {
    "available_processors": 4,
    "allocated_processors": 4,
    "names": [
      {
        "name": "Linux",
        "count": 1
      }
    ],
    "mem": {
      "total": "25.5gb",
      "total_in_bytes": 27389636608,
      "free": "5.6gb",
      "free_in_bytes": 6097088512,
      "used": "19.8gb",
      "used_in_bytes": 21292548096,
      "free_percent": 22,
      "used_percent": 78
    }
  },
  "process": {
    "cpu": {
      "percent": 0
    },
    "open_file_descriptors": {
      "min": 263,
      "max": 263,
      "avg": 263
    }
  },
  "jvm": {
    "max_uptime": "15h",
    "max_uptime_in_millis": 54185581,
    "versions": [
      {
        "version": "1.8.0_151",
        "vm_name": "OpenJDK 64-Bit Server VM",
        "vm_version": "25.151-b12",
        "vm_vendor": "Oracle Corporation",
        "count": 1
      }
    ],
    "mem": {
      "heap_used": "2.4gb",
      "heap_used_in_bytes": 2661901360,
      "heap_max": "7.9gb",
      "heap_max_in_bytes": 8555069440
    },
    "threads": 56
  },
  "fs": {
    "total": "145.3gb",
    "total_in_bytes": 156067389440,
    "free": "134.8gb",
    "free_in_bytes": 144787374080,
    "available": "134.8gb",
    "available_in_bytes": 144770596864
  },
```
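For reference, the numbers above are from the cluster stats API; roughly this request, assuming the node listens on the default `localhost:9200`:

```
# Cluster-wide stats with human-readable sizes; note that os.mem covers the whole machine,
# not just the Elasticsearch process
curl -s 'http://localhost:9200/_cluster/stats?human&pretty'
```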

And I don't see the RAM getting freed at all.

Is there any configuration I am missing?

I have set `bootstrap.mlockall: true` and `LimitMEMLOCK=infinity`.
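(For reference: on 6.x the elasticsearch.yml setting is `bootstrap.memory_lock`; `bootstrap.mlockall` was the pre-5.0 name. A minimal sketch of where these settings usually live, assuming a package/systemd install:)

```
# /etc/elasticsearch/elasticsearch.yml
bootstrap.memory_lock: true

# /etc/systemd/system/elasticsearch.service.d/override.conf
[Service]
LimitMEMLOCK=infinity
```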

I was hoping the RAM would get freed up once the bulk operation is completed, but that's not the case here.

Please help with this.

Please format your code, logs, or configuration files using the </> icon, as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

If you are not using markdown format, use the </> icon in the editor toolbar.

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

Coming back to your question: you need to understand how all of this works. Indices are stored on disk, on the filesystem, and the OS has a filesystem cache which is used a lot in this context. That's probably why you are seeing this.
That's why we recommend:

  • running only the Elasticsearch service on the machine
  • not setting the heap size to more than half of the machine's memory
  • not setting the heap size to more than 30gb

Here, 26gb of RAM for an 8gb heap sounds good to me, as long as your heap is not under pressure; if it is, you can increase the heap up to 13gb or start new nodes.
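For reference, the heap size itself is set in `jvm.options`; a minimal sketch for an 8gb heap, assuming a package install:

```
# /etc/elasticsearch/jvm.options
# initial and maximum heap size; keep both values equal
-Xms8g
-Xmx8g
```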

My 2 cents

So is my understanding correct? To avoid using more RAM, we can increase the heap size. But what I see is that, out of the allotted 8 GB of heap space, only 2.7 GB has been used.

Also, the RAM which is used up during the bulk operation is not getting released.

No. Keep the HEAP size as it is now, 8gb.

What is the problem you want to solve here? RAM not released is not a problem IMO.

If I overload the server with more bulk queries, it eats up almost all the RAM.

But once the CPU becomes free/idle, the RAM is not getting freed up. Is this expected?

> the RAM is not getting freed up

Is it a problem?

> Is it expected?

Well, it could be. If you are querying the dataset at some point, I guess this can happen.

I think you are confusing a few things. This part here is for the system where Elasticsearch is running, not the Elasticsearch process itself...

"mem": {
  "total": "25.5gb",
  "total_in_bytes": 27389636608,
  "free": "5.6gb",
  "free_in_bytes": 6097088512,
  "used": "19.8gb",
  "used_in_bytes": 21292548096,
  "free_percent": 22,
  "used_percent": 78
}
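To see what the Elasticsearch process itself is holding, the `jvm` section of the stats (or the nodes stats API) is the thing to look at; roughly something like this, again assuming the default `localhost:9200`:

```
# Per-node OS, process, and JVM memory stats
curl -s 'http://localhost:9200/_nodes/stats/os,process,jvm?human&pretty'
```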

As Linux (and most other operating systems) read and write data to disk, these pages will be written to the page cache. Subsequent access of the same data can be read from RAM instead of hitting the disk, greatly improving read performance. Since Lucene is file based - a segment is basically a file - increasing the amount of RAM in a system to make more room for page caching is one of the easiest ways to improve query performance.

If you run `top` you will see something like this:

```
KiB Mem : 32859928 total,  1537868 free, 19942092 used, 11379968 buff/cache
KiB Swap:  2097148 total,  2097148 free,        0 used. 12288164 avail Mem
```

On this 32GB Elasticsearch machine, I only have 1.5GB "free" and about 11.3GB for buffers and cache. Notice how available memory is 12.2GB, which is close to the sum of "free" and "buff/cache". What this is showing is that a lot of data is available in the page cache. However if a process needs that memory, it is available. So if I was to start a process that needed 4GB, the OS would clear room for it by flushing data out of the page cache.

So just remember there is a difference between "free" and "available", or as I like to think about it "used" and "committed" (to processes).

My experience is that for time-series use-cases (e.g. metrics and logs) you are better off using less than the recommended 50% of RAM for ES JVM heap, making more available for the page cache.


Thanks a lot for the explanation, it was very helpful :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.