We have a cluster with three indices (one main and two supporting), each with 5 shards and one replica. I am running 7 Windows nodes, each of which has 8 cores and 56 GB of RAM. I have set the heap to 28 GB. I have disabled the paging file in the OS and set mlock to true. No matter what I have tried, I am still seeing swap being used: ElasticHQ is reporting between 19 and 32 MB of swap used on the nodes.
We have found that we have far exceeded the recommended shard size of 30-50 GB; our shards are currently 70-90 GB. We are gearing up to reindex and will split out to more shards at that time. The swap has been an issue since we started, however (i.e. even when the shards were smaller).
We are seeing some performance issues, and I can't help but be concerned about this metric that ElasticHQ is flagging in red. I would like to resolve this, but haven't had any luck yet.
mlockall works only on Linux/Unix systems, so I think disabling the paging file should be enough. Personally, I don't trust the swap numbers ElasticHQ reports for ES nodes running on Windows.
The info panel states that it is reporting: stats.os.swap.used_in_bytes / 1024 / 1024
It states that anything greater than 1 is a warning. It matches what I see when querying in Sense, so I don't think HQ is misrepresenting anything.
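For reference, HQ's MB figure can be reproduced by hand from the raw stats value. A minimal shell sketch of the conversion (the byte count below is a made-up example, not a number from your cluster):

```shell
# Hypothetical value of os.swap.used_in_bytes from GET _nodes/stats
used_in_bytes=33554432

# ElasticHQ's conversion: used_in_bytes / 1024 / 1024 -> MB
echo $(( used_in_bytes / 1024 / 1024 ))   # prints 32
```

So a reported 19-32 MB corresponds to roughly 20-34 million bytes in the raw node stats.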
bootstrap.memory_lock in Elasticsearch 2.x and 5.x uses VirtualLock on Windows to lock a specified region of the process's virtual address space into physical memory.
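For clarity, here is a minimal elasticsearch.yml sketch of that setting (the setting name is bootstrap.memory_lock on these versions; on older 1.x/2.x releases the equivalent was bootstrap.mlockall):

```yaml
# elasticsearch.yml
# Request that the heap be locked into physical memory
# (mlockall on Linux/Unix, VirtualLock on Windows)
bootstrap.memory_lock: true
```

You can check whether the lock was actually applied with GET _nodes?filter_path=**.mlockall; if it reports false, the lock attempt failed and swapping is still possible.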
What does Resource Monitor report for the process, specifically, commit and working set memory?
The first option is to use mlockall on Linux/Unix systems, or VirtualLock on Windows, to try to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out. Disable swapping | Elasticsearch Guide [8.11] | Elastic
Anyway, on my clusters I chose to disable the paging file entirely, but ElasticHQ still reported a big number for swap.
Is bootstrap.mlockall: true still valid on 2.x?
No, ElasticHQ does not misrepresent anything. It just reports the numbers it gets from GET _nodes/stats (stats.os.swap.used_in_bytes in this case). I've tested both settings before, in earlier versions and in 2.4.x, as mentioned by @forloop.
Neither works. On my current ES 2.4.0 node, bootstrap.memory_lock: true is set and the paging file is disabled, yet the reported swap usage is still a very big number.
Commit is at 30 GB, which matches the heap size specified, and page faults/sec is 0, which indicates to me that all of the committed virtual memory is physical memory, i.e. RAM.
Working set can be larger than commit as it includes both private and shareable bytes. If you'd like to understand more, you can run VMMap on the process to get a more detailed breakdown.