Thanks!
Here's what I've done so far:
I figured out how to limit the number of replicas.
This can be done via an index template:
PUT _template/all
{
  "index_patterns": ["*"],
  "settings": {
    "number_of_replicas": 0
  }
}
I'll test it tomorrow to see whether it takes effect and turns the cluster status green.
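One caveat: as far as I can tell, templates only apply to indices created after the template exists, so the existing yellow indices won't pick this up on their own. To drop their replicas too, I may also need to update the live settings directly, something like:

PUT _all/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}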
I don't think it will change anything performance-wise, but we'll see.
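To see whether it worked, I'll watch cluster health and look for any indices still stuck at yellow:

GET _cluster/health
GET _cat/indices?v&health=yellow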
I'm working through other suggestions:
- Limited heap to 31GB (keeps the JVM under the compressed-oops cutoff)
- File descriptors are already set to 65535
- Maximum number of threads is already set to 4096
- Maximum size virtual memory is already increased, so that bootstrap check passes
- Maximum map count bumped to 262144
- G1GC is disabled (by default); a couple of these can be double-checked from the running cluster, see the sketch below
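To verify the file-descriptor limit and the 31GB heap / compressed-oops point from the running nodes, the standard node stats/info endpoints should do (filter_path just trims the response):

GET _nodes/stats/process?filter_path=**.max_file_descriptors
GET _nodes/jvm?filter_path=**.heap_max_in_bytes,**.using_compressed_ordinary_object_pointers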
One thing I'm trying is reducing this line in jvm.options:
8-13:-XX:CMSInitiatingOccupancyFraction=75
to
8-13:-XX:CMSInitiatingOccupancyFraction=70
I believe this will make CMS start collecting earlier (at 70% old-gen occupancy instead of 75%), so GC kicks in before the heap fills up, which should help prevent out-of-memory errors. We'll adjust it up/down to see if it helps.
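For context, in the stock jvm.options (at least on the 7.x versions I've looked at) that setting sits in a small CMS block, and UseCMSInitiatingOccupancyOnly is what makes the fraction a hard trigger rather than a hint the JVM can second-guess:

8-13:-XX:+UseConcMarkSweepGC
8-13:-XX:CMSInitiatingOccupancyFraction=70
8-13:-XX:+UseCMSInitiatingOccupancyOnly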
Switch to G1GC
I realize this isn't really encouraged, but there are articles describing similar out-of-memory issues where switching to G1GC helped resolve them: Garbage Collection in Elasticsearch and the G1GC | by Prabin Meitei M | Naukri Engineering | Medium
This is going to be the last thing I'm going to try.
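If it comes to that, the jvm.options change would be roughly this sketch, assuming JDK 10-13 (on JDK 14+ CMS is removed and G1 is already the JVM default). The G1ReservePercent and InitiatingHeapOccupancyPercent values are the ones Elasticsearch itself ships in its commented-out G1 section:

# comment out the three 8-13:...CMS... lines above, then add:
10-13:-XX:+UseG1GC
10-13:-XX:G1ReservePercent=25
10-13:-XX:InitiatingHeapOccupancyPercent=30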