Hey guys,
We're facing some problems with instances going down from time to time with "exiting
java.lang.OutOfMemoryError: Java heap space".
Current setup:
- Elasticsearch 7.1.1
- Java 11
- G1GC and default jvm.options configs
- 2.5 billion docs
- 1 index with 40 shards (20 primary + 20 replica)
- 2.2 TB of primary data
- 12 instances with 32 GB RAM each, 16 GB of heap allocated to Elasticsearch (50%)
- cluster in AWS
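For reference, some back-of-the-envelope numbers derived from the list above. This is only a sketch: it assumes shards are evenly sized and evenly spread over the nodes, that replicas are as large as their primaries, and that 1 TB = 1024 GB. None of these values were measured on the cluster.

```python
# Rough sizing derived from the setup listed above.
# Assumptions (not measured): even shard sizes, even shard placement,
# replicas as large as primaries, 1 TB = 1024 GB.
PRIMARY_DATA_GB = 2.2 * 1024
PRIMARY_SHARDS = 20
REPLICA_SHARDS = 20
TOTAL_DOCS = 2.5e9
NODES = 12


def sizing():
    """Return per-shard and per-node estimates as a dict."""
    total_shards = PRIMARY_SHARDS + REPLICA_SHARDS
    gb_per_shard = PRIMARY_DATA_GB / PRIMARY_SHARDS
    return {
        "gb_per_shard": gb_per_shard,
        "docs_per_primary_shard": TOTAL_DOCS / PRIMARY_SHARDS,
        "shards_per_node": total_shards / NODES,
        "gb_per_node": gb_per_shard * total_shards / NODES,
    }


for name, value in sizing().items():
    print(f"{name}: {value:,.1f}")
```

Under those assumptions this comes to roughly 113 GB and 125 million docs per shard, and about 375 GB of index data per node.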
In the logs, we also see these GC messages:
[gc][2072844] overhead, spent [6s] collecting in the last [6.1s]
[old][2072844][25] duration [6s], collections [1]/[6.1s], total [6s]/[2.2m], memory [15.3gb]->[15gb]/[16gb], all_pools {[young] [8mb]->[0b]/[0b]}{[old] [15.3gb]->[15gb]/[16gb]}{[survivor] [0b]->[0b]/[0b]}
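To put numbers on that [old] line, here is a rough Python sketch that pulls the before/after/max heap values out of it. The function names are my own, and it only handles the b/kb/mb/gb suffixes that appear in this log, so treat it as an illustration rather than a general parser.

```python
import re

# Byte multipliers for the size suffixes seen in the log line above.
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3}

# Matches: memory [<before>]->[<after>]/[<max>]
MEMORY_RE = re.compile(
    r"memory \[([\d.]+)([kmg]?b)\]->\[([\d.]+)([kmg]?b)\]/\[([\d.]+)([kmg]?b)\]"
)

# The old-generation GC line from the log, copied verbatim.
GC_LINE = (
    "[old][2072844][25] duration [6s], collections [1]/[6.1s], "
    "total [6s]/[2.2m], memory [15.3gb]->[15gb]/[16gb], "
    "all_pools {[young] [8mb]->[0b]/[0b]}{[old] [15.3gb]->[15gb]/[16gb]}"
    "{[survivor] [0b]->[0b]/[0b]}"
)


def to_bytes(value, unit):
    """Convert a number and its suffix (e.g. '15.3', 'gb') to bytes."""
    return float(value) * UNIT_BYTES[unit]


def heap_summary(line):
    """Return GB freed and % of heap still used after the collection."""
    match = MEMORY_RE.search(line)
    if match is None:
        return None
    before = to_bytes(match.group(1), match.group(2))
    after = to_bytes(match.group(3), match.group(4))
    limit = to_bytes(match.group(5), match.group(6))
    return {
        "freed_gb": (before - after) / UNIT_BYTES["gb"],
        "used_after_pct": 100.0 * after / limit,
    }


print(heap_summary(GC_LINE))
```

For the line above, this works out to roughly 0.3 GB freed by a 6s collection, with the heap still about 94% full afterwards.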
When an instance dies with "OutOfMemoryError: Java heap space" and the service stops, I can see that the GC count and GC time increased for that instance.
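The GC count and time mentioned above can be read per node from the node stats API (GET _nodes/stats/jvm). A minimal sketch of how that could be collected, assuming a node is reachable at http://localhost:9200 without authentication or TLS (the address is a placeholder, and the function names are my own):

```python
import json
from urllib.request import urlopen

# Placeholder address; replace with a real node. Assumes no auth and no TLS.
ES_URL = "http://localhost:9200"


def fetch_jvm_stats(base_url=ES_URL):
    """Fetch the JVM section of the node stats API as a dict."""
    with urlopen(f"{base_url}/_nodes/stats/jvm", timeout=10) as resp:
        return json.load(resp)


def summarize_gc(stats):
    """Return one row per node: name, heap %, old-gen GC count and time."""
    rows = []
    for node in stats.get("nodes", {}).values():
        jvm = node["jvm"]
        old = jvm["gc"]["collectors"]["old"]
        rows.append({
            "node": node.get("name", "?"),
            "heap_used_percent": jvm["mem"]["heap_used_percent"],
            "old_gc_count": old["collection_count"],
            "old_gc_seconds": old["collection_time_in_millis"] / 1000.0,
        })
    # Nodes closest to the heap limit come first.
    return sorted(rows, key=lambda r: r["heap_used_percent"], reverse=True)


# Usage (needs a reachable cluster):
#   for row in summarize_gc(fetch_jvm_stats()):
#       print(row)
```

Running this on a schedule, and especially while the daily stats tasks are active, would show whether heap usage and old-generation GC time climb on specific nodes before they fail.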
Also, we run daily tasks that generate stats from the docs, covering up to the last 30 days of data. When these tasks run, the same problem sometimes occurs, but not always.
Do you guys have any idea where I should look next to debug this?