OutOfMemoryError on adequately sized cluster


(None) #1

Hi, running 6.2.4

Cluster topology:

  • 3 master nodes
  • 2 ingest nodes
  • 1 coordinator node
  • 3 data nodes

The masters and the ingest nodes borked with OutOfMemoryError; the data nodes seem to have survived.

Master (each): 16GB of which ES_JAVA_OPTS="-Xms7g -Xmx7g"
Ingest (each): 4GB of which ES_JAVA_OPTS="-Xms3g -Xmx3g"
Coordinator (each): 64GB of which ES_JAVA_OPTS="-Xms30g -Xmx30g"
Data (each): 64GB of which ES_JAVA_OPTS="-Xms30g -Xmx30g"

All nodes running the Ubuntu OpenJDK:
openjdk version "1.8.0_162"
OpenJDK Runtime Environment (build 1.8.0_162-8u162-b12-0ubuntu0.16.04.2-b12)
OpenJDK 64-Bit Server VM (build 25.162-b12, mixed mode)

All nodes have
MAX_OPEN_FILES=65536
MAX_LOCKED_MEMORY=unlimited
MAX_MAP_COUNT=262144

Master node before the crash:

Ingest node before the crash:

Master log: https://www.dropbox.com/s/vro7yls5mmfmu0u/master.log?dl=0
Ingest Log: https://www.dropbox.com/s/6trmcp5r0beulxa/ingest.log?dl=0


(Adrien Grand) #2

The fact that master and ingest nodes have the problem but not data nodes suggests that the issue might be related to the size of your cluster state. Maybe you have many indexes / shards / fields? What is the size of the output of GET /_cluster/state?
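For anyone following along, the state can be dumped like this (a sketch, assuming you can reach a node over HTTP; works in Kibana Dev Tools or via the curl equivalent):

```
GET /_cluster/state
```

Piping the curl equivalent through `wc -c` gives the size in bytes.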


(None) #3

@jpountz

Hi, 800 indexes and 8,000 shards, give or take. The cluster state is about 720KB.

I should also add it's daily indexes, but most are small.

Out of the 800...

  • 60 are between 1 to 3 million documents.
  • 100 are between 100K to 900K documents.
  • The rest are below 100K documents.
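If it helps to double-check those counts, the _cat APIs summarize them per index and per shard (a sketch; works in Kibana Dev Tools or via curl):

```
GET /_cat/indices?v&s=docs.count:desc
GET /_cat/shards?v
```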

(Mark Walkom) #4

You have too many shards, look to use _shrink on older ones and reduce the count or switch to weekly/monthly indices.
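For reference, a _shrink of a hypothetical daily index would look something like this (the index and node names are made up; before shrinking, the index must be made read-only and all its shards relocated onto a single node):

```
PUT /logs-2018.01.05/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "data-node-1",
    "index.blocks.write": true
  }
}

POST /logs-2018.01.05/_shrink/logs-2018.01.05-shrunk
{
  "settings": {
    "index.number_of_shards": 1
  }
}
```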


(None) #5

And if I want to keep that many, say daily, but for a maximum of a year? Do I just increase the master RAM to 30GB or add more nodes?

Can I have older indexes as monthly and the newer ones as daily? How will the date math work if we can do this?


(Mark Walkom) #6

Given your index sizes, keeping a year's worth of data (plus the current month) as daily indices isn't going to be worth it with that shard count.


(None) #7

So can I take old ones make them monthly and new ones daily? Will Kibana date math work on both?


(Christian Dahlqvist) #8

Switching from daily to monthly requires reindexing, so it is better if you switch to monthly indices for all data. Kibana does not use date math based on index names any longer, so that is not a problem.
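For reference, rolling a month of hypothetical daily indices into one monthly index could look like this (the names are made up; verify document counts before deleting the dailies):

```
POST /_reindex
{
  "source": { "index": "logs-2018.05.*" },
  "dest":   { "index": "logs-2018.05" }
}
```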


(None) #9

Ok. Thanks.


(None) #10

Ok, I will reindex, but obviously I have to phase it. So I will convert the older ones to monthly first, and then the newer ones eventually.


(None) #11

On monthly indexes we can't do daily backups though, can we?


(Mark Walkom) #12

Sure you can.
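For example, assuming a snapshot repository named `my_backup` has already been registered (the repo and index names here are hypothetical), a daily snapshot is just:

```
PUT /_snapshot/my_backup/snapshot-2018.06.01?wait_for_completion=true
{
  "indices": "logs-*",
  "include_global_state": false
}
```

Snapshots are incremental per repository, so snapshotting the same monthly indices every day only copies segments that are new since the last snapshot.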


(None) #13

@warkolm

Cool, so reading the docs... snapshots only store changed files, correct? So if a monthly index has NOT changed in 31 days and we took 31 snapshots, the snapshot repo would remain the same size and NOT have grown, right?


(Mark Walkom) #14

More than likely. You can reduce the chance by running a force merge once they are no longer being written to.
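For reference, a force merge on a hypothetical monthly index that is no longer being written to (merging to a single segment keeps the file set stable, so subsequent snapshots have nothing new to copy):

```
POST /logs-2018.05/_forcemerge?max_num_segments=1
```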


(None) #15

Hi, so far it seems stable, thanks! I thought people were running with many more indexes?


(Mark Walkom) #16

Forget about the number of indices, it's the shards that matter.


(None) #17

Ok, cool, gotcha. So 1 index with 5 shards is the same as 5 indexes with 1 shard each, and each shard is a Lucene index which takes up X amount of resources.


(Mark Walkom) #18

Exactly.


(system) #19

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.