I have a three-node cluster (16 vCPU, 64 GB of RAM, 3 TB of data per node, JVM heap at 30 GB) with 450 indices (1 primary shard and 1 replica per index).
Following an upgrade from 6.7 to 6.8, the activation of TLS on the transport and HTTP layers, and the activation of security (native authentication), we started seeing circuit breaking exceptions in the Elasticsearch logs.
After some investigation I found that the JVM heap is mainly used by fielddata, and that most of the fielddata memory is used by the `_id` field:
```
GET _cat/fielddata?v&fields=*&s=size

id                     host    ip      node  field       size
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA type.raw    5kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC type.raw    5.2kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB type.raw    6.7kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA shard.state 7kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB shard.state 8.1kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC shard.index 21.9kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB src_ip      41.5kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA shard.index 41.7kb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB shard.index 42.5kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA src_ip      97.8kb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC src_ip      103.2kb
-QKIH1UCRaKUmZSddRj6YQ x.x.x.x x.x.x.x NodeA _id         23.9gb
Oj0_TCGcSWac4Zk-vhe3hA z.z.z.z z.z.z.z NodeB _id         24gb
JlIPM63SQ-OrLjcsr3q-yg y.y.y.y y.y.y.y NodeC _id         24gb
```
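For completeness, I also cross-checked the per-node total against the node stats API, restricted to the `_id` field (assuming I'm reading that endpoint correctly):

```
GET _nodes/stats/indices/fielddata?fields=_id
```

It confirms the roughly 24 GB of `_id` fielddata per node shown above.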
Is this normal behavior? How can I reduce the memory used?
The third node was added to the cluster recently to try to spread the load, but it didn't change anything.
I have a lot of fields in my indices. Would reducing the number of fields change anything here?
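As a stopgap, I'm considering clearing the fielddata cache and capping its size, assuming I've understood the docs correctly (the 20% below is just an example value, not something I've tuned):

```
POST _cache/clear?fielddata=true
```

and in `elasticsearch.yml` on each node:

```
indices.fielddata.cache.size: 20%
```

But that feels like treating the symptom rather than explaining why `_id` fielddata is being built at all, so any pointers on the root cause would be appreciated.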