We are setting up an Elasticsearch cluster for OS logs. We want the system to ingest a constant 30k events per second (eps) and store the data for compliance.
After several days the system stops working: nodes become unavailable and JVM heap utilization reaches 99%. In the Elasticsearch node logs we saw: [FIELDDATA] New used memory 19873079695 [18.5gb] from field [@timestamp] would be larger than configurated breaker 19851234508 [18.4gb], breaking.
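The two byte counts in that breaker message also hint at the heap size, which the post does not state. Assuming the breaker was at the Elasticsearch 1.x default of 60% of heap (`indices.breaker.fielddata.limit`), the limit of 19851234508 bytes implies a heap of roughly 30.8 GiB, close to the usual ~31 GB ceiling. If the limit was at the 30% tried later, the same number would imply a ~62 GiB heap instead. This is only a back-of-the-envelope sketch; substitute the real fraction if it was changed.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

def implied_heap_bytes(breaker_limit_bytes: int, limit_fraction: float) -> float:
    """Heap size implied by a breaker limit expressed as a fraction of heap."""
    return breaker_limit_bytes / limit_fraction

# Values copied from the [FIELDDATA] breaker log line.
requested = 19873079695  # "New used memory" reported in the log
limit = 19851234508      # breaker limit reported in the log

# 0.60 = assumed 1.x default fraction; 0.30 = value tried later in the post
for fraction in (0.60, 0.30):
    heap_gib = implied_heap_bytes(limit, fraction) / GIB
    print(f"limit at {fraction:.0%} of heap -> heap of about {heap_gib:.1f} GiB")

# How far the rejected load went past the limit.
print(f"overshoot: {(requested - limit) / MIB:.1f} MiB")
```

With the values above this prints about 30.8 GiB for the 60% case, 61.6 GiB for the 30% case, and an overshoot of about 20.8 MiB.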
Our memory configuration was:
Since we are logging time-oriented data, I suspect the problem is that all @timestamp values are being loaded into memory. We tried lowering the fielddata breaker limit to around 30%, but that did not help much.
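For reference, in Elasticsearch 1.x the fielddata breaker is the dynamic cluster setting `indices.breaker.fielddata.limit`, while the size of the fielddata cache itself is bounded by `indices.fielddata.cache.size` (unbounded by default, set per node in `elasticsearch.yml`). The breaker only rejects new loads that would exceed the limit; it does not evict fielddata that is already on the heap, which may be part of why lowering it alone changed little. The minimal sketch below builds the `PUT /_cluster/settings` body and, only if called, sends it with the standard library; the host `localhost:9200` is an assumption.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to a node of your cluster

def breaker_settings_body(fielddata_limit: str) -> dict:
    """Body for PUT /_cluster/settings that changes the fielddata breaker limit."""
    return {"persistent": {"indices.breaker.fielddata.limit": fielddata_limit}}

def apply_cluster_settings(body: dict, base_url: str = ES_URL) -> dict:
    """Send the settings update and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

if __name__ == "__main__":
    body = breaker_settings_body("30%")
    print(json.dumps(body, indent=2))
    # apply_cluster_settings(body)  # uncomment to send it to a live cluster
```

The cache bound is a node setting rather than a cluster setting, e.g. a line `indices.fielddata.cache.size: 20%` in each data node's `elasticsearch.yml` (20% is an illustrative value) followed by a restart. Keeping it below the breaker limit lets old entries be evicted before the breaker trips.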
For this environment we use 11 physical machines, each with 128 GB of RAM and plenty of storage in RAID 6.
On 3 servers we run 1 master node and 1 data node each.
On the other 8 servers we run 2 data nodes each.
Each index has its primary shards plus 2 replicas.
The system holds 54 TB of data in primary shards (~160 TB including replicas).
We run Elasticsearch 1.7.3.
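A quick tally of the layout above is useful when reasoning about heap per node. It assumes about 31 GB of heap per data node, which is an assumption since the memory configuration is not shown; the heap of the 3 master nodes is not counted.

```python
from dataclasses import dataclass

@dataclass
class ServerGroup:
    servers: int
    data_nodes_per_server: int
    ram_gb: int

# Layout as described in the post.
groups = [
    ServerGroup(servers=3, data_nodes_per_server=1, ram_gb=128),  # also host 1 master each
    ServerGroup(servers=8, data_nodes_per_server=2, ram_gb=128),
]

PRIMARY_TB = 54
COPIES = 1 + 2  # primary + 2 replicas
HEAP_GB = 31    # assumption: heap per data node

data_nodes = sum(g.servers * g.data_nodes_per_server for g in groups)
total_tb = PRIMARY_TB * COPIES
per_node_tb = total_tb / data_nodes

for g in groups:
    # RAM left for the OS page cache and everything else (master heap excluded).
    left_gb = g.ram_gb - g.data_nodes_per_server * HEAP_GB
    print(f"{g.data_nodes_per_server} data node(s)/server: {left_gb} GB left outside data-node heaps")

print(f"{data_nodes} data nodes, {total_tb} TB stored, ~{per_node_tb:.1f} TB per data node")
```

This prints 19 data nodes, 162 TB stored, and about 8.5 TB per data node, i.e. each ~31 GB heap serves several terabytes of on-disk data.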
Recently one node hit 99% JVM heap and threw a Java OutOfMemoryError, which caused the whole cluster to become unstable.
We barely search any data; only Marvel monitors the system state.
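To see which node is approaching the limit before it fails, the 1.x node stats API (`GET /_nodes/stats/jvm`) reports `jvm.mem.heap_used_percent` per node. A minimal sketch that parses that response and flags nodes above a threshold; the 75% threshold and the host are illustrative assumptions, not Elastic defaults.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: any reachable node
HEAP_ALERT_PERCENT = 75           # illustrative threshold, not an Elastic default

def hot_nodes(stats: dict, threshold: int = HEAP_ALERT_PERCENT) -> list:
    """Return (node name, heap %) for nodes above threshold, highest first."""
    result = []
    for node in stats.get("nodes", {}).values():
        percent = node["jvm"]["mem"]["heap_used_percent"]
        if percent > threshold:
            result.append((node.get("name", "?"), percent))
    return sorted(result, key=lambda item: item[1], reverse=True)

def fetch_jvm_stats(base_url: str = ES_URL) -> dict:
    """GET /_nodes/stats/jvm and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/_nodes/stats/jvm") as response:
        return json.loads(response.read().decode("utf-8"))

# Against a live cluster:
# for name, percent in hot_nodes(fetch_jvm_stats()):
#     print(f"{name}: heap {percent}%")
```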
My questions are:
Can we force the system not to load @timestamp into RAM? Does this happen for every open index?
We mainly care about the current index, since the indexing process writes to it. But we cannot close all the other indices, because they must remain available for possible searches.
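One option in 1.x is to map `@timestamp` with doc values, so that sorting and aggregations read it from on-disk columnar data instead of building fielddata on the heap. Existing mappings cannot be changed in place, so an index template like this would only affect indices created after it is installed (for example the next daily index). The sketch assumes indices named `logstash-*` and a template called `logs_timestamp_docvalues`; both names are hypothetical, so substitute your own.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"            # assumption
TEMPLATE_NAME = "logs_timestamp_docvalues"  # hypothetical template name
INDEX_PATTERN = "logstash-*"                # hypothetical index naming

def timestamp_docvalues_template(pattern: str = INDEX_PATTERN) -> dict:
    """1.x-style index template mapping @timestamp as a date with doc values."""
    return {
        "template": pattern,
        "order": 1,  # merged over templates with a lower order
        "mappings": {
            "_default_": {
                "properties": {
                    "@timestamp": {"type": "date", "doc_values": True}
                }
            }
        },
    }

def put_template(body: dict, name: str = TEMPLATE_NAME, base_url: str = ES_URL) -> dict:
    """PUT /_template/<name> and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

print(json.dumps(timestamp_docvalues_template(), indent=2))
# put_template(timestamp_docvalues_template())  # uncomment to install on a live cluster
```

Doc values cost some extra disk space at index time in exchange for keeping that data off the heap.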
In general, is Elasticsearch suitable for archiving at a rate of 30k eps?
What can be done to limit memory utilization? I believe we have followed all the best practices for swappiness etc.
Thanks for your help,
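A rough sizing of what 30k eps means per day may help frame the archiving question. The average stored size per event is not in the post; the 500 bytes below is a placeholder assumption, so the byte figures scale linearly with your real average (which could be estimated from primary store size divided by document count).

```python
SECONDS_PER_DAY = 86_400

def daily_volume(eps: int, bytes_per_event: int, copies: int) -> tuple:
    """Return (events/day, primary bytes/day, total bytes/day incl. replicas)."""
    events = eps * SECONDS_PER_DAY
    primary = events * bytes_per_event
    return events, primary, primary * copies

EPS = 30_000
BYTES_PER_EVENT = 500  # placeholder assumption: measure your own average
COPIES = 3             # primary + 2 replicas, as in the post

events, primary, total = daily_volume(EPS, BYTES_PER_EVENT, COPIES)
print(f"{events:,} events/day")
print(f"{primary / 1e12:.2f} TB/day primary, {total / 1e12:.2f} TB/day with replicas")
```

At 30k eps that is 2,592,000,000 events per day regardless of event size; with the placeholder 500 bytes it works out to about 1.30 TB of primaries and 3.89 TB with replicas per day.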