Trouble Finding the Most Efficient Way to Optimize My Elastic Stack


I've been adjusting my settings to try to optimize my Elastic Stack. My main goal right now is to make my searches load faster. For example, when I load my dashboards, it takes 30 seconds to over a minute for all my data to load. I did add custom runtime fields to pull specific data out of the logs, but I'm not sure whether that affects my performance. At the same time, all the hosts that the Elasticsearch nodes run on show a noticeable lag. For example, when I open a web browser, it takes around the same amount of time for the browser to even come up. My terminal, opening files, and any other application are delayed as well.
At first I kept the default JVM heap at 50% on all nodes, then lowered it to 30% to see if there was a difference, but I didn't notice any. I also tried increasing and decreasing shard sizes. I did see an increase in delay when I increased the max primary shard size from 1gb to 3gb.

My Elastic Stack environment runs entirely in containers on RHEL. I currently have Elasticsearch, Kibana, package-registry (for integrations), and Fleet Server containers all running in a podman network. On the main server, I have one ES node, Kibana, package-registry, and the Fleet Server container running. My two other ES nodes run on two separate machines.

All my nodes and agents are in healthy status, and JVM heap is currently at 40% on all nodes. I set my index lifecycle policies for logs* and metrics* to a max primary shard size of 3gb (previously 1gb) in the Hot phase, and the data is force-merged down to 1 segment in the Warm phase. I have 32 Elastic Agents reporting in total.
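
To make the lifecycle settings above concrete, here is a minimal sketch of what the request body would roughly look like, written as a Python dict. The policy name `logs-example` is hypothetical, and the sketch only includes the two settings I mentioned (rollover at a 3gb max primary shard size, force merge to 1 segment in Warm). Any other conditions in my real policies, such as age-based rollover, are left out.

```python
import json

# Hypothetical policy name, used only for illustration.
POLICY_NAME = "logs-example"

# Only the two settings described in the post are shown here.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over once any primary shard reaches 3gb.
                    "rollover": {"max_primary_shard_size": "3gb"}
                }
            },
            "warm": {
                "actions": {
                    # Merge each shard down to a single segment.
                    "forcemerge": {"max_num_segments": 1}
                }
            },
        }
    }
}


def ilm_request(name, body):
    """Return the HTTP method, path, and JSON payload for the ILM put-policy call."""
    return "PUT", f"/_ilm/policy/{name}", json.dumps(body, indent=2)


method, path, payload = ilm_request(POLICY_NAME, ilm_policy)
print(method, path)
print(payload)
```

The printed method, path, and body can be pasted into Kibana Dev Tools or sent with any HTTP client.
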
I have all the default indices, but I only need the data coming from /var/log/secure/, /var/log/messages, and /var/log/audit/audit.log.
I have about 140 million documents with 25GB of data, 83 primary shards, and 43 replica shards (I removed the replica shards from the log* indices to test for optimization).
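
For reference, a rough sketch of the arithmetic behind those numbers. It assumes the 25GB figure is primary data only and that data is spread evenly across shards, neither of which I have verified, so treat the results as ballpark averages rather than real per-shard sizes.

```python
import math

# Figures from my cluster as described above.
TOTAL_DATA_GB = 25
TOTAL_DOCS = 140_000_000
PRIMARY_SHARDS = 83
ROLLOVER_TARGET_GB = 3  # current max primary shard size in the Hot phase


def average_shard_size_gb(total_gb, shards):
    """Average primary shard size, assuming an even spread of data."""
    return total_gb / shards


def average_docs_per_shard(total_docs, shards):
    """Average document count per primary shard, assuming an even spread."""
    return total_docs / shards


def min_shards_at_target(total_gb, target_gb):
    """Fewest shards needed if every shard were filled to the rollover target."""
    return math.ceil(total_gb / target_gb)


avg_gb = average_shard_size_gb(TOTAL_DATA_GB, PRIMARY_SHARDS)
avg_docs = average_docs_per_shard(TOTAL_DOCS, PRIMARY_SHARDS)
floor_shards = min_shards_at_target(TOTAL_DATA_GB, ROLLOVER_TARGET_GB)

print(f"average primary shard size: {avg_gb:.2f} GB")
print(f"average docs per primary shard: {avg_docs:,.0f}")
print(f"shards needed if each held {ROLLOVER_TARGET_GB} GB: {floor_shards}")
```

On average my shards come out far smaller than the 3gb rollover target, which suggests most of them never reach it. The data is split across many separate indices and data streams, so the real lower bound on shard count is higher than the last number printed.
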

I'm not really sure how to optimize my shards; I've really just been playing around with the numbers to see whether anything changes.
When I use the 'top' command on the hosts to see which processes are taking up memory and/or CPU, it just shows java (the JVM heap) at 40%, and CPU usage is low.
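
Since 'top' mostly gives a per-process view, here is a minimal sketch for checking host-wide memory and swap by reading /proc/meminfo. It assumes a Linux host (RHEL in my case), and the helper names are my own, not from any tool. It only reports numbers; it does not prove what is causing the lag.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def swap_used_kib(info):
    """Swap currently in use, from the SwapTotal and SwapFree fields."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


def available_fraction(info):
    """Fraction of total RAM the kernel reports as available (MemAvailable / MemTotal)."""
    total = info.get("MemTotal", 0)
    return info.get("MemAvailable", 0) / total if total else 0.0


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        meminfo = parse_meminfo(handle.read())
    print(f"swap in use: {swap_used_kib(meminfo)} kB")
    print(f"memory available: {available_fraction(meminfo):.0%}")
```

If swap in use is large or available memory is very low on the hosts, that would at least be consistent with everything on the machine feeling slow while CPU usage stays low.
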

If anyone has any suggestions, I would be most grateful. TIA.
