Questions about growing the cluster

Disclaimer: I am new to working with ES (4 months now; it's a new area at work, a SOC), so I may get some things wrong, and some things may not have been done in the best way because we were in a hurry at the time.

We currently have a single-node cluster that receives logs from Graylog. The ES node runs on a big server and uses the entire hardware: a 32-core Xeon E7 with 64 GB of RAM. We now have 255 indices with 4 shards each (default Graylog configuration) and 5 billion documents. The big problem is that the data path for the node is on NFS, and sadly it can't be any other way; it has to be NFS and we have to deal with it (the databases are in a separate network from the servers, and no production data can be stored in the server production network).

After a while we started getting the "not enough thread pool" error in Kibana searches. After investigating, we decided to add more nodes to the cluster to ease the search load by virtualizing the entire Xeon into 4 nodes, each of them using the same data path that we currently use.

Our plan for the upgrade (we have another Xeon E7) is to configure the 4 new nodes on the new server and add them to the cluster alongside the current node; after that we will recycle the first server for another purpose. We have some doubts and questions about this:

  • Are we going in the right direction by virtualizing the entire hardware into 4 virtual nodes, or should we run a single node on the entire hardware? We have a ton of documents indexed (5 billion) and the number grows every day. After 4 months we currently receive ~1200 messages per second, and that is just a fraction of what we will receive in the future.

  • Is there a problem if we use the same data path for the 4 new nodes? We also have the option of getting 4 new NFS partitions, one for each node, as in a local-storage scenario. Which one is preferred (besides not using NFS :( )?

  • If we go with the 4 NFS partitions, one for each node, where do we mount the existing (soon-to-be-old) data partition?

  • Any other considerations or tips for an ES newbie about growing the cluster?

Thanks.

Have you identified what is limiting performance in your cluster? Given that you have NFS storage, that sounds like a likely candidate, in which case adding compute resources may not gain you much.
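One quick first check is to see which thread pools are actually rejecting work. The snippet below is only a minimal sketch: it reads the nodes stats API (`GET /_nodes/stats/thread_pool`) and lists pools with a non-zero `rejected` counter. The `base_url` default and both function names are assumptions made for illustration, not anything taken from your setup.

```python
import json
from urllib.request import urlopen


def rejected_pools(nodes_stats):
    """Return (node_name, pool_name, rejected) for every pool with rejections."""
    hits = []
    for node in nodes_stats.get("nodes", {}).values():
        for pool_name, pool in node.get("thread_pool", {}).items():
            rejected = pool.get("rejected", 0)
            if rejected > 0:
                hits.append((node.get("name", "?"), pool_name, rejected))
    # Worst offenders first
    return sorted(hits, key=lambda h: h[2], reverse=True)


def fetch_nodes_stats(base_url="http://localhost:9200"):
    # base_url is an assumption; point it at your own node
    with urlopen(base_url + "/_nodes/stats/thread_pool") as resp:
        return json.load(resp)
```

Calling `rejected_pools(fetch_nodes_stats())` against a live node shows where requests are being turned away. Rejections only confirm the symptom; they do not show whether CPU or NFS I/O is the underlying cause, so it is worth looking at disk latency on the NFS mount as well.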

As per your reddit thread, that number of shards is way too many and is likely causing the issues. Use the _shrink API to cut it down.
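For reference, a shrink takes two requests per index: first block writes and pull a copy of every shard onto one node, then call `_shrink` with a target shard count that is a factor of the source count. The sketch below only builds those requests, it does not send them. The index name `graylog_0`, the node name `node-1`, the `-shrunk` suffix, and the helper names are all hypothetical; how Graylog treats the renamed index is something to verify before trying this on real data.

```python
def valid_shrink_targets(source_shards):
    """Shard counts an index can be shrunk to: proper factors of the source count."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


def shrink_requests(index, source_shards, target_shards, node_name, suffix="-shrunk"):
    """Build the (method, path, body) pairs for shrinking one index."""
    if target_shards not in valid_shrink_targets(source_shards):
        raise ValueError("target must be a proper factor of the source shard count")
    prepare = {
        "settings": {
            # A copy of every shard must sit on a single node before shrinking
            "index.routing.allocation.require._name": node_name,
            # The source index must be read-only during the shrink
            "index.blocks.write": True,
        }
    }
    shrink = {"settings": {"index.number_of_shards": target_shards}}
    return [
        ("PUT", f"/{index}/_settings", prepare),
        ("POST", f"/{index}/_shrink/{index}{suffix}", shrink),
    ]


# 255 indices with 4 shards each, as described in the question
total_before = 255 * 4
total_after = 255 * 1

for method, path, body in shrink_requests("graylog_0", 4, 1, "node-1"):
    print(method, path, body)
print(total_before, "->", total_after)
```

With 4 shards per index the only valid targets are 2 and 1, so shrinking every index to a single shard would take the cluster from 1020 shards to 255.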
