The important part is to figure out why GC happens. Are you overloading your nodes? Are you running queries that use deep pagination? Are you sending huge documents to your nodes? It's very hard to answer without further information.
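If deep pagination could be in play, here is a rough sketch (Python + requests, with a made-up index name, field names and host, none of which come from this thread) of the pattern to look for and the usual alternative:

```python
import requests

ES = "http://localhost:9200"  # assumption: unauthenticated local cluster

# Deep pagination: every shard has to collect and sort `from + size` hits,
# so the cost grows with the offset; Elasticsearch caps from + size at
# index.max_result_window (10,000 by default) for exactly this reason.
deep_page = {
    "from": 9950,
    "size": 50,
    "query": {"match_all": {}},
    "sort": [{"@timestamp": "desc"}],
}
resp = requests.post(f"{ES}/my-index/_search", json=deep_page).json()
print("took", resp["took"], "ms")

# search_after keeps the per-page cost flat: instead of a growing offset,
# you pass the sort values of the last hit from the previous page.
next_page = {
    "size": 50,
    "query": {"match_all": {}},
    "sort": [{"@timestamp": "desc"}, {"serial": "asc"}],  # "serial" is a hypothetical tiebreaker field
    "search_after": ["2024-01-01T00:00:00Z", 123456],     # sort values of the previous page's last hit
}
resp = requests.post(f"{ES}/my-index/_search", json=next_page).json()
```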
Is the cluster running at capacity?
Can you explain why you have a single ingest node? If you are using ingest pipelines, this means that all index requests that use a pipeline will go to this node, effectively introducing a single point of failure.
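A quick way to check which roles each node actually holds is the _cat/nodes API; a minimal sketch (Python + requests, assuming an unauthenticated cluster on localhost:9200):

```python
import requests

ES = "http://localhost:9200"

# node.role lists the role letters per node; "i" marks ingest-capable nodes.
print(requests.get(
    f"{ES}/_cat/nodes",
    params={"v": "true", "h": "name,node.role,heap.percent,cpu"},
).text)
```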
It could be that the nodes are overloaded, yes. The docs are not too huge, and the user queries could be using pagination.
We have updated two of our nodes with more RAM and it seems that the CPU usage has gone down a little.
We have only one ingest node, but we are not using ingest pipelines; it was only to point our querying/indexing at that node.
One question: could a CPU spike on one node (caused by GC, for instance) influence the spikes on the other nodes in the cluster? Or is it more reasonable that this spike comes from some big query?
The nodes stats and nodes info APIs allow you to see where your time is spent, and you can compare this across nodes. There is also the hot threads API, which shows you where CPU time is being spent.
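Something along these lines (Python + requests, again assuming an unauthenticated cluster on localhost:9200) pulls the per-node numbers worth comparing:

```python
import requests

ES = "http://localhost:9200"

# Per-node JVM and thread pool statistics, for comparing nodes side by side.
stats = requests.get(f"{ES}/_nodes/stats/jvm,thread_pool").json()
for node in stats["nodes"].values():
    heap_pct = node["jvm"]["mem"]["heap_used_percent"]
    rejected = sum(tp.get("rejected", 0) for tp in node["thread_pool"].values())
    print(node["name"], "heap used:", heap_pct, "% - thread pool rejections:", rejected)

# Hot threads: plain-text snapshot of where CPU time is going right now.
print(requests.get(f"{ES}/_nodes/hot_threads").text)
```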
Regarding GC: the log files of each node show you how much time is spent on GC, so you can easily see whether the performance issues and the GC times overlap.
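The nodes stats API also exposes cumulative GC counts and times per node, so you can sample it twice and diff the values; a small sketch under the same assumptions as above:

```python
import time
import requests

ES = "http://localhost:9200"

def gc_time_ms():
    """Total (cumulative) GC time in milliseconds per node, from nodes stats."""
    stats = requests.get(f"{ES}/_nodes/stats/jvm").json()
    return {
        node["name"]: sum(
            c["collection_time_in_millis"]
            for c in node["jvm"]["gc"]["collectors"].values()
        )
        for node in stats["nodes"].values()
    }

before = gc_time_ms()
time.sleep(60)  # sampling interval; adjust to match the window of the CPU spike
after = gc_time_ms()
for name, total in after.items():
    print(f"{name}: {total - before.get(name, 0)} ms of GC in the last minute")
```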