I'm having problems with queries, even simple ones, that impact ingest on my Elasticsearch cluster and leave gaps in the data.
Currently I have three Elasticsearch nodes, each with 8 cores and 28 GB of RAM, of which 20 GB is dedicated to the Java heap. Each node is a master-eligible node, a data node, and an ingest node.
My Logstash ingest pipeline is configured to send data to all three nodes, but my Kibana instance is configured to connect to only one of them, the master node, since it does not support an array of hosts in the configuration file.
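For context, the output section of my Logstash pipeline looks roughly like this (the hostnames and index pattern are placeholders, not my real values):

```conf
output {
  elasticsearch {
    # all three data nodes, so indexing requests are spread across them
    hosts => ["es-node-1:9200", "es-node-2:9200", "es-node-3:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```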
Sometimes when I run a query over 24 hours of data on one of my indices (around 180 million documents), it impacts performance and I end up with gaps or missing data; sometimes even small queries over 30 minutes of data (around 4 million documents) have the same impact.
What would be the best way to deal with this problem?
I was thinking about adding coordinating-only nodes, but since my cluster is on a cloud service, I am trying to avoid increasing the costs too much.
Do the coordinating nodes need to be the same size as the master nodes? The documentation only says that "such a node needs to have enough memory and CPU in order to deal with the gather phase", but what is the best way to estimate the memory and CPU cores based on this?
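If I go that route, my understanding is that a coordinating-only node is just a node with every other role disabled, something like this in its elasticsearch.yml (assuming the pre-7.9 role settings; newer versions use `node.roles: []` instead):

```conf
# elasticsearch.yml for a coordinating-only node
node.master: false
node.data: false
node.ingest: false
```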
My main Logstash instance is on my local infrastructure, so I can output to Elasticsearch using http_compression and reduce the bandwidth used to send data to the cloud. But I also have another Logstash instance in the cloud, to receive data from servers that are already there. Is there any way I can send data from my local Logstash to my remote Logstash with http_compression, without having to write files to disk and use Filebeat?
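One idea I had was to use the http output plugin on the local instance pointing at an http input on the remote one, something like the sketch below (the URL and port are placeholders); would that preserve the compression end to end?

```conf
# local Logstash: forward events to the remote Logstash over HTTP with gzip
output {
  http {
    url => "https://remote-logstash.example.com:8080"
    http_method => "post"
    format => "json_batch"
    http_compression => true
  }
}

# remote Logstash: receive the events
input {
  http {
    port => 8080
  }
}
```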