My understanding is that the node receiving the REST request acts as the coordinating node.
Is this assumption false?
What I'm trying to do is force, say, 3 nodes to handle both the ingest and coordinating functions. Those 3 nodes are not data nodes. For the sake of this discussion, we can leave out the ingest function, because I can simply move it onto 3 separate nodes.
In my experience, when I run CPU-intensive aggregations, the CPU pressure is not on the 3 nodes receiving the queries.
If my setup is wrong, how can I force coordination onto specific non-data nodes?
The purpose is pretty simple: have the data nodes only gather data, and leave the aggregation to the non-data nodes.
This way, I can scale each group independently based on query needs.
That is expected, as much of the processing is performed by the nodes holding the data, before the processed results are sent to the coordinating node for final processing and aggregation.
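The division of work described in this reply can be sketched as a toy map/reduce in Python. This is a simplified model, not Elasticsearch's actual implementation: the shard contents, the `status == 500` filter and all function names are hypothetical. Each data node filters its own documents and reduces them to a small table of bucket counts; the coordinating node only merges those tables.

```python
from collections import Counter

# Hypothetical documents spread over three shards (one list per shard).
SHARDS = [
    [{"category": "web", "status": 500}, {"category": "web", "status": 200}, {"category": "db", "status": 500}],
    [{"category": "web", "status": 500}, {"category": "cache", "status": 500}],
    [{"category": "web", "status": 500}, {"category": "db", "status": 500}, {"category": "db", "status": 200}],
]


def is_server_error(doc):
    # Hypothetical query filter: match documents with status 500.
    return doc["status"] == 500


def shard_partial_terms(docs, predicate, field):
    """Runs on each data node: filter the local documents, then count terms.

    Only this small Counter is sent back, not the matching documents.
    """
    return Counter(doc[field] for doc in docs if predicate(doc))


def coordinator_reduce(partials, size):
    """Runs on the coordinating node: merge per-shard counts, keep the top buckets."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # adds counts bucket by bucket
    return merged.most_common(size)


if __name__ == "__main__":
    partials = [shard_partial_terms(docs, is_server_error, "category") for docs in SHARDS]
    print(coordinator_reduce(partials, size=2))  # prints [('web', 3), ('db', 2)]
```

Because the per-document work (matching and bucket counting) happens in `shard_partial_terms`, adding coordinating-only nodes mostly offloads the final merge step, not the bulk of the aggregation CPU. Real terms aggregations also return only the top `shard_size` buckets from each shard, so merged counts can be approximate; this toy version returns every bucket.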
Thanks for the quick response.
Can you please help answer this question?
I have a query that returns 7K document hits (a very small fraction of an index with billions of documents).
I then perform a two-level aggregation. The search portion returns quickly and consistently, but the aggregation takes about 3 times longer.
If the gathering phase yields only 7K documents (across, say, 5 nodes/shards), shouldn't the aggregation phase be very quick as well?
We have noticed that a similar aggregation on a different, smaller index seems to be much faster. This suggests that the total number of documents in the index plays a role. If it does, why? Shouldn't the filtering phase reduce the result set enough that the aggregation is fast?
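A two-level aggregation like the one described above might look roughly like the request body built below. This is a sketch under assumptions: the field names (`customer_id`, `host`, `status`) and the helper function are hypothetical. Setting `"profile": true` asks Elasticsearch to report per-shard timings, which is one way to check whether the extra time is spent collecting aggregations on the data nodes rather than in the query itself.

```python
import json


def build_two_level_agg_body(filter_field, filter_value, outer_field, inner_field, size=10):
    """Build a hypothetical _search body: filter, then a terms -> terms aggregation."""
    return {
        "size": 0,        # skip returning hits; only aggregation results are needed
        "profile": True,  # ask for per-shard query and aggregation timings
        "query": {
            "bool": {
                # Filter context: no scoring, and the clause is eligible for caching.
                "filter": [{"term": {filter_field: filter_value}}]
            }
        },
        "aggs": {
            "outer": {
                "terms": {"field": outer_field, "size": size},
                "aggs": {
                    # Second level: buckets nested inside each outer bucket.
                    "inner": {"terms": {"field": inner_field, "size": size}}
                },
            }
        },
    }


if __name__ == "__main__":
    body = build_two_level_agg_body("customer_id", "c-42", "host", "status")
    # Send this as the JSON body of POST /<index>/_search to one of the coordinating nodes.
    print(json.dumps(body, indent=2))
```

In the profile response, each shard entry has separate `searches` and `aggregations` sections with `time_in_nanos` values, so query time and aggregation time can be compared shard by shard on the large and small indices.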