Struggling for a TL;DR: can someone give a rough sense of how much CCS has changed between 5.5 and 6.5, specifically around how the number of indices and shards in the remotes can impact the CCS nodes? Or are there any caveats to keep in mind when upgrading CCS from 5.5 to 6.5?
While reading this Elastic blog post, the linked pull request challenged my understanding of how CCS works. I thought the interplay between CCS nodes and their remotes was very similar to that of a normal Elasticsearch client, the primary difference being that CCS nodes could go through a reduction phase. Based on this PR from a couple of weeks ago, this is how I thought CCS had been designed and implemented from the start:
This will be used in cross-cluster search when reduction will be
performed locally on each cluster. The CCS coordinating node will send
one search request per remote cluster involved and will get one search
response back from each one of them. Such responses contain all the info
to be able to perform an additional reduction and return results back
to the user.
I also stumbled on this removal in the 6.5 docs, which has me questioning my understanding as well as how much is changing under the hood.
This sentence is the core of the trade-off when minimizing round-trips: "A single request is sent to each remote cluster, at the cost of retrieving from + size already fetched results."
To be able to send a single request to each cluster, we need to ask each cluster for from+size results. Likewise, if you are running terms aggregations, each cluster will return shard_size buckets. This is so that results can be reduced once more on the CCS coordinating node after the results from all clusters have come in.
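To make the from+size requirement concrete, here is a minimal Python sketch of the idea, not the actual Elasticsearch implementation. The function names and the (score, doc_id) data are hypothetical. Each simulated cluster returns its own top from+size hits, because the final page could in principle come entirely from one cluster; the coordinating node then merges the lists and applies from/size once.

```python
import heapq

def cluster_top_hits(hits, from_, size):
    """Simulate one remote cluster: return its top from_ + size hits, best first."""
    return heapq.nlargest(from_ + size, hits, key=lambda hit: hit[0])

def ccs_final_reduce(per_cluster_hits, from_, size):
    """Simulate the CCS coordinating node: merge sorted per-cluster lists, then page."""
    merged = heapq.merge(*per_cluster_hits, key=lambda hit: hit[0], reverse=True)
    return list(merged)[from_:from_ + size]

# Hypothetical (score, doc_id) pairs for two remote clusters.
cluster_a = [(0.9, "a1"), (0.5, "a2"), (0.1, "a3"), (0.05, "a4")]
cluster_b = [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")]

from_, size = 1, 2
responses = [cluster_top_hits(c, from_, size) for c in (cluster_a, cluster_b)]
page = ccs_final_reduce(responses, from_, size)
print(page)  # [(0.8, 'b1'), (0.7, 'b2')]
```

In this toy run both hits on the returned page come from the second cluster, which is why no cluster can safely return fewer than from+size hits.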
This comes with a cost. If you are familiar with the different phases of search, we generally run a query phase on all shards involved in the search request, then a reduction on the coordinating node, which then fetches only the relevant hits (there are potentially additional phases, but these are the ones that are always executed). The main cost of minimizing round-trips is that each cluster will return from+size fetched hits, most of which will be discarded in the final reduction phase even though they were already fetched (think of hits that were potentially highlighted, if highlighting is used, plus whatever else may happen in the fetch phase).
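As a rough back-of-the-envelope illustration of that cost, the sketch below counts fetched hits in the two modes. It is a simplification under stated assumptions: every cluster is assumed to have at least from+size matching documents, and the function name and parameters are made up for this example.

```python
def fetched_hits(num_clusters, from_, size, minimize_roundtrips):
    """Estimate how many hits go through the fetch phase for one search request."""
    if minimize_roundtrips:
        # Each cluster fetches its own top from_ + size hits before responding.
        return num_clusters * (from_ + size)
    # Otherwise only the hits on the returned page are fetched.
    return size

# Hypothetical request: 3 remote clusters, page 10 with 10 hits per page.
with_minimizing = fetched_hits(3, from_=90, size=10, minimize_roundtrips=True)
without_minimizing = fetched_hits(3, from_=90, size=10, minimize_roundtrips=False)
print(with_minimizing, without_minimizing)  # 300 10
```

Under these assumptions, 290 of the 300 fetched hits are thrown away in the final reduction, which is the overhead being traded for fewer round-trips.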
reduce mode is not a typo.
Because with more round-trips, as long as you have very low network latency, things are faster, since we don't fetch hits that will not be returned.
Yes, it impacts what is done and returned by each cluster so that round-trips can be minimized.
It's not really about state on the CCS side, though the responses obtained from each cluster are potentially bigger when minimizing round-trips.
Does this help? Let me know if you have any further questions.