Clarity about cross cluster search

Struggling for a TL;DR: can someone give a rough sense of how much CCS has changed between 5.5 and 6.5, specifically around how the number of indices and shards in the remotes can impact the CCS nodes? Or, are there any caveats to keep in mind when upgrading CCS from 5.5 to 6.5?

While reading this Elastic blog post, the linked pull request challenged my understanding of how CCS works. I thought the interplay between CCS nodes and their remotes was very similar to that of a normal Elasticsearch client, the primary difference being that CCS nodes could go through a reduction phase. Judging from this PR from a couple of weeks ago, I had assumed this was how CCS was designed and implemented initially:

This will be used in cross-cluster search when reduction will be
performed locally on each cluster. The CCS coordinating node will send
one search request per remote cluster involved and will get one search
response back from each one of them. Such responses contain all the info
to be able to perform an additional reduction and return results back
to the user.

I also stumbled on this removal in the 6.5 docs which has me questioning my understanding as well as how much is changing under the hood.

Thanks and apologies for the vague ask!

Good question,

Between 5.6 and 6.5 not much has changed. The PR that you linked (https://github.com/elastic/elasticsearch/pull/37828) went only to master, meaning that this improvement will be released with 7.0.

I recently updated the docs to explain a little about how CCS works. Would you mind having a look at that section in the master docs and letting me know if it clears things up for you? You can find it here: https://www.elastic.co/guide/en/elasticsearch/reference/master/modules-cross-cluster-search.html#ccs-reduction

In general, it's not clear what the trade-off is. Some feedback about the docs:

  1. Small typo: "selected reduce mode" should be "selected reduce node"?

  2. There is low latency to our remotes but why would we not want to minimize roundtrips?

  3. Does it impact the amount of work performed on the remote side vs the CCS side?

  4. Does it impact the amount of state that the CCS side has to manage? Would ephemeral storage for a search node be ok?

No need to answer them; these are just the kinds of questions we're asking that other users might ask as well. Thank you!

Thanks for checking it out.

This sentence is the core of the trade-off when minimizing round-trips: "A single request is sent to each remote cluster, at the cost of retrieving from + size already fetched results."

In order to be able to send a single request to each cluster, we need to ask each cluster for from+size results. Also, if terms aggregations are running, each cluster will return shard_size buckets. This is so that results can be reduced once again in the CCS coordinating node once all results from all clusters have come in.

This comes with a cost. If you are familiar with the different phases of search: we generally do a query phase on all shards involved in the search request, then a reduction on the coordinating node, which then fetches only the relevant hits (there are potentially additional phases, but these are the ones that are always executed). The main cost of minimizing round-trips is that each cluster will return from+size fetched hits, the majority of which will be discarded in the final reduction phase even though they were already fetched (which includes highlighting, if highlighting is used, and whatever else may happen in the fetch phase).
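To make the cost concrete, here is a rough Python sketch of the two strategies. It is a toy model, not Elasticsearch code: the function names, the cluster names, and the way scores are generated are all made up for illustration, and each "cluster" is just a list of (score, doc_id) pairs. It only counts how many hits go through the fetch phase in each case.

```python
import heapq


def top_hits(scored_docs, n):
    """Return the n highest-scoring (score, doc_id) pairs, best first."""
    return heapq.nlargest(n, scored_docs)


def minimize_roundtrips(clusters, from_, size):
    """One request per cluster: every cluster fetches its own top from+size hits."""
    k = from_ + size
    per_cluster = [top_hits(docs, k) for docs in clusters.values()]
    fetched = sum(len(hits) for hits in per_cluster)
    # Final reduction on the CCS coordinating node; most fetched hits are dropped.
    merged = top_hits([hit for hits in per_cluster for hit in hits], k)
    return merged[from_:], fetched


def query_then_fetch(clusters, from_, size):
    """Query phase returns only scores and ids; only the final page is fetched."""
    k = from_ + size
    candidates = [hit for docs in clusters.values() for hit in top_hits(docs, k)]
    page = top_hits(candidates, k)[from_:]
    return page, len(page)


# Hypothetical data: three clusters with 100 documents each and distinct scores.
clusters = {
    name: [(j * 3 + i, f"{name}:{j}") for j in range(100)]
    for i, name in enumerate(["cluster_one", "cluster_two", "cluster_three"])
}

page_a, fetched_a = minimize_roundtrips(clusters, from_=10, size=10)
page_b, fetched_b = query_then_fetch(clusters, from_=10, size=10)
print(fetched_a, fetched_b, page_a == page_b)
```

With from=10 and size=10 across three clusters, the single-request strategy fetches 60 hits to return 10, while the multi-round-trip strategy fetches only the 10 that are returned. Both produce the same page.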

  1. reduce mode is not a typo.

  2. because with more round-trips, as long as you have very low network latency, things are faster, since we don't fetch hits that will not be returned

  3. yes, it impacts what is done and returned from each cluster so that round-trips can be minimized

  4. it's not really about state on the CCS side, though responses obtained from each cluster are potentially bigger when minimizing round-trips
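For anyone who wants to compare the two behaviours once 7.0 is out, the sketch below builds a cross-cluster search URL with round-trip minimization switched on or off. It assumes the `ccs_minimize_roundtrips` query parameter described in the master docs; the host, cluster aliases, and index names are hypothetical, and the helper function is only for illustration.

```python
from urllib.parse import urlencode


def ccs_search_url(host, targets, minimize_roundtrips):
    """Build a _search URL over remote indices given as 'cluster_alias:index'."""
    index_expr = ",".join(targets)
    # Assumed parameter name, taken from the master (7.0) docs.
    query = urlencode({"ccs_minimize_roundtrips": str(minimize_roundtrips).lower()})
    return f"{host}/{index_expr}/_search?{query}"


url = ccs_search_url(
    "http://localhost:9200",                  # hypothetical CCS node
    ["cluster_one:logs", "cluster_two:logs"],  # hypothetical remotes and indices
    minimize_roundtrips=False,
)
print(url)
```

Running the same search body against both variants and comparing the response times would be one way to see which side of the trade-off applies to your network latency.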

Let me know if you have any further questions, does this help?
