Why do my cross-cluster IDs queries sometimes return all but 1 or 2 of the results?
I have a 7.2.0 cluster that runs an IDs query across 3 remote clusters, usually with 500-1000 IDs per query. I set the size parameter to the length of my ID array. Usually I get back all of the IDs, but sometimes one or two are missing. Repeating the same query then returns 100% of the results.
If I query the clusters directly instead of going through CCS, I always get back 100% of the results with size=len(my_ids).
Thinking it might be an off-by-one (fencepost) error, I tried len(my_ids)+1, but no luck. As a wild guess, I also tried setting ccs_minimize_roundtrips=False.
Strange as all that is, I seem to have found a hacky workaround: if I double the size parameter (i.e., size=2*len(my_ids)), I get back 100% of the results every time.
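For anyone trying to reproduce this, here is a minimal sketch of how such a request can be assembled. The cluster aliases and index names are hypothetical placeholders, and `build_ccs_ids_search` is an illustrative helper, not part of any Elasticsearch client. CCS addresses remote indices as `<cluster_alias>:<index>`, and the `size_factor` parameter simply mirrors the size=len(my_ids) vs. size=2*len(my_ids) comparison above.

```python
from typing import Any, Dict, List, Tuple


def build_ccs_ids_search(
    ids: List[str],
    remote_indices: Dict[str, str],
    size_factor: int = 1,
) -> Tuple[str, Dict[str, Any]]:
    """Build the search target and body for a cross-cluster IDs query.

    remote_indices maps a (hypothetical) remote cluster alias to the
    index name on that cluster, e.g. {"cluster_a": "my-index"}.
    """
    if size_factor < 1:
        raise ValueError("size_factor must be >= 1")
    # CCS targets remote indices as "<cluster_alias>:<index>",
    # comma-separated when searching several clusters at once.
    target = ",".join(f"{alias}:{index}" for alias, index in remote_indices.items())
    body = {
        "query": {"ids": {"values": list(ids)}},
        # size_factor=1 matches size=len(ids); size_factor=2 is the
        # doubling workaround. Neither changes which documents match.
        "size": size_factor * len(ids),
    }
    return target, body
```

With elasticsearch-py this would be sent as `es.search(index=target, body=body)`; comparing the returned `_id` values against `ids` shows which IDs, if any, came back missing.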
I wish I could reproduce it for you. The 3 indices on the remote clusters are all multi-TB, and the problem only occurs 5-10% of the time for a given group of IDs, and then never again for that group.
Because I ended up needing to retrieve more than 10,000 results, I switched the search to scan/scroll. I have not observed the problem there.
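A minimal sketch of checking completeness on the scroll path, assuming the hits come from elasticsearch-py's `helpers.scan`, which wraps the scroll API and yields hits one at a time so no `size` tuning is needed (for example `helpers.scan(es, index="cluster_a:my-index,cluster_b:my-index", query={"query": {"ids": {"values": ids}}})`, where the cluster aliases and index names are placeholders). The function `collect_hits_by_id` is hypothetical; it accepts any iterable of hit dicts, so it can be exercised without a cluster.

```python
from typing import Any, Dict, Iterable, List, Tuple


def collect_hits_by_id(
    hits: Iterable[Dict[str, Any]],
    requested_ids: List[str],
) -> Tuple[Dict[str, Dict[str, Any]], List[str]]:
    """Drain a stream of search hits and report requested IDs never seen."""
    found: Dict[str, Dict[str, Any]] = {}
    for hit in hits:
        # Keep the first hit per _id; _id is only unique within one index,
        # so the same value could in principle appear in several remote indices.
        found.setdefault(hit["_id"], hit)
    # Preserve the caller's order so the missing-ID report is easy to read.
    missing = [doc_id for doc_id in requested_ids if doc_id not in found]
    return found, missing
```

Logging `missing` for each request, on both the plain search and the scroll path, would make an intermittent shortfall like the one described above easier to catch when it happens.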
I'm sorry I can't be more helpful in tracking this down. Perhaps someone else will run into this issue too and add their findings to this post.