Hello,
There are two fields in my docs, cid and os. cid is a high-cardinality field, while os
is either android or ios.
I want to know the difference between the following aggs, and which one is better.
There is no notion of better, as the two return different data.
The first one returns the top 1000 cids (by document count) and, for each of those, a document count per operating system, either android or ios. This means you will see at most a thousand different cids.
The second query, however, can return more than 1000 different cids, because it returns the top 1000 cids for android and the top 1000 cids for ios. Those two sets might be completely different, so you can get up to 2000 unique cids.
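
Since the original aggs are not shown in this thread, here is a rough sketch of what the two nestings could look like, written as Python dicts that serialize to a search request body. The field names cid and os and the size of 1000 come from the discussion above; the aggregation names by_cid and by_os, the inner os size of 2, and the helper terms_agg are made up for illustration.

```python
import json


def terms_agg(field, size, sub_aggs=None):
    """Build one terms aggregation clause, optionally with nested sub-aggregations."""
    clause = {"terms": {"field": field, "size": size}}
    if sub_aggs:
        clause["aggs"] = sub_aggs
    return clause


# cid -> os: top 1000 cids, each split into at most 2 os buckets.
cid_then_os = {
    "size": 0,  # skip the hits, only the aggregations are of interest
    "aggs": {"by_cid": terms_agg("cid", 1000, {"by_os": terms_agg("os", 2)})},
}

# os -> cid: at most 2 os buckets, each with its own top 1000 cids.
os_then_cid = {
    "size": 0,
    "aggs": {"by_os": terms_agg("os", 2, {"by_cid": terms_agg("cid", 1000)})},
}

# Upper bound on the number of cid buckets each form can return.
max_cids_cid_then_os = 1000
max_cids_os_then_cid = 2 * 1000

print(json.dumps(os_then_cid, indent=2))
```

Either dict can be sent as the JSON body of a _search request. The only difference is which terms aggregation is the parent, and that is what changes the upper bound on cid buckets from 1000 to 2000.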
If a cid typically has only one os, then the os->cid form will use fewer bytes than the cid->os form to convey the same information.
Either way, watch out for non-zero values of doc_count_error_upper_bound in the results. If you see them, consider increasing the shard_size setting to trade RAM for accuracy.
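
As a minimal sketch of that check, the helper below walks the aggregations part of a search response and reports every terms aggregation whose doc_count_error_upper_bound is non-zero. The function name, the aggregation names, and the sample response are all hypothetical and hand-written, not real cluster output. The shard_size value of 5000 is an arbitrary illustration, not a recommendation.

```python
def find_inexact_terms_aggs(aggs, path=""):
    """Return (path, error_bound) for each terms aggregation with a non-zero error bound."""
    hits = []
    for name, result in aggs.items():
        # Skip plain bucket fields such as "key" and "doc_count".
        if not isinstance(result, dict) or "buckets" not in result:
            continue
        here = f"{path}/{name}" if path else name
        bound = result.get("doc_count_error_upper_bound", 0)
        if bound > 0:
            hits.append((here, bound))
        # Sub-aggregations live inside each bucket, so recurse into them.
        for bucket in result["buckets"]:
            hits.extend(find_inexact_terms_aggs(bucket, f"{here}[{bucket['key']}]"))
    return hits


# Hand-written sample in the shape of a terms aggregation response.
sample_response_aggs = {
    "by_os": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
            {
                "key": "android",
                "doc_count": 900,
                "by_cid": {
                    "doc_count_error_upper_bound": 12,
                    "sum_other_doc_count": 300,
                    "buckets": [],
                },
            },
            {
                "key": "ios",
                "doc_count": 600,
                "by_cid": {
                    "doc_count_error_upper_bound": 0,
                    "sum_other_doc_count": 0,
                    "buckets": [],
                },
            },
        ],
    }
}

inexact = find_inexact_terms_aggs(sample_response_aggs)
print(inexact)  # [('by_os[android]/by_cid', 12)]

# If anything is reported, one option is to raise shard_size on the cid terms clause.
cid_terms_with_shard_size = {"terms": {"field": "cid", "size": 1000, "shard_size": 5000}}
```

A larger shard_size makes each shard return more candidate terms to the coordinating node, which is where the extra memory cost comes from.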