I have tens of millions of records, each a customer ID and city ID pair. There are tens of millions of unique customer IDs and only a few hundred unique city IDs. I want to do a merge that aggregates all city IDs for a specific customer ID and pulls back all records. How can I do this efficiently with Elasticsearch?
The default aggregation size is 10, so without size 0 in your case you would get the top 10 customers that appear in the most pairs and, for each of them, the top 10 cities. Size 0 means that you set no limit on the aggregation (if that is what you want).
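To make the nested aggregation concrete, here is a minimal sketch of the request body, built as a plain Python dict so it can be passed to whatever client you use. The field names `customer_id` and `city_id`, the aggregation names `by_customer` and `cities`, and the helper `build_customer_city_aggs` are assumptions for illustration, not something taken from your mapping. The sizes are passed explicitly rather than as 0, because as far as I know later Elasticsearch releases no longer accept 0 as "no limit" for a terms aggregation.

```python
import json


def build_customer_city_aggs(customer_size, city_size,
                             customer_field="customer_id",
                             city_field="city_id"):
    """Build a search body: one bucket per customer, and inside each
    customer bucket one bucket per city.

    Field names are hypothetical; replace them with the ones in your index.
    """
    return {
        "aggs": {
            "by_customer": {
                # Outer terms aggregation: top `customer_size` customers.
                "terms": {"field": customer_field, "size": customer_size},
                "aggs": {
                    "cities": {
                        # Inner terms aggregation: top `city_size` cities
                        # within each customer bucket.
                        "terms": {"field": city_field, "size": city_size}
                    }
                },
            }
        }
    }


# With only a few hundred unique cities, a city_size of 500 should cover
# every city for a customer.
body = build_customer_city_aggs(customer_size=10, city_size=500)
print(json.dumps(body, indent=2))
```

With tens of millions of unique customers, asking for every customer bucket in one response is likely to be expensive, so it may be worth filtering to the customer IDs you actually need first.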
@kresimirus, thanks for the response. I did some reading on search today, and I do not quite understand what this means: "This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned)"? If you could advise or show an example, that would be great.
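One way to read the quoted passage: the index is split into shards, each shard reports only its own top terms, and the results are then merged. The following is a toy simulation of that idea in plain Python, not Elasticsearch code. The shard contents are made up, and for simplicity each shard returns exactly `size` terms, whereas the real terms aggregation asks each shard for somewhat more than `size` to reduce the error.

```python
from collections import Counter


def exact_top_terms(shards, size):
    """Sum every term over all shards, then take the top `size`."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


def approximate_top_terms(shards, size):
    """Each shard reports only its own top `size` terms; the reports
    are summed and the top `size` of the merged result is returned."""
    merged = Counter()
    for shard in shards:
        merged.update(dict(shard.most_common(size)))
    return merged.most_common(size)


# Made-up per-shard term counts (term -> number of documents).
shards = [
    Counter({"a": 10, "b": 9, "c": 6}),
    Counter({"a": 10, "d": 8, "c": 6, "b": 5}),
    Counter({"a": 10, "e": 7, "c": 6}),
]

exact = exact_top_terms(shards, size=2)
approx = approximate_top_terms(shards, size=2)

print(exact)   # [('a', 30), ('c', 18)]
print(approx)  # [('a', 30), ('b', 9)]
```

Term "c" is third on every shard, so no shard reports it and it is missing from the merged top 2 even though it is second overall. Term "b" is reported with a count of 9, although its true count is 14, because the second shard did not include it in its own top 2. Those are the two kinds of inaccuracy the passage describes.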
Have a good weekend, and this is what I am referring to,