Efficiency of aggregation in my use cases?

Hello Elasticsearch experts,

I have tens of millions of records, each of which is a customer ID and city ID pair. There are tens of millions of unique customer IDs, and only a few hundred unique city IDs. I want to merge them so that all city IDs are aggregated per customer ID, and then pull back all of the aggregated records. How can I do this efficiently with Elasticsearch?

Here is an example of the input:

CustomerID1 City1
CustomerID2 City2
CustomerID3 City1
CustomerID1 City3
CustomerID2 City4

I want output like the following (I want to pull back the aggregated city IDs for all customers, not the record for one specific customer, and I am wondering what the most efficient way to do this is):

CustomerID1 City1 City3
CustomerID2 City2 City4
CustomerID3 City1

Thanks in advance,
Lin

I don't think there is much you can do: create a terms aggregation on the customer field and nest a terms aggregation on the city field inside it. Something like this:

"aggs": {
    "customers": {
        "terms": {
            "field": "customer",
            "size": 0
        },
        "aggs": {
            "cities": {
                "terms": {
                    "field": "city",
                    "size": 0
                }
            }
        }
    }
}

And fetch only the aggregation results, not the matching documents.
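
To make the shape of the result concrete, here is a minimal Python sketch of how the nested buckets could be flattened into one line per customer. The `request_body` mirrors the aggregation above, with a top-level `"size": 0` added so that no search hits are returned. The `sample_response` is hand-written for the five example pairs, not real server output, and `flatten_buckets` is a hypothetical helper name. Whether your Elasticsearch version accepts `"size": 0` inside a terms aggregation is an assumption you should verify.

```python
# Request body mirroring the aggregation above. The top-level "size": 0
# asks for no search hits, so only aggregation results come back.
# The inner "size": 0 values are copied from the post; check that your
# Elasticsearch version accepts them.
request_body = {
    "size": 0,
    "aggs": {
        "customers": {
            "terms": {"field": "customer", "size": 0},
            "aggs": {"cities": {"terms": {"field": "city", "size": 0}}},
        }
    },
}


def flatten_buckets(response):
    """Turn nested terms buckets into {customer: [city, ...]}."""
    result = {}
    for customer_bucket in response["aggregations"]["customers"]["buckets"]:
        cities = [b["key"] for b in customer_bucket["cities"]["buckets"]]
        result[customer_bucket["key"]] = sorted(cities)
    return result


# Hand-written sample response for the five example pairs (illustrative only).
sample_response = {
    "aggregations": {
        "customers": {
            "buckets": [
                {"key": "CustomerID1", "doc_count": 2, "cities": {"buckets": [
                    {"key": "City1", "doc_count": 1},
                    {"key": "City3", "doc_count": 1},
                ]}},
                {"key": "CustomerID2", "doc_count": 2, "cities": {"buckets": [
                    {"key": "City2", "doc_count": 1},
                    {"key": "City4", "doc_count": 1},
                ]}},
                {"key": "CustomerID3", "doc_count": 1, "cities": {"buckets": [
                    {"key": "City1", "doc_count": 1},
                ]}},
            ]
        }
    }
}

customer_cities = flatten_buckets(sample_response)
for customer, cities in sorted(customer_cities.items()):
    print(customer, " ".join(cities))
```

Each outer bucket corresponds to one customer and each inner bucket to one of that customer's cities, so the flattened result has the same layout as the desired output in the question.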

Thanks @kresimirus. What does size: 0 mean in your query?

The default aggregation size is 10, so without size 0 you would get only the top 10 customers (those appearing in the most pairs) and, for each of them, only the top 10 cities. Size 0 means that you set no limit on the aggregation (if that is what you want).
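
The effect of the size limit can be sketched locally in Python, assuming a limit of 2 instead of 10 so that the five example pairs are enough to show it. `top_customers` is a hypothetical helper that only imitates the behavior described above (a limit of 0 meaning "no limit"); it is not Elasticsearch code.

```python
from collections import Counter

# The five example pairs from the question.
pairs = [
    ("CustomerID1", "City1"),
    ("CustomerID2", "City2"),
    ("CustomerID3", "City1"),
    ("CustomerID1", "City3"),
    ("CustomerID2", "City4"),
]


def top_customers(pairs, size):
    """Return (customer, pair_count) tuples, most frequent first.

    size == 0 imitates "no limit"; any other value keeps only the top `size`.
    """
    counts = Counter(customer for customer, _ in pairs)
    limit = None if size == 0 else size
    return counts.most_common(limit)


limited = top_customers(pairs, 2)    # CustomerID3 is cut off
unlimited = top_customers(pairs, 0)  # every customer is returned
print(limited)
print(unlimited)
```

With a limit in place, customers that appear in fewer pairs silently drop out of the result, which is why a limit does not suit a "pull back everything" use case.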

@kresimirus, thanks for the response. I did some reading on this today, and I do not quite understand what this sentence means: "This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned)". If you could explain or show an example, that would be great. :smile:

Have a good weekend. This is the page I am quoting from:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size?q=size

There is an example in the same article, one sentence after the one you quoted :smile:
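
For readers who want something runnable, the inaccuracy in the quoted sentence can be imitated with a simplified Python model. It assumes two shards that each return only their own top `size` terms, which are then merged. The shard contents and the helper names `approximate_top` and `exact_top` are made up for illustration, and real Elasticsearch asks each shard for more candidates than `size`, so treat this as a sketch of the idea rather than of the actual implementation.

```python
from collections import Counter

# Hypothetical per-shard term counts. City2 is second on each shard,
# but it is the most frequent term overall (4 + 4 = 8).
shards = [
    Counter({"City1": 5, "City2": 4}),
    Counter({"City3": 5, "City2": 4}),
]


def approximate_top(shards, size):
    """Merge only each shard's local top `size` terms, then take the top `size`."""
    merged = Counter()
    for shard in shards:
        for term, count in shard.most_common(size):
            merged[term] += count
    return merged.most_common(size)


def exact_top(shards, size):
    """Sum the full counts from every shard before taking the top `size`."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(size)


approx = approximate_top(shards, 1)
exact = exact_top(shards, 1)
print("approximate:", approx)
print("exact:      ", exact)
```

Here City2 never makes any shard's local top 1, so it is missing from the merged result even though it has the highest total count. This is the "term that should have been in the top size buckets was not returned" case from the quote.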

Thanks @kresimirus :blush:

Where is it mentioned that the default size is 10? I searched but could not find it. :smile:

Regards,
Lin