Efficiency of aggregation in my use cases?

linlma · August 6, 2015, 10:51pm

Hello Elasticsearch experts,

I have tens of millions of records, each a customer ID and city ID pair. There are tens of millions of unique customer IDs and only a few hundred unique city IDs. I want to merge them so that all city IDs are aggregated for each customer ID, and then pull back all the records. How can I do this efficiently with Elasticsearch?

Here is an example of the input:

CustomerID1 City1
CustomerID2 City2
CustomerID3 City1
CustomerID1 City3
CustomerID2 City4

I want output like the following. (I want to pull back the aggregated city IDs for every customer, not a record for one specific customer, and I am wondering what the most efficient way to do this is.)

CustomerID1 City1 City3
CustomerID2 City2 City4
CustomerID3 City1

Thanks in advance,
Lin
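For reference, the merge described above can be sketched in plain Python, independent of Elasticsearch. This only illustrates the desired output shape using the five sample pairs from the post; the helper name `group_cities` is hypothetical, and this approach is not meant to scale to tens of millions of records.

```python
from collections import defaultdict

# Sample pairs copied from the question above.
pairs = [
    ("CustomerID1", "City1"),
    ("CustomerID2", "City2"),
    ("CustomerID3", "City1"),
    ("CustomerID1", "City3"),
    ("CustomerID2", "City4"),
]


def group_cities(records):
    """Collect the distinct cities seen for each customer, in first-seen order."""
    grouped = defaultdict(list)
    for customer_id, city_id in records:
        if city_id not in grouped[customer_id]:
            grouped[customer_id].append(city_id)
    return dict(grouped)


grouped = group_cities(pairs)
for customer_id, cities in grouped.items():
    # One line per customer: the ID followed by all of its cities.
    print(customer_id, *cities)
```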

kresimirus · August 10, 2015, 10:26pm

I don't think there is much you can do: create an aggregation on the customer field and nest an aggregation on the city field inside it. Something like this:

"aggs": {
    "customers": {
        "terms": {
            "field": "customer",
            "size": 0
        },
        "aggs": {
            "cities": {
                "terms": {
                    "size": 0,
                    "field": "city"
                }
            }
        }
    }
}

And fetch only the aggregations.
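A minimal sketch of what "fetch only the aggregations" could look like, written as a Python dict that serializes to the request body. The top-level `"size": 0` asks for no search hits, so only the aggregation results come back. The field names `customer` and `city` come from the query above; the helper names and the sample response are hypothetical, hand-written to match the usual terms-aggregation response shape. Note that `"size": 0` inside a terms aggregation applies to the Elasticsearch versions current when this thread was written; as far as I know, later releases no longer accept it.

```python
import json


def build_request(customer_field="customer", city_field="city"):
    """Build a search body that returns aggregations only (no hits)."""
    return {
        "size": 0,  # top-level size: return zero search hits
        "aggs": {
            "customers": {
                "terms": {"field": customer_field, "size": 0},
                "aggs": {
                    "cities": {"terms": {"field": city_field, "size": 0}}
                },
            }
        },
    }


def rows_from_response(response):
    """Flatten nested terms buckets into {customer: [city, ...]}."""
    rows = {}
    for customer in response["aggregations"]["customers"]["buckets"]:
        rows[customer["key"]] = [c["key"] for c in customer["cities"]["buckets"]]
    return rows


body = build_request()
print(json.dumps(body, indent=2))

# Hypothetical response fragment, written by hand for illustration only.
sample_response = {
    "aggregations": {
        "customers": {
            "buckets": [
                {
                    "key": "CustomerID1",
                    "doc_count": 2,
                    "cities": {
                        "buckets": [
                            {"key": "City1", "doc_count": 1},
                            {"key": "City3", "doc_count": 1},
                        ]
                    },
                }
            ]
        }
    }
}
rows = rows_from_response(sample_response)
```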

linlma · August 13, 2015, 10:36pm

Thanks @kresimirus. What does size: 0 mean in your query?

kresimirus · August 14, 2015, 2:36pm

The default aggregation size is 10, so without size 0 you would get only the 10 customers that appear in the most pairs and, for each of them, their top 10 cities. Size 0 means no limit on the number of buckets returned (if that is what you want).
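The effect of the size limit can be mimicked locally with a counter. This is a rough analogy rather than how Elasticsearch computes buckets; `top_buckets` is a hypothetical helper, and the data is made up so that 25 distinct customers appear with different frequencies.

```python
from collections import Counter


def top_buckets(values, size=10):
    """Return (term, count) pairs ordered by count; size=0 means no limit."""
    counts = Counter(values)
    if size == 0:
        return counts.most_common()
    return counts.most_common(size)


# Made-up data: customer0 appears once, customer1 twice, ..., customer24 25 times.
customers = [f"customer{i}" for i in range(25) for _ in range(i + 1)]

limited = top_buckets(customers)            # default size of 10
unlimited = top_buckets(customers, size=0)  # every distinct customer

print(len(limited), len(unlimited))
```

With the default, 15 of the 25 customers never show up in the result, which is why the query above sets size to 0 on both levels.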

linlma · August 15, 2015, 7:44am

@kresimirus, thanks for the response. I did some reading today, and I do not quite understand what this means: "This means that if the number of unique terms is greater than size, the returned list is slightly off and not accurate (it could be that the term counts are slightly off and it could even be that a term that should have been in the top size buckets was not returned)". If you could explain or show an example, that would be great.

Have a good weekend. This is the page I am quoting from:

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_size?q=size

kresimirus · August 17, 2015, 5:14pm

There is an example in the same article, one sentence after the one you quoted.
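Separately from the article's own example, the effect can be simulated in a few lines. This is a simplified sketch, assuming each shard returns only its own top `size` terms and the results are then merged; real Elasticsearch can request more terms per shard than `size`, so treat this as an illustration of the idea only. The shard contents and helper names are made up.

```python
from collections import Counter

# Made-up per-shard document counts for each city term.
shard_a = {"City1": 10, "City2": 8, "City3": 7}
shard_b = {"City3": 9, "City4": 8, "City2": 7}
shards = [shard_a, shard_b]


def approximate_top(shard_counts, size):
    """Merge only the top `size` terms reported by each shard."""
    merged = Counter()
    for counts in shard_counts:
        for term, count in Counter(counts).most_common(size):
            merged[term] += count
    return merged.most_common(size)


def exact_top(shard_counts, size):
    """Merge the full counts from every shard before taking the top terms."""
    merged = Counter()
    for counts in shard_counts:
        merged.update(counts)
    return merged.most_common(size)


approx = approximate_top(shards, size=2)
exact = exact_top(shards, size=2)
print("approximate:", approx)
print("exact:      ", exact)
```

In this made-up data, City2 is second overall but is dropped because it is not in the top 2 on the second shard, and the count reported for City3 is too low because the first shard never sent its 7 documents.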

linlma · August 18, 2015, 1:47am

Thanks @kresimirus

Where is it mentioned that the default size is 10? I searched but could not find it.

Regards,
Lin
