Hi
I have a requirement where I need to aggregate over multiple fields, which can result in millions of buckets.
We have millions of records, and I need to get the number of records for each unique combination of 3 columns: FirstName, MiddleName, and LastName.
In other words, the frequency of each unique combination of FirstName, MiddleName, and LastName.
When I try to use a terms aggregation over these 3 fields, I get a too_many_buckets_exception, since the default bucket limit is 10k. I increased it to 100k and the query worked, but I don't think that's the right approach performance-wise. It works for the current sample of data, but the bucket count may grow into the millions.
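For reference, this is roughly how I raised the limit (a sketch, assuming the cluster-wide `search.max_buckets` setting is what governs this; the exact value shown is just the 100k I mentioned):

```
PUT /_cluster/settings
{
  "persistent": {
    "search.max_buckets": 100000
  }
}
```

This makes the exception go away, but it only postpones the problem rather than solving it.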
Can you please suggest a better way to achieve this?
The mappings and search query are cited below for reference.
"mappings" : {
"_meta" : {
"created_by" : "ml-file-data-visualizer"
},
"properties" : {
"FirstName" : {
"type" : "keyword"
},
"MiddleName" : {
"type" : "keyword"
},
"LastName" : {
"type" : "keyword"
}
}
}
Search Query -
GET /names/_search
{
  "size": 0,
  "aggs": {
    "First_Level": {
      "terms": {
        "field": "FirstName",
        "size": 1000,
        "min_doc_count": 1
      },
      "aggs": {
        "Second_Level": {
          "terms": {
            "field": "MiddleName",
            "size": 1000,
            "min_doc_count": 1
          },
          "aggs": {
            "Third_Level": {
              "terms": {
                "field": "LastName",
                "size": 1000,
                "min_doc_count": 1
              }
            }
          }
        }
      }
    }
  }
}
Thanks.