Hi,
I have an index with 9 million documents. I ran a composite aggregation on a nested field and also added sorting by doc_count. With this setup, many of the complete buckets are missing from the result.
Here is my aggregation query with size 2000:
"aggs": {
  "location": {
    "nested": {
      "path": "resume.profile.locations"
    },
    "aggs": {
      "location_s": {
        "composite": {
          "size": 2000,
          "sources": [
            {
              "state": {
                "terms": {
                  "field": "resume.profile.locations.stateCanonical.keyword"
                }
              }
            },
            {
              "state_code": {
                "terms": {
                  "field": "resume.profile.locations.stateCode.keyword"
                }
              }
            },
            {
              "city": {
                "terms": {
                  "field": "resume.profile.locations.cityCanonical.keyword"
                }
              }
            },
            {
              "postal_code": {
                "terms": {
                  "field": "resume.profile.locations.postalCode.keyword"
                }
              }
            },
            {
              "country": {
                "terms": {
                  "field": "resume.profile.locations.countryCanonical.keyword"
                }
              }
            },
            {
              "address_type": {
                "terms": {
                  "field": "resume.profile.locations.addressType.keyword"
                }
              }
            },
            {
              "confidence_score": {
                "terms": {
                  "field": "resume.profile.locations.confidenceScore"
                }
              }
            }
          ]
        },
        "aggs": {
          "doc_count_sort": {
            "bucket_sort": {
              "sort": [
                {
                  "_count": "desc"
                }
              ]
            }
          }
        }
      }
    }
  }
}
The first buckets I get back have mostly empty key values, like these:
{
  "key": {
    "state": "",
    "state_code": "",
    "city": "",
    "postal_code": "00729",
    "country": "",
    "address_type": "present",
    "confidence_score": 1
  },
  "doc_count": 7
},
{
  "key": {
    "state": "",
    "state_code": "",
    "city": "",
    "postal_code": "00841",
    "country": "",
    "address_type": "present",
    "confidence_score": 1
  },
  "doc_count": 7
},
{
  "key": {
    "state": "",
    "state_code": "",
    "city": "",
    "postal_code": "00962",
    "country": "",
    "address_type": "present",
    "confidence_score": 1
  },
  "doc_count": 7
}
Those postal codes actually do have a city and state in my data, but Elasticsearch is not showing them.
When I increased the size from 2000 to 20000, I got buckets like these:
{
  "key": {
    "state": "alabama",
    "state_code": "AL",
    "city": "huntsville",
    "postal_code": "35810",
    "country": "united states of america",
    "address_type": "Present",
    "confidence_score": 1
  },
  "doc_count": 3107
},
{
  "key": {
    "state": "alabama",
    "state_code": "AL",
    "city": "montgomery",
    "postal_code": "36116",
    "country": "united states of america",
    "address_type": "Present",
    "confidence_score": 1
  },
  "doc_count": 2728
},
{
  "key": {
    "state": "alabama",
    "state_code": "AL",
    "city": "montgomery",
    "postal_code": "36117",
    "country": "united states of america",
    "address_type": "Present",
    "confidence_score": 1
  },
  "doc_count": 2101
}
Can anybody tell me what the issue is here? How can I get the complete buckets with a smaller size, without changing the search query? I need to run the aggregation over the whole data set.
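Note: as far as I understand, a composite aggregation returns its buckets in ascending key order, `size` buckets per page, and a bucket_sort sub-aggregation only reorders the buckets of the page that was returned. That would explain why the keys with empty strings come first with size 2000 and why the "alabama" buckets only appear with a larger size. One possible approach is to page through all buckets using the `after_key` from each response and sort by doc_count on the client. Below is a minimal sketch of that idea, not a tested solution for my cluster. `run_search` is a hypothetical callable that sends a request body to the index and returns the parsed response as a dict, and `collect_sorted_buckets` is a name made up for this example.

```python
import copy
from typing import Any, Callable, Dict, List


def collect_sorted_buckets(
    run_search: Callable[[Dict[str, Any]], Dict[str, Any]],
    composite: Dict[str, Any],
    nested_path: str = "resume.profile.locations",
) -> List[Dict[str, Any]]:
    """Page through a composite aggregation, then sort all buckets by doc_count.

    run_search is a hypothetical helper (an assumption, not a library call):
    it takes a search request body and returns the response as a dict.
    """
    composite = copy.deepcopy(composite)  # do not mutate the caller's dict
    buckets: List[Dict[str, Any]] = []
    while True:
        body = {
            "size": 0,  # only aggregation results are needed, no hits
            "aggs": {
                "location": {
                    "nested": {"path": nested_path},
                    "aggs": {"location_s": {"composite": composite}},
                }
            },
        }
        result = run_search(body)["aggregations"]["location"]["location_s"]
        buckets.extend(result["buckets"])
        after_key = result.get("after_key")
        if not result["buckets"] or after_key is None:
            break  # no more pages
        composite["after"] = after_key  # resume after the last key of this page
    # Ordering by count across all pages has to happen on the client side.
    return sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
```

The `composite` argument would be the same dict as in my query above (`size` plus `sources`), without the bucket_sort sub-aggregation. All buckets are held in memory before sorting, so this assumes the total number of distinct key combinations is manageable.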
Thank you,