Aggregation based on terms field is not working

Aggregation based on terms field is not working. I loaded the data with different (_id), with the same data. I Am trying to find a duplicate of records based on one of the index fields. using the below simple terms query, it's not returning the results which were expected.

Sample Query Used, but able to get other records,

{
"size": 0,
"seq_no_primary_term": true,
"aggs": {
"groupbyordernum": {
"filter": {
"term": {
"region": "us"
}
},
"aggs": {
"bucketordernum": {
"terms": {
"field": "order_num",
"min_doc_count": 2,
"exclude": "0"
}
}
}
}
}
}

@rajthilakr
Can you post sample data and reason why you think results are incorrect?

This is what I call the "Elizabeth Taylor" problem.
She was an actress who famously married many times and would be hard to find in a distributed database if you were looking for people with the most marriages.
Marriages, like your records, preferably only occur once and in most cases this is true. If your system holding marriage certificates spreads them across many shards randomly then each shard could end up thinking Elizabeth Taylor was like everyone else in their subset of data - only married once. This makes it hard for the system as a whole to determine which of the millions of people held on file were married more than once.
If you use custom routing of documents you can ensure each person's documents end up on the same shard and get the accurate answers.

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.