I have encountered a strange problem whereby the total hits don't seem to be adding up when I use multiple filters.
I have an index with around 40 million documents. A typical document looks like something like this:
{
"postcodePrefix": "BD1",
"transactions": [
{
"price": 100000,
"category": "A",
"date": "2020-01-10"
},
{
"price": 200000,
"category": "B",
"date": "2020-01-20"
}
]
}
When I do simple query, for example:
POST /INDEXNAME/_search?size=0
{
"query": {
"bool": {
"must": [
{
"match": {
"postcodePrefix.keyword": "BD1"
}
}
]
}
}
}
The total hits comes back as 2766 (so far so good)
If I then add an additional nested query, for example:
POST /INDEXNAME/_search?size=0
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "transactions",
"query": {
"bool": {
"should": [
{
"match": {
"transactions.category": "A"
}
}
]
}
}
}
},
{
"match": {
"postcodePrefix.keyword": "BD1"
}
}
]
}
}
}
The total hits now comes back as 2230 (no problem yet)
Last but not least, if i run the nested query again but this time with category "B", the total hits now comes back as 705
(Please note that in the data there are only 2 possible categories, "A" and "B")
What i don't understand is how the following total hits are calculated:
- No category specified: 2766 hits
- Category A: 2230 hits
- Category B: 705 hits
Shouldn't the query where no category is specified return 2230 + 705 = 2935 hits?