Using missing and exists filters - null values edge case


(Gabriel Majivu) #1

I have a use case involving null_values - I need to show country in the results whether it is set or it is null upon filtering by country field. I have indexed user profiles with a country field which may be null (not set) or have a value. I have tried to play around with boolean query with exists and missing filters.

filter_1.json

{
    "query": {
        "bool": {
            "must": {
                "match": {
                    "full_name": "xxx xxx xxx"
                }
            },
            "filter": {
                "bool": {
                    "must": [{"term": { "gender": "M" }}],
                    "filter": [{"missing": {"field": "country"}}]
                }
            }
        }
    }
}

which gives:

$ curl "http://127.0.0.1:9200/profile/_search?pretty" --data-binary @filter_1.json | grep country
    "country" : null,
    "country" : null,
    "country" : null,
    "country" : null,
    "country" : null,

the converse:

filter_2.json

{"existing": {"field": "country"}}

which gives:

$ curl "http://127.0.0.1:9200/profile/_search?pretty" --data-binary @filter_2.json | grep country
    "country" : "SO",
    "country" : "SO",
    "country" : "SO",
    "country" : "SO",
    "country" : "SO",
    "country" : "SO",

Lastly, I have tried to adjust the filter in boolean query thus:

filter_3.json

"filter": {
    "or": [
        {"missing": {"field": "country"}},
        {"exists": {"field": "country"}}
    ]
}

Ideal output I want:

"country" : "SO",
"country" : "SO",
"country" : "SO",
"country" : null,
"country" : "SO",
"country" : null,

Any ideas on what to try?


(Christoph) #2

Why do you need the filter at all if you want both kind of documents, either with country set or the ones with missing values?


(Gabriel Majivu) #3

country could have multiple values like KE, SO, CD etc. When a user opts to filter by country, then I do not want to leave out the profiles with no country set.


(Christoph) #4

What if you use something like:

"or": [
        {"missing": {"field": "country"}},
        {"match": {"country": "KE"}}
    ]

(Gabriel Majivu) #5

Awesome! this works. Thanks @cbuescher. Should I be concerned about performance since the docs advocate for positive queries.


(Christoph) #6

I don't know what kind of system you need and what performance you expect, but generally speaking those queries are not the most expensive ones. So I wouldn't worry too much if it stays that simple.


(Gabriel Majivu) #7

Ok, noted.


(system) #8