Elasticsearch search document with nested document with optional fields

I try to create query for nested object that contains year and month. Both of them are optional. If some field not exists we treat them as hit. I found one solution but it causes combinatorial explosion of terms so I'm trying to find a better solution.

Steps of reproduction:

  1. Creating index with mapping
PUT /date-test
{
    "mappings": {
        "properties": {
            "datesOfBirth": {
                "type": "nested"
            }
        }
    }
}
  1. Add documents with nested objects
PUT /date-test/_doc/1
{
    "name": "Object1",
    "datesOfBirth": []
}
PUT /date-test/_doc/2
{
    "name": "Object2",
    "datesOfBirth": [
        {
            "year": 1990,
            "month": 4
        }
    ]
}
PUT /date-test/_doc/3
{
    "name": "Object3",
    "datesOfBirth": [
        {
            "year": 1995,
            "month": 2
        },
        {
            "year": 1998,
            "month": 4
        }
    ]
}
PUT /date-test/_doc/4
{
    "name": "Object4",
    "datesOfBirth": [
        {
            "month": 4
        }
    ]
}
  1. This query works as expected for year range 1994-1996 and month range 1-5 (objects 1, 3, 4 are returned):
{
    "size": 1000,
    "query": {
        "bool" : {
            "should": [
                { "bool": {"must_not": [ //match when all fields are absent
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }},
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
                    ]
                }},
                { "bool": {"must_not": [ //match when year is absent but month exists and match to range
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }}
                    ],
                    "should": [
                        {"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [
                            { "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
                        ]
                        }}}}
                    ]
                }},
                { "bool": {"must_not": [ //match when month is absent but year exists and match to range
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
                    ],
                    "should": [
                        {"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [
                            { "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } }
                        ]
                        }}}}
                    ]
                }},
                {"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [ //both fields exists and must match to given ranges
                    { "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } },
                    { "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
                ]
                }}}}
            ],
            "minimum_should_match": 1
        }
    }
}

Is there better way to achieve that behaviour? I'm using Elasticsearch 7.1.

Elasticsearch 7.1 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns :elasticheart: )

I've also tried always set field but with null in case of value absence and add mapping for year and month where I define null_value: -1. Then I can remove part with combination of field absence.

  1. Create index with mapping
PUT /date-test
{
    "mappings": {
        "properties": {
            "datesOfBirth": {
                "type": "nested",
                "properties": {
                    "year": { "type": "integer", "null_value": -1 },
                    "month": { "type": "integer", "null_value": -1 }
                }
            }
        }
    }
}
  1. Creating documents as follows:
PUT /date-test/_doc/7
{
    "name": "SomeObjectWithoutYear",
    "datesOfBirth": [
        {
            "year": null,
            "month": 4
        }
    ]
}

Then I can do query like this:

POST /date-test/_search
{
    "size": 1000,
    "query": {
        "bool" : {
            "should": [
                { "bool": {"must_not": [
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }},
                        { "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
                    ]
                }},
                {"nested": { "path": "datesOfBirth", "query": { "bool": { "should": [
                    { "match": { "datesOfBirth.year": { "query": -1 } } },
                    { "match": { "datesOfBirth.month": { "query": -1 } } },
                    { "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } },
                    { "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
                ],
                "minimum_should_match": 2
                }}}}
            ],
            "minimum_should_match": 1
        }
    }
}

But I'm wondering if it is the cleanest way to achieve that.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.