I'm trying to create a query for a nested object that contains a year and a month. Both fields are optional: if a field does not exist, it should be treated as a match. I found one solution, but it causes a combinatorial explosion of clauses (one per combination of present and absent fields), so I'm looking for a better one.
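To make the "combinatorial explosion" concrete: if every present/absent combination of n optional fields needs its own `should` clause, the query grows as 2^n. The sketch below is a hypothetical Python generator (the helper names `exists`, `ranges` and `combinatorial_query`, and the extra `day` field, are illustrative assumptions, not part of any API) that builds such a query body as a plain dict, roughly mirroring the structure of the query in this post.

```python
import itertools
import json


def exists(path, field):
    # Nested "exists" check; matches the parent if any nested object has the field.
    return {"nested": {"path": path, "query": {"exists": {"field": f"{path}.{field}"}}}}


def ranges(path, bounds):
    # One nested query whose inner bool requires every given field to be in range.
    return {"nested": {"path": path, "query": {"bool": {"must": [
        {"range": {f"{path}.{field}": {"gte": low, "lte": high}}}
        for field, (low, high) in bounds.items()
    ]}}}}


def combinatorial_query(path, field_ranges):
    # Hypothetical generator: one "should" clause per subset of "present" fields.
    fields = list(field_ranges)
    clauses = []
    for size in range(len(fields) + 1):
        for present in itertools.combinations(fields, size):
            absent = [f for f in fields if f not in present]
            clause = {"bool": {}}
            if absent:
                # Fields treated as absent must not exist.
                clause["bool"]["must_not"] = [exists(path, f) for f in absent]
            if present:
                # Fields treated as present must all be within range.
                clause["bool"]["should"] = [
                    ranges(path, {f: field_ranges[f] for f in present})
                ]
            clauses.append(clause)
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


two = combinatorial_query("datesOfBirth", {"year": (1994, 1996), "month": (1, 5)})
three = combinatorial_query(
    "datesOfBirth", {"year": (1994, 1996), "month": (1, 5), "day": (1, 31)}  # "day" is hypothetical
)
print(len(two["query"]["bool"]["should"]))    # 4
print(len(three["query"]["bool"]["should"]))  # 8
print(json.dumps(two)[:60])
```

The clause count is the sum of C(n, k) over k = 0..n, which is 2^n: 4 clauses for year and month, 8 if a third optional field were added.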
Steps to reproduce:
- Create the index with a nested mapping:
PUT /date-test
{
"mappings": {
"properties": {
"datesOfBirth": {
"type": "nested"
}
}
}
}
- Index documents with nested objects:
PUT /date-test/_doc/1
{
"name": "Object1",
"datesOfBirth": []
}
PUT /date-test/_doc/2
{
"name": "Object2",
"datesOfBirth": [
{
"year": 1990,
"month": 4
}
]
}
PUT /date-test/_doc/3
{
"name": "Object3",
"datesOfBirth": [
{
"year": 1995,
"month": 2
},
{
"year": 1998,
"month": 4
}
]
}
PUT /date-test/_doc/4
{
"name": "Object4",
"datesOfBirth": [
{
"month": 4
}
]
}
- This query works as expected for the year range 1994-1996 and the month range 1-5 (objects 1, 3 and 4 are returned):
GET /date-test/_search
{
"size": 1000,
"query": {
"bool" : {
"should": [
{ "bool": {"must_not": [ // match when no nested object has a year or a month
{ "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }},
{ "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
]
}},
{ "bool": {"must_not": [ // match when no nested object has a year, but a month exists and is within range
{ "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.year" }} }}
],
"should": [
{"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [
{ "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
]
}}}}
]
}},
{ "bool": {"must_not": [ // match when no nested object has a month, but a year exists and is within range
{ "nested": { "path": "datesOfBirth", "query": { "exists": { "field": "datesOfBirth.month" }} }}
],
"should": [
{"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [
{ "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } }
]
}}}}
]
}},
{"nested": { "path": "datesOfBirth", "query": { "bool": { "must": [ // both fields exist and are within the given ranges
{ "range": { "datesOfBirth.year": { "gte": 1994, "lte": 1996} } },
{ "range": { "datesOfBirth.month": { "gte": 1, "lte": 5} } }
]
}}}}
],
"minimum_should_match": 1
}
}
}
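To check the expected hits without a running cluster, here is a hypothetical pure-Python emulation of the four `should` clauses above, applied to the four test documents. It assumes the standard nested semantics (a nested query matches the parent document when any nested object matches) and that no field holds an explicit `null`; `matches` and `in_range` are illustrative helpers, not Elasticsearch APIs.

```python
YEAR_RANGE = (1994, 1996)
MONTH_RANGE = (1, 5)

DOCS = [
    {"name": "Object1", "datesOfBirth": []},
    {"name": "Object2", "datesOfBirth": [{"year": 1990, "month": 4}]},
    {"name": "Object3", "datesOfBirth": [{"year": 1995, "month": 2}, {"year": 1998, "month": 4}]},
    {"name": "Object4", "datesOfBirth": [{"month": 4}]},
]


def in_range(value, bounds):
    # Mimics a "range" query: a missing value never matches.
    low, high = bounds
    return value is not None and low <= value <= high


def matches(doc):
    # Hypothetical emulation of the posted query; it does not call Elasticsearch.
    dates = doc.get("datesOfBirth", [])
    # A nested "exists" matches the parent if ANY nested object has the field.
    any_year = any("year" in d for d in dates)
    any_month = any("month" in d for d in dates)

    # Clause 1: no nested object has a year or a month.
    if not any_year and not any_month:
        return True
    # Clause 2: no year anywhere, but some nested month is within range.
    if not any_year and any(in_range(d.get("month"), MONTH_RANGE) for d in dates):
        return True
    # Clause 3: no month anywhere, but some nested year is within range.
    if not any_month and any(in_range(d.get("year"), YEAR_RANGE) for d in dates):
        return True
    # Clause 4: a single nested object has both fields within range.
    return any(
        in_range(d.get("year"), YEAR_RANGE) and in_range(d.get("month"), MONTH_RANGE)
        for d in dates
    )


hits = [doc["name"] for doc in DOCS if matches(doc)]
print(hits)  # ['Object1', 'Object3', 'Object4']
```

Note that clauses 1 to 3 test absence across all nested objects of a document, while clause 4 tests both ranges within one nested object.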
Is there a better way to achieve this behaviour? I'm using Elasticsearch 7.1.