Search for documents which does not contains nested items with some condition

Hello there ! :slight_smile:

I cannot found a way to retrieve documents which do not have a nested item corresponding to some condition. Let me explain this :
I've got an index which contain employees. All employees got this nested property :

        'hours_by_week' => [
            'type' => 'nested',
            'properties' => [
                'week' => [
                    'type' => 'integer'
                ],
                'available_hours' => [
                    'type' => 'integer'
                ]
            ]
        ],

This property let us know how many hours by week the employee is available, example value for this property:

 "hours_by_week": [
        {
            "week": 202034,
            "available_hours": 12
        } ,
       {
           "week": 202035,
           "available_hours": 8
        }
]

Note: week field is format as year + week number.
Now, let's said we want to retrieve employees which are available at least 10 hours on week 202034, I use this must condition :

[
    'nested' => [
        'path' => 'hours_by_week',
        'query' => [
            'bool' => [
                'must' => [
                    [
                        'term' => [
                            'hours_by_week.week' => 202034
                        ]
                    ],
                    [
                        'range' => [
                            'hours_by_week.available_hours' => [
                                'gte' => 10
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]

And it works... :slight_smile: almost ! :frowning:
Currently, it does not retrieve the documents for which I don't have a value for this week.
If I search for employee with at least 10 hours available on week 202036, I will not retrieve employees which do not have a nested item {"week" :202036, "available_hours": X} in theirs "hours_by_week" property.

How to do this ?

I already try something like this using a should :

[
    'nested' => [
        'path' => 'hours_by_week',
        'query' => [
            'bool' => [
                'should' => [
                    [
                        'bool' => [
                            'must' => [
                                [
                                    'term' => [
                                        'hours_by_week.week' => 202036
                                    ]
                                ],
                                [
                                    'range' => [
                                        'hours_by_week.available_hours' => [
                                            'gte' => 10
                                        ]
                                    ]
                                ]
                            ]
                        ]
                    ],
                    [
                        'bool' => [
                            'must_not' => [
                                'term' => [
                                    'hours_by_week.week' => 202036
                                ]
                            ]
                        ]
                    ]
                ]
            ]
        ]
    ]
]

But this condition is wrong, now I even get the documents which does not match the greater than 10 available hours condition. My guess is than this condition retrieve all the documents which has at least one hours_by_week item with a value different than 202036.

Did someone got another suggestions because I stuck on this for almost a fullday ?
Thanx!

Still searching on this :confused:

An another way to expose the problem as a SQL request:

SELECT * FROM employees AS e
LEFT JOIN employees_hours_by_week AS hbw ON hbw.employee_id = e.id AND hbw.week = "202036"
WHERE hbw.available_hours > 10 OR hbw.week IS NULL

Just found a "solution" even if it's seem dirty ! :slight_smile:

Rather than indexing my property value like this :

"hours_by_week": [
    {
        "week": 202034,
        "available_hours": 12
    },
    {
        "week": 202035,
        "available_hours": 8
    }
]

I use an associative array for this nested property :

 "hours_by_week": {
    "200234" : {
        "week": 202034,
        "available_hours": 12
    } ,
   "200235" : {
       "week": 202035,
       "available_hours": 8
    }
}

Which allow me to use the exists filter on it :

{
    "nested": {
        "path": "hours_by_week",
        "ignore_unmapped": true,
        "query": {
            "bool": {
                "should": [
                    {
                        "bool": {
                            "must": {
                                "range": {
                                    "hours_by_week.202034.available_hours": {
                                        "gte": 10
                                    }
                                }
                            }
                        }
                    },
                    {
                        "bool": {
                            "must_not": {
                                "exists": {
                                    "field": "hours_by_week.202034.available_hours"
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

Then it works ! :slight_smile:
But like I said, it's dirty, using this solution, I cannot declare a proper mapping for this property. Then ES guess for me the types of the subfields. It's not the problem at the moment so I think I will go with that !

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.