Scripted filter on nested field


#1

I need to filter documents based on a scripted value of a nested element.

My data structure looks like this:

{
	"product-group-name" : "group1",
	"products": [
		{
			"name" : "product1",
			"price" : 10
		},{
			"name" : "product2",
			"price" : 20
		}
	]
}

"products" is a nested field. I need to find product groups (top-level documents) where at least one of the nested "price" fields satisfies a condition that I define with a script. The script expression is complex and cannot be replaced with a simple nested query: the query is built for customers who all have different rules for how the price is calculated, based on their discounts, product categories and other parameters passed to the script at query time.

How can I write a scripted filter like that?

I have tried something like this (script part is deliberately simplified):

filter : {
    "nested": {
        "path": "products",
        "filter": {
            "script": {
                "script": "if (doc['price'].value > 5) {return (true)}"
            }
        }
    }
}

But it seems that the "price" values are not accessible via doc notation. I also need to be able to sort and aggregate on the scripted "price" value. From what I have seen in the documentation so far, it is possible to sort on a scripted field or on a nested field, but not on a combination of both (see the sort documentation).


(Adrien Grand) #2

I think it should be doc['products.price'] instead of doc['price'].
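
A minimal sketch of how the corrected filter might be built, assuming the Elasticsearch 1.x filter syntax used in this thread. The helper name `nested_price_filter`, the `discount` and `min_price` parameters, and the discount rule itself are hypothetical and only stand in for the real per-customer pricing logic:

```python
import json


def nested_price_filter(script, params=None):
    """Build a nested script filter on the "products" path (ES 1.x style)."""
    script_clause = {"script": script}
    if params:
        # Values here are exposed to the script as variables at query time.
        script_clause["params"] = params
    return {
        "nested": {
            "path": "products",
            "filter": {"script": script_clause},
        }
    }


# Hypothetical rule: the discounted price must exceed a threshold.
# Note the full path "products.price" rather than just "price".
body = {
    "query": {
        "filtered": {
            "query": {"match_all": {}},
            "filter": nested_price_filter(
                "doc['products.price'].value * (1 - discount) > min_price",
                {"discount": 0.1, "min_price": 5},
            ),
        }
    }
}

print(json.dumps(body, indent=2))
```

The printed JSON can be sent as a search request body. Passing customer-specific values through `params` keeps the script text itself identical across customers.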


#3

Thanks, that worked! But I have noticed some strange performance problems while executing my script in a nested context.
For example:

filter : {
    "nested": {
        "path": "products",
        "filter": {
            "script": {
                "script": "  // do some work
                                return true"
            }
        }
    }
}

If I replace "return true" with "return false" in the script above, the query becomes roughly 1000 times slower, although the script does exactly the same work apart from the final "return" line. My only explanation is that Elasticsearch handles false in a nested context in such a way that it executes the script again and again (with "return true" the query is super fast, and it is also fast if I execute the script outside of a nested context). I have tried this in Elasticsearch 1.5.1 and 1.7.1.
I hope this sounds familiar to someone :smile:


#4

I did some script debugging based on the instructions provided here. It seems that with "return false" the script is executed for every nested document in the index, while with "return true" it is executed for only a few (although the query part returned only 10 results). Outside of a nested context the script is executed properly in both cases.
This might be due to an optimization in how Elasticsearch combines query and filter results, or it might be an Elasticsearch bug.
What is definitely weird is that even using post_filter on 10 documents resulted in the nested filter script executing for 400k documents (and thus being really slow).


(Adrien Grand) #5

This is a known issue that will be addressed in Elasticsearch 2.0. For more information, you can read the "Two-phase execution" section of https://www.elastic.co/blog/better-query-execution-coming-elasticsearch-2-0
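
As a rough illustration of why two-phase execution helps, here is a toy model in Python. It is not Elasticsearch's implementation: `count_script_calls`, `cheap_match` and `costly_script` are hypothetical names, and the document set is made up. The idea sketched is simply that the expensive check runs only on documents that already passed the cheap part of the query:

```python
def count_script_calls(docs, cheap_match, costly_script, two_phase):
    """Return (matching docs, number of costly_script invocations)."""
    calls = 0
    hits = []
    for doc in docs:
        if two_phase and not cheap_match(doc):
            continue  # rejected before the costly check is ever run
        calls += 1
        if costly_script(doc) and cheap_match(doc):
            hits.append(doc)
    return hits, calls


def cheap_match(doc):
    # Stands in for the query part, which matches only 10 documents.
    return doc["id"] < 10


def costly_script(doc):
    # Stands in for the expensive script filter.
    return doc["price"] > 5


docs = [{"id": i, "price": i % 50} for i in range(1000)]

naive_hits, naive_calls = count_script_calls(docs, cheap_match, costly_script, two_phase=False)
lazy_hits, lazy_calls = count_script_calls(docs, cheap_match, costly_script, two_phase=True)

print("script calls without two-phase:", naive_calls)
print("script calls with two-phase:", lazy_calls)
```

Both runs return the same hits; only the number of costly evaluations differs, which mirrors the gap between 10 query results and 400k script executions reported above.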

