Max_score anyone?


(george_monroe) #1

Hi guys,

I've been beating my head against this one for a while, so any help is really
appreciated. I have a one-to-many parent/child relationship: each parent has
lots of children. My use case is to return only those parents whose
children's values sum to a dollar amount that falls within a given range.
How best to do this?

Imposed requirements:

  • return all such parents in one go (one request)
  • return only those parents who have children whose sum/aggregate falls
    within a range (between MIN and MAX)
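
To make the requirement concrete, here is a minimal plain-Python sketch of the intended semantics. It is not Elasticsearch code; the function name, field names, and sample data are all hypothetical and only illustrate "sum the children per parent, then keep parents whose total lies between MIN and MAX".

```python
from collections import defaultdict


def parents_with_child_sum_in_range(children, min_total, max_total):
    """Return {parent_id: total} for parents whose summed child revenue
    lies within [min_total, max_total] (both bounds inclusive)."""
    totals = defaultdict(float)
    for child in children:
        totals[child["parent"]] += child["revenue"]
    return {p: t for p, t in totals.items() if min_total <= t <= max_total}


# Hypothetical child documents, each pointing at its parent.
visits = [
    {"parent": "animal-1", "revenue": 100.0},
    {"parent": "animal-1", "revenue": 200.0},
    {"parent": "animal-2", "revenue": 50.0},
    {"parent": "animal-3", "revenue": 900.0},
]

print(parents_with_child_sum_in_range(visits, 250, 500))
# prints {'animal-1': 300.0}
```

Only animal-1 is returned: animal-2 sums to less than MIN and animal-3 to more than MAX.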

The new aggregations framework seems to be off limits, since it does not
work across parent/child. The post_filter seems to be off limits too, since
it does not work within has_child. Which line of attack should I take?

Here is my current query. It satisfies only one half of the requirement:
the lower bound, via min_score.

curl -X GET '0:9200/segmentation-cd/animal/_search?pretty' -d '{
  "min_score" : 250,
  "query" : {
    "bool" : {
      "must" : [
        { "function_score" : {
            "query" : { "term" : { "animal.sex" : "neutered" } },
            "script_score" : { "script" : "0" } } },
        { "has_child" : {
            "type" : "customer",
            "score_type" : "sum",
            "query" : {
              "function_score" : {
                "filter" : {
                  "query" : { "term" : { "customer.first_name" : "Michael" } } },
                "script_score" : { "script" : "0" } } } } },
        { "has_child" : {
            "type" : "visit",
            "score_type" : "sum",
            "query" : {
              "function_score" : {
                "filter" : {
                  "bool" : {
                    "must" : [
                      { "range" : { "visit_date" : {
                          "from" : "2001-07-28T00:00:00.000Z",
                          "to" : "2011-11-18T00:00:00.000Z",
                          "include_lower" : false,
                          "include_upper" : true } } } ] } },
                "script_score" : { "script" : "_source.revenue" } } } } }
      ]
    }
  }
}'

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9781ee00-f48f-4bf1-b9fd-1e0b1abaea16%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Clinton Gormley) #2

Hi Yuri

First, here's a query that will work for you:

I'm using a has_child query to sum up all the values of the revenue field,
then using a function_score query to ensure that those values fall within a
range.
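
The query referred to above is missing from this archived copy of the thread. What follows is not that query but a rough sketch of how the described approach might look, written as a Python dict that serialises to a request body. It assumes Elasticsearch 1.x-era syntax, reuses the has_child / score_type "sum" clause from the original question, and introduces a hypothetical script parameter max_total; the helper name build_range_sum_query is also made up for illustration. The idea: the summed revenue becomes the parent's score, an outer function_score zeroes the score when it exceeds the upper bound, and min_score drops everything below the lower bound.

```python
import json


def build_range_sum_query(min_total, max_total):
    """Build a search body that keeps parents whose summed child revenue
    is between min_total and max_total. Assumes min_total > 0, so that a
    score forced to 0 is always excluded by min_score."""
    # Each matching visit scores its own revenue; has_child sums them per parent.
    visits_sum = {
        "has_child": {
            "type": "visit",
            "score_type": "sum",
            "query": {
                "function_score": {
                    "filter": {"range": {"visit_date": {
                        "from": "2001-07-28T00:00:00.000Z",
                        "to": "2011-11-18T00:00:00.000Z",
                        "include_lower": False,
                        "include_upper": True,
                    }}},
                    "script_score": {"script": "_source.revenue"},
                }
            },
        }
    }
    return {
        "min_score": min_total,  # lower bound on the summed revenue
        "query": {
            "function_score": {
                "query": visits_sum,
                # Upper bound: replace the score with 0 when the sum is too large.
                "script_score": {
                    "script": "_score <= max_total ? _score : 0",
                    "params": {"max_total": max_total},
                },
                "boost_mode": "replace",
            }
        },
    }


print(json.dumps(build_range_sum_query(250, 500), indent=2))
```

The other conditions from the question (animal.sex, customer.first_name) would still need to be combined with this clause in a way that leaves the score untouched, for example by scoring them as 0 as the original query does.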

However, your data model feels like you are trying to implement a
relational database in Elasticsearch. This is not optimal :) These
parent-child joins have a cost. It is much more efficient just to
denormalise your data, i.e. store everything you want to be able to search
on in the same document.

For instance, your customer's name isn't likely to change. In fact the
owner of the animal isn't likely to change either. So why not store the
customer info in the animal document?

Keeping visits as a child may be the correct approach, but you could also
look at storing them as nested objects:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html

It depends whether you want to be able to return individual visits as
search results (keep them as parent-child) or not (use nested).
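
As an illustration of the denormalised layout, here is a hedged sketch of what a mapping with the customer stored inline and visits stored as nested objects might look like, again as a Python dict. It assumes Elasticsearch 1.x-era mapping syntax (the "string" type, a mapping keyed by the animal type); the field names are taken from the query in the question, and the field types are guesses.

```python
import json

# Hypothetical mapping: customer info inlined, visits as nested objects.
animal_mapping = {
    "mappings": {
        "animal": {
            "properties": {
                "sex": {"type": "string", "index": "not_analyzed"},
                "customer": {
                    "properties": {
                        "first_name": {"type": "string"},
                    }
                },
                "visits": {
                    "type": "nested",
                    "properties": {
                        "visit_date": {"type": "date"},
                        "revenue": {"type": "double"},
                    },
                },
            }
        }
    }
}

print(json.dumps(animal_mapping, indent=2))
```

With this layout each animal document carries its own visits, so no parent-child join is needed at query time, at the price of reindexing the whole animal document whenever a visit is added.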

hth

Clint

On 12 March 2014 09:21, Yuri Panchenko yuri.panchenko@gmail.com wrote the message shown in #1 above.



