On using statistical facet results in scoring


(pyvandenbussche) #1

Hi,

I would like to tune my scoring by adding a weight that depends on the value
of a field "occurrences". As this field's value is not bounded ([0, infinity)),
I would like to use the max value for this field in the context of a query.
That is why I cannot compute this score at index time and have to do it at
query time instead.

The idea would be to have a score of the form: score = score +
(occurrences/max(occurrences))

By using a statistical facet I can get the max value for this field, but I
don't know how (or even whether it is possible) to re-inject this value into
the score.

Here is the latest simplified version of my unsuccessful query:
{
  "facets" : {
    "stat1" : {
      "statistical" : {
        "field" : "occurrences"
      }
    }
  },
  "query" : {
    "custom_score" : {
      "query" : {
        "match" : {
          "label" : "personal"
        }
      },
      "script" : "_score + (doc['occurrences'].value / doc['facets.stat1.max'].value)"
    }
  }
}

Could anyone help me by telling me:

  • if this idea is doable,
  • if yes, whether the approach I describe here (statistical facet +
    custom scoring) is the right way to proceed,
  • how to do it :)

Thank you in advance.
Pierre-Yves.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Zachary Tong) #2

Nope, you cannot access facet results from the search context.

Even if you could, the results would not be what you expect. Facets are
collected alongside the search execution (a document is scored, then any
relevant facets are accumulated, then execution moves to the next document,
and so on). This happens in parallel on all the shards of an index, so
the facet results are not accurate until they have been merged and
reduced on the coordinating node. While the facet is executing, the
intermediate values can be far from the final ones.

You could, however, execute the facet first and collect the values you
need, then execute a search with the appropriate parameters injected into
the script. You can provide a map of values to the script, and it can look
up the required value in the map to match doc['occurrences'].value.
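
A minimal sketch of that two-step approach, for illustration only. It assumes the first request has already been run and returned a statistical facet named "stat1" (as in the original query), and it builds the body of the second request with the max passed in through the "params" map of custom_score. The helper names, the parameter name max_occ, and the sample response values are made up for this example; sending the requests is left to whatever client you use.

```python
import json


def max_from_statistical_facet(response, facet_name="stat1"):
    """Read the merged 'max' value out of a statistical facet response."""
    return response["facets"][facet_name]["max"]


def build_scored_query(max_occurrences, label="personal"):
    """Build the second request body, injecting the max as a script parameter."""
    if max_occurrences <= 0:
        # Avoid dividing by zero (or a negative max) inside the script.
        raise ValueError("max_occurrences must be positive")
    return {
        "query": {
            "custom_score": {
                "query": {"match": {"label": label}},
                # 'max_occ' is a hypothetical parameter name chosen for this sketch.
                "params": {"max_occ": float(max_occurrences)},
                "script": "_score + (doc['occurrences'].value / max_occ)",
            }
        }
    }


# Hypothetical (trimmed) response to the first, facet-only request.
facet_response = {
    "facets": {"stat1": {"_type": "statistical", "count": 3, "min": 1.0, "max": 40.0}}
}

second_request = build_scored_query(max_from_statistical_facet(facet_response))
print(json.dumps(second_request, indent=2))
```

For the max to be "in the context of a query", the first request should run the facet against the same match query as the second one.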

-Zach


(dosmoney) #3

I have a similar issue accessing the facet results: I would like to count the number of accounts that have an aggregated amount greater than X.

I can use a terms_stats facet to aggregate amount by account_id but I can't figure out:

  1. how to count the distinct terms returned by the facet (a1 & a2 -> count=2)
  2. how to count results with an aggregate amount greater than X ("total" > 35 returns only a2 below)

POST /my_index/accounts/_search?search_type=count
{
  "query" : {"match_all": {}},
  "facets": {
    "bens_facet": {
      "terms_stats": {
        "key_field": "account_id",
        "value_field": "amount"
      }
    }
  }
}

This produces the following result:

"facets": {
  "bens_facet": {
    "_type": "terms_stats",
    "missing": 0,
    "terms": [
      {
        "term": "a2",
        "count": 3,
        "total_count": 3,
        "min": 5.230000019073486,
        "max": 15.229999542236328,
        "total": 35.68999910354614,
        "mean": 11.896666367848715
      },
      {
        "term": "a1",
        "count": 1,
        "total_count": 1,
        "min": 10.229999542236328,
        "max": 10.229999542236328,
        "total": 10.229999542236328,
        "mean": 10.229999542236328
      }
    ]
  }
}

Thank you for your time - Ben


(Zachary Tong) #5

Hey Ben, answers inline:

  1. how to count the distinct terms returned by the facet (a1 & a2 ->
    count=2)

Unfortunately, there isn't a good way to do this right now. The only way
to do it is to request a facet size that is larger than the cardinality of
your field (which can be tricky if you don't know the cardinality...).
Once the facet returns all the terms, you can simply count the
length of the array.

It's not a great method, since high cardinality fields potentially stream
huge results back, but it's the only way right now. For the new
aggregation framework, we want to include "distinct" functionality, both
exact and approximate. Exact cardinality is an expensive metric, since you
need to maintain a hashmap of all values in memory, so high cardinality
fields will be very memory hungry. Approximate cardinality is much cheaper,
since we can rely on algorithms like HyperLogLog and obtain estimates with
only 1-2% error.
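
As a rough illustration of the "count the length of the array" approach, here is a small sketch that works on a terms_stats facet response already parsed into a Python dict. The sample data is trimmed from Ben's output above; the function name is made up for this example, and the count is only meaningful if the requested facet size was at least as large as the field's cardinality.

```python
def count_distinct_terms(response, facet_name="bens_facet"):
    """Count the entries a terms_stats facet returned.

    This equals the number of distinct terms only when the facet size
    was large enough to return every term.
    """
    return len(response["facets"][facet_name]["terms"])


# Trimmed copy of the facet output shown earlier in the thread.
response = {
    "facets": {
        "bens_facet": {
            "_type": "terms_stats",
            "missing": 0,
            "terms": [
                {"term": "a2", "count": 3, "total": 35.68999910354614},
                {"term": "a1", "count": 1, "total": 10.229999542236328},
            ],
        }
    }
}

print(count_distinct_terms(response))  # prints 2
```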

  2. how to count results with an aggregate amount greater than X ("total" >
    35 returns only a2 below)

Going to disappoint you again: there isn't a way to do this currently
either. The solution is similar: request a facet with a size large enough
that some results have a total below 35 (to use your example), and then
manually remove the results below 35 in your application. If your first
request doesn't cross the "transition point", perform another facet with a
larger size until you see the transition, then do the manual filtering.
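
The "grow the size until you see the transition" loop could look roughly like the sketch below. It is not tied to any client library: fetch_terms is a hypothetical callable that you would implement to run the terms_stats facet with the given size and return its "terms" list, and the sketch assumes those terms come back ordered by "total", largest first. The function name, the doubling strategy and the max_size cap are choices made for this example.

```python
def terms_with_total_above(fetch_terms, threshold, start_size=10, max_size=10000):
    """Return facet entries whose 'total' exceeds threshold.

    fetch_terms(size) is a hypothetical callable returning up to `size`
    facet entries ordered by 'total', largest first.
    """
    size = start_size
    while True:
        terms = fetch_terms(size)
        # Transition point: at least one entry is at or below the threshold.
        crossed = any(t["total"] <= threshold for t in terms)
        # Fewer entries than requested means every term was returned.
        exhausted = len(terms) < size
        if crossed or exhausted or size >= max_size:
            return [t for t in terms if t["total"] > threshold]
        size = min(size * 2, max_size)


# Stand-in for a real request, using the totals from Ben's example.
ALL_TERMS = [
    {"term": "a2", "total": 35.68999910354614},
    {"term": "a1", "total": 10.229999542236328},
]


def fake_fetch(size):
    ordered = sorted(ALL_TERMS, key=lambda t: t["total"], reverse=True)
    return ordered[:size]


matching = terms_with_total_above(fake_fetch, threshold=35, start_size=1)
print(len(matching), [t["term"] for t in matching])
```

If the loop stops because it hit max_size rather than the transition point, the count is only a lower bound.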

This is another limitation we'd like to fix in the new aggregation
framework. A lot of these use cases are actually the reason the new
aggregation framework is being developed: to remove these kinds of
limitations.

-Zach


