Custom Scoring Script Running Slow

Hello,

I'm attempting to run a custom scoring search that includes a script
that was ported over from a Solr boost function. Initial requests to
Elastic never return with a response. I must wait a long time after
sending the initial request before receiving a response from
subsequent requests. To me it seems like maybe there is some type of
compilation and loading of the script that is causing that long wait
(I'm unfamiliar with ES internals). Here is a subset of the entire script
that I want to run:

_score + 0.4*(
    -150.0*pow(max(doc['daysSinceRelease'].value - -30000.0,0),0.000001)
          *pow(max(-1826.0-doc['daysSinceRelease'].value,0),0.000001)
  + -100.0*pow(max(doc['daysSinceRelease'].value - -1825.0,0),0.000001)
          *pow(max(-1096.0-doc['daysSinceRelease'].value,0),0.000001)
  + (-100.0+(doc['daysSinceRelease'].value - -1095.0)*(0.0 - -100.0)/(-548.0 - -1095.0))
          *pow(max(doc['daysSinceRelease'].value - -1095.0,0),0.000001)
          *pow(max(-548.0-doc['daysSinceRelease'].value,0),0.000001)
)

What is causing the search to be so slow? BTW the "daysSinceRelease"
field is mapped to the integer datatype and I have 146k documents.
Also this search is super fast in Solr.

For a complex script like that, it might make sense to run it as a native script query: http://www.elasticsearch.org/guide/reference/modules/scripting.html. It will be much faster compared to anything you can do otherwise.

When extending AbstractSearchScript, is all I need to do place my
script in the score() method?

What you would want to do is extend AbstractFloatSearchScript, and override runAsFloat. Within the method, you can do:

long time = doc().numeric("field_name").getLongValue();
return // the formula you want to do goes here...
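
As a rough sketch of what "the formula" could look like once ported to plain Java, the bracketed part of the script from the original post can be written with Math.pow and Math.max. The class and method names (ReleaseBoost, gate, boost) are made up for illustration and are not part of the Elasticsearch API; inside runAsFloat the argument would presumably come from doc().numeric("daysSinceRelease").getLongValue().

```java
public class ReleaseBoost {
    // Exponent taken from the original script.
    private static final double EPS = 0.000001;

    // Mirrors pow(max(x, 0), 0.000001): 0 for x <= 0, very close to 1 for x > 0.
    public static double gate(double x) {
        return Math.pow(Math.max(x, 0.0), EPS);
    }

    // The part of the script that is multiplied by 0.4 (hypothetical helper).
    public static double boost(long d) {
        double t1 = -150.0 * gate(d - -30000.0) * gate(-1826.0 - d);
        double t2 = -100.0 * gate(d - -1825.0) * gate(-1096.0 - d);
        double ramp = -100.0 + (d - -1095.0) * (0.0 - -100.0) / (-548.0 - -1095.0);
        double t3 = ramp * gate(d - -1095.0) * gate(-548.0 - d);
        return t1 + t2 + t3;
    }

    public static void main(String[] args) {
        // Print the boost for a few sample values of daysSinceRelease.
        for (long d : new long[] {-2000, -1500, -800, -100}) {
            System.out.println(d + " -> " + boost(d));
        }
    }
}
```

Because each pair of gate(...) factors is close to 1 only inside its range and 0 outside it, the three terms behave like a piecewise function of daysSinceRelease.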

Oh, and you can get the score by calling score(), for example:

long time = doc().numeric("field_name").getLongValue();
return score() * (...)

Forgive me, but I'm confused. Following your advice on using
AbstractFloatSearchScript, I have created the following class:
https://gist.github.com/1030597. You then mentioned the score() method
in your next reply. In my runAsFloat() method, do I need to add
score() to the value of my function? Does Elastic automatically call
runAsFloat() to get the score when I run the following search query (I
assume it's correct)?

{
  "query": {
    "custom_score": {
      "query": {
        "query_string": {
          "default_field": "title",
          "query": "grand theft auto"
        }
      },
      "lang": "native",
      "script": "CustomScript "
    }
  }
}

It uses the score you return in your function. If you want to use the currently calculated score (the one computed from matching the query, as you did in your script), you can call the score() function to get it.
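
To make the difference concrete, here is a small Java sketch of the arithmetic only. It assumes the query score has already been obtained (in a native script, presumably via score()) and that the boost has been computed separately; CombinedScore and both method names are hypothetical and not part of Elasticsearch. The 0.4 weight comes from the script in the original post.

```java
public class CombinedScore {
    // Returning only the function value: the query's own score is discarded.
    public static float functionOnly(double boost) {
        return (float) boost;
    }

    // Reproduces the shape of the original script: _score + 0.4 * (boost).
    public static float withQueryScore(float queryScore, double boost) {
        return (float) (queryScore + 0.4 * boost);
    }

    public static void main(String[] args) {
        float queryScore = 2.0f;   // stand-in for the value score() would return
        double boost = -100.0;     // stand-in for the computed release boost
        System.out.println(functionOnly(boost));
        System.out.println(withQueryScore(queryScore, boost));
    }
}
```

If the script should rank by relevance adjusted for release date, as the original Solr-style formula does, the second variant is the one that matches it.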

Would marking the fields used in the custom score script as "store" : "yes" in the mapping help?
Would ES access the data faster in that case?