Recency: Boost by Date


(Karussell) #1

Hi,

I would like to boost recent documents. The same question was asked
but not answered here [1].
I would like to execute the script config/scripts/queryboost.mvel with
the following content:

newScore = _score;
newScore *= 800 /(10.0e-9 * (time - doc['dt'].value) + 1);

I'm now querying via:
qb = customScoreQuery(qb).script("queryboost").lang("mvel");

Is the line newScore = _score; correct to get the original score for
the doc?
How can I tell ES to use newScore to boost the doc or is the last line
of the script already taken as new score?
Can I "hot-reload" the changed script or will I need to close+start
the node?

Kind regards,
Peter.

[1]
http://elasticsearch-users.115913.n3.nabble.com/Recency-td727682.html
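
To see what the formula in the script does numerically, here is a minimal sketch in plain Java. It does not use the Elasticsearch API; the class and method names are made up for illustration, and it assumes that both time and doc['dt'].value are epoch milliseconds (which the later use of new Date().getTime() in this thread suggests).

```java
// Hypothetical helper, not part of Elasticsearch: evaluates the boost
// formula from the script outside of any script engine.
final class RecencyBoost {
    // Same constant as in the script; 10.0e-9 equals 1.0e-8.
    static final double DECAY = 10.0e-9;

    // Assumption: both arguments are epoch milliseconds.
    static double boostFactor(long nowMillis, long docMillis) {
        return 800 / (DECAY * (nowMillis - docMillis) + 1);
    }

    // The original score multiplied by the recency factor.
    static double newScore(double score, long nowMillis, long docMillis) {
        return score * boostFactor(nowMillis, docMillis);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long oneDay = 24L * 60 * 60 * 1000;
        System.out.println("age 0:       " + boostFactor(now, now));
        System.out.println("age 1 day:   " + boostFactor(now, now - oneDay));
        System.out.println("age 30 days: " + boostFactor(now, now - 30 * oneDay));
    }
}
```

Under these assumptions a brand-new document gets a factor of 800, a one-day-old document roughly 429, and a 30-day-old document about 30, so the factor falls off hyperbolically with age.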


(ppearcy) #2

Hi Peter,
Script fields are one approach. You'd want to add a "return
newScore;" for that to work.

Alternatively, and probably faster, the suggestion in this thread to
boost a date range works:
http://elasticsearch-users.115913.n3.nabble.com/Boost-recent-documents-td2126107.html#a2126317

I don't know how reloading of scripts works. I'd guess a rolling
restart is needed, but test it.

Regards,
Paul


(Shay Banon) #3

If you send the script as part of the query request (rather than configuring it on the node), then sending a different script will simply use that one.

Debugging scripts is a bit annoying. I suggest you start with a simple script, check in the response (score / explanation) that you get what you expect, and only then start making the script more complex.


(Karussell) #4

Hi Paul, hi Shay,

> Alternatively, and probably faster the suggestion in this thread to boost a date range works:
> http://elasticsearch-users.115913.n3.nabble.com/Boost-recent-document...

Thanks for the hint, Paul. But I'm looking for a recency boost
that I can combine more easily with other boosts.

My use case is that I have tweets (which have retweetNumber,
date, ...) and I don't want to simply sort by retweetNumber or
by latest tweets ... (see jetwick.com for the Solr implementation)

On 16 Jan., 10:58, Shay Banon shay.ba...@elasticsearch.com wrote:
> If you send the script as part of the query request (and not configure it as part of the node), then sending a different script will simply use that one.

OK. Is it also possible to send the script in the query via the Java
API?

> Debugging scripts is a bit annoying.

Ah, OK, this would have been my next question :wink:

Regards,
Peter.


(Karussell) #5

OK, got it via:

long time = new Date().getTime();
qb = customScoreQuery(qb).script("return _score * 800 / (10.0e-9 * (" + time + " - doc['dt'].value) + 1);");

The problem was that the variable time wasn't found in mvel (!?) ...

But now the query takes > 7 sec for only 500k tweets. What can I do?

The same happens when I use the precompiled script via:

qb = customScoreQuery(qb).script("queryboost").lang("mvel").param("mynow", new Date().getTime());

Regards,
Peter


(Shay Banon) #6

How long does the query take without the custom score query? Scripts are always compiled, so you can send the same script string over and over again; as long as it does not change, the compiled version is reused (and only the parameters change).


(Karussell) #7

Without the score query it takes ~0.3 sec (an awesome time given the many
facets; even with sorting, ES is nearly that fast for the first query:
< 1 sec, and that without tuning!)

With the score query it takes over 7 sec even for the second query (!?), also when
I round 'now' in Java so that the script string stays the same:

long ONE_HOUR = 60 * 60 * 1000L;
time = (time / ONE_HOUR) * ONE_HOUR;
qb = customScoreQuery(qb).script("return _score * 800 / (10.0e-9 * (" + time + " - doc['dt'].value) + 1);");

Shouldn't caching come into play then?

Regards,
Peter.
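
The point of the rounding in this post is that every request within the same hour sends an identical script string. A minimal sketch of that truncation in plain Java follows; HourBucket and truncateToHour are hypothetical names, and TimeUnit is used only to avoid hand-computing the constant (one hour is 60 * 60 * 1000 = 3,600,000 ms).

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: rounds an epoch-millisecond timestamp down to the
// start of its hour, so the value embedded in a script changes once per hour.
final class HourBucket {
    static final long ONE_HOUR = TimeUnit.HOURS.toMillis(1); // 3,600,000 ms

    // Integer division drops the remainder within the hour
    // (assumes a non-negative timestamp).
    static long truncateToHour(long epochMillis) {
        return (epochMillis / ONE_HOUR) * ONE_HOUR;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(now + " -> " + truncateToHour(now));
    }
}
```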


(Shay Banon) #8

The script string should not change between executions. Just use something like currTime in the script, and pass a parameter called currTime with the value of System.currentTimeMillis().

The script will still be slower, since it needs to be evaluated for each hit the query matches. I am thinking of adding the ability to plug in precompiled Java-based "scripts" for faster execution.
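
To illustrate why a constant script string matters, here is a toy model in plain Java of a compile cache keyed by the script source. It only assumes the general mechanism described in this thread (same string, compiled version reused) and is not Elasticsearch's actual implementation; ScriptCacheDemo and its methods are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a compile cache keyed by script source (an assumption for
// illustration, not Elasticsearch code).
final class ScriptCacheDemo {
    private final Map<String, Object> compiled = new HashMap<>();
    private int compilations = 0;

    // Only a script string that has not been seen before costs a compilation.
    Object compile(String source) {
        return compiled.computeIfAbsent(source, s -> {
            compilations++;
            return new Object(); // stand-in for a compiled script
        });
    }

    int compilations() {
        return compilations;
    }

    public static void main(String[] args) {
        ScriptCacheDemo inlined = new ScriptCacheDemo();
        ScriptCacheDemo parameterized = new ScriptCacheDemo();
        for (long t = 0; t < 3; t++) {
            long now = 1_295_000_000_000L + t; // arbitrary example timestamps
            // Timestamp baked into the string: a new script on every request.
            inlined.compile("return _score * 800 / (10.0e-9 * (" + now + " - doc['dt'].value) + 1);");
            // Timestamp passed separately as a parameter: the string never changes.
            parameterized.compile("return _score * 800 / (10.0e-9 * (currTime - doc['dt'].value) + 1);");
        }
        System.out.println("inlined compilations:       " + inlined.compilations());
        System.out.println("parameterized compilations: " + parameterized.compilations());
    }
}
```

In this model the inlined variant compiles three times and the parameterized one once. As the post says, this only removes compile overhead; the script is still evaluated once per matching hit.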


(Karussell) #9

When profiling ES I can see that
org.elasticsearch.script.search.SearchScript.execute(int, Map)
takes 92% of the CPU time.

This method then calls
org.elasticsearch.common.mvel2.compiler.CompiledExpression.getValue(Object, VariableResolverFactory)

and there the time splits into 77% for
org.elasticsearch.common.mvel2.MVELRuntime.execute(boolean, CompiledExpression, Object, VariableResolverFactory)

(where
org.elasticsearch.common.mvel2.ast.BinaryOperation.getReducedValueAccelerated(Object, Object, VariableResolverFactory)
is the hot spot)

and 14% for
org.elasticsearch.common.mvel2.integration.impl.ClassImportResolverFactory.<init>(ParserConfiguration, VariableResolverFactory)


(Shay Banon) #10

Yeah, that's what I thought: it spends its time in script execution. I can check whether there is a way to make mvel execute faster; another option is to try groovy.

The long-term plan is to allow fast evaluation of common formulas. One way is to add a built-in time-based elevation scoring query. More broadly, there could be an option to plug in a Java-based script evaluator, or even to create an AST for (mostly) numeric evaluation.

-shay.banon


(Karussell) #11

> maybe try groovy

I'll try that.

> The long term is to allow for fast evaluation of common formulas.

Solr chose the following formulas ... no, please use different
names. Some are odd, like map, recip, etc. :wink:

> More broadly, an option to plug a Java based script evaluator

For my purposes a plugin as a native jar would be quite sufficient: I
can test via a 'slow' script and compile for fast usage ...

> even create an AST for (mostly) numeric based evaluation.

Yeah, would this be too complicated? Don't we only need *, /, +, -, ^, (, )
and normal math functions like sin, cos, ...?

I think there should already be something on the web. I'll take a
look.


(Karussell) #12

Found the following:

http://jmep.tigris.org/

http://code.google.com/p/symja/wiki/MathExpressionParser

http://www.softwaremonkey.org/Code/MathEval

http://objecthunter.congrace.de/tinybo/blog/articles/86

via antlr grammar:

https://supportweb.cs.bham.ac.uk/documentation/tutorials/docsystem/build/tutorials/antlr/antlr.html

(or another parser via grammar https://sites.google.com/site/drjohnbmatthews/enumerated-functions)

(GPLed: https://sourceforge.net/projects/jep/ )


(Karussell) #13

Using javascript the query now executes in under 0.8 sec!!
(When was the last time we used js to improve performance :wink: ? OK, it's
only an implementation ... but: nice!)

BTW1: using groovy I got:

Parse Failure [Failed to parse source [na]]]; nested:
ElasticSearchIllegalArgumentException[script_lang not supported
[groovy]];

I added the groovy plugin jar the same way I added the js plugin jar (via
maven).

BTW2: I had to restart the node to change the language engine. Would
you mind adding this to the docs? Otherwise one thinks one is
using the new language for the script while still using the first one!


(Shay Banon) #14

Ha :). Well, those long nights doing deep integration with Rhino were worth it :). I think Groovy might actually be faster (I fixed the problem you mention in master). All three languages, javascript (Rhino), python (jython), and groovy, have very low-level integration that makes them really fast to execute (sadly, I haven't found the same optimizations possible with jruby).

Still, it's strange that mvel is taking this long compared to rhino. Sure, it might not get compiled "as much" into bytecode, but still. I will look into it.


(Shay Banon) #15

I still wonder if it's worth having an optimized script engine for numeric calculations. I will hack on it a bit and see if it makes sense. "You can never shave enough milliseconds" starts to dominate my life too much... :slight_smile:


(Karussell) #16

I played with the master where you fixed the groovy thing. Thanks for
that btw :slight_smile: !

Now with (only) 353k tweets:

1. mvel with string insertion of mynow/time => ~5s query time

All following experiments pass mynow as a parameter (instead of string
insertion); times are in seconds:

2. mvel => 5.2, 4.9, 5.1, 4.8
3. js => 1.0, 0.5, 0.6, 0.5
4. groovy => 1.1, 1.0, 0.8, 0.8
5. python => 1.5, 0.9, 0.9, 0.9

so, really js seems to be the fastest in my case :slight_smile:
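
As a quick way to compare the measurements, the following plain-Java sketch averages the four reported timings per language. The numbers are copied from the list in this post; the class name and the choice of a simple mean are mine, and the first run of each series may include warm-up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical summary helper: averages the reported query times (seconds).
final class TimingSummary {
    static double mean(double... values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    public static void main(String[] args) {
        // Timings as reported in the post, mynow passed as a parameter.
        Map<String, double[]> runs = new LinkedHashMap<>();
        runs.put("mvel", new double[] {5.2, 4.9, 5.1, 4.8});
        runs.put("js", new double[] {1.0, 0.5, 0.6, 0.5});
        runs.put("groovy", new double[] {1.1, 1.0, 0.8, 0.8});
        runs.put("python", new double[] {1.5, 0.9, 0.9, 0.9});
        for (Map.Entry<String, double[]> e : runs.entrySet()) {
            System.out.printf("%-6s mean %.2f s%n", e.getKey(), mean(e.getValue()));
        }
    }
}
```

The means come out at roughly 5.0 s for mvel, 0.65 s for js, 0.93 s for groovy and 1.05 s for python, so in this measurement js is about 7 to 8 times faster than mvel.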

mvel seems to get slower the more 'complex' my equation is,
unlike js, which nearly always takes around 0.5 s.

> "You can never shave enough milliseconds" starts to
> dominate my life too much... :slight_smile:

I wonder if there is an alternative approach to improve speed.
Couldn't the document ids be cached in some way?

Or even detect which variables are used (in my case _score and
doc['dt'].value) and cache the results of the equation?

Another idea would be to calculate the result or the _score less
precisely (e.g. cut off after 3 decimal places) and then use caching ...

Regards,
Peter.


(Shay Banon) #17

On Wednesday, January 19, 2011 at 5:08 PM, Karussell wrote:

> Now (only) 353k tweets:
>
> 1. try mvel with string insertion of mynow/time => ~5s query time
>
> now all experiments are done with mynow as parameter (instead of string
> insertion)
> 2. mvel => 5.2, 4.9, 5.1, 4.8
> 3. js => 1.0, 0.5, 0.6, 0.5
> 4. groovy => 1.1, 1.0, 0.8, 0.8
> 5. python => 1.5, 0.9, 0.9, 0.9
>
> so, really js seems to be the fastest in my case :slight_smile:

Interesting... I need to run some tests and see where the time is spent in mvel; I promised Brock I would do that :slight_smile:

> I wonder if there is an alternative approach to improve speed.
> Couldn't be the document id's be cached in someway?
>
> Or even detect which variables are used (in my case _score and
> doc['dt'].value) and cache the results of the equation?
>
> Another idea would be to calculate the result or the _score not too
> precise (e.g. cut after 3 decimal points) and then use caching ...

It's not a question of caching, since nothing is precomputed and doc values are accessed from memory; it's just the time it takes to execute the script in the different script languages. A native Java one (once the option is around) should be the fastest. I will also experiment with Groovy++, which should give results close to a built-in Java option.


(Karussell) #18

> Its not a question of caching, since its not precomputed, and accessing doc values is done from memory, its just the time it takes to execute it with different script langs. A native Java one (once the option is around) should be the fastest. I will also experiment with Groovy++ which should provide close results to a built in Java option.

But won't it always scale as O(n), where n := number of docs?

What should I do if I have 100 million docs?


(Karussell) #19

On 19 Jan., 17:40, Karussell tableyourt...@googlemail.com wrote:

> But won't it always scale O(n) where n := noOfDoc ?
>
> What should I do if I have 100 mio docs?

I mean: will sharding really solve this problem then?


(Shay Banon) #20

I don't really understand the question. The script will get executed for each hit; there is no way around it. Sharding will help, since the computation is then spread across several nodes.
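
As a rough back-of-the-envelope illustration of that answer, the plain-Java sketch below estimates scoring time when the hits are spread over several shards. It rests on loud assumptions that the thread does not confirm: cost is linear in the number of hits, shards are evenly filled and run fully in parallel, merge overhead is ignored, and the per-hit cost is derived from the ~0.5 s js timing by pretending all 353k tweets were scored. ShardEstimate is a made-up name.

```java
// Hypothetical estimate, not a benchmark: linear per-hit cost, even shards,
// perfect parallelism, no merge overhead.
final class ShardEstimate {
    // Time for the busiest shard, which bounds the whole query in this model.
    static double estimateSeconds(long hits, double secondsPerHit, int shards) {
        long hitsPerShard = (hits + shards - 1) / shards; // round up
        return hitsPerShard * secondsPerHit;
    }

    public static void main(String[] args) {
        // Assumption: ~0.5 s for 353k scored docs, taken from the js timings.
        double secondsPerHit = 0.5 / 353_000;
        long hits = 100_000_000L;
        for (int shards : new int[] {1, 5, 20}) {
            System.out.printf("%2d shard(s): ~%.1f s%n",
                    shards, estimateSeconds(hits, secondsPerHit, shards));
        }
    }
}
```

In this model the total work stays O(n), but the wall-clock time divides by the number of shards working in parallel, which is the sense in which sharding helps. Restricting the query so that fewer documents match reduces n itself.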