Profiling shows large cpu usage for scripts

Paul_Loy · May 12, 2011, 10:51pm

Profiling our application, we see that around 40% of our CPU time is spent
executing the script part of our custom_score query.

The only script we have is "random()" in order to get a random list of
results that match our query.

Wondering if there is any way to optimize this?

Thanks,

Paul.

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy

kimchy · May 12, 2011, 11:50pm

Which version of elasticsearch are you using? Since 0.16, you can implement custom Java-based (native) scripts, which will be faster. It might require a sample code integration; I can help.

Attachments:

profile.png

Paul_Loy · May 13, 2011, 12:28am

0.16.1

Ah, I remember seeing that on a changelist / release notes.

Thanks,


Paul_Loy · May 13, 2011, 1:28am

So I put:

scripts.natives.sp_rand.type: com.example.RandomOrderingNativeScriptFactory

into my index settings and that didn't work. Then I put it into the main
settings and it also didn't work.

I get a stack trace:

org.elasticsearch.action.search.SearchPhaseExecutionException: Failed to execute phase [query_fetch], total failure; shardFailures {[PW79QHCNS1ymKqdcLaICNQ][ugc][0]: SearchParseException[[ugc][0]: from[0],size[1000]: Parse Failure [Failed to parse source [(binary-encoded request body, unreadable here; legible fragments: query, filtered, custom_score, match_all, script sp_rand, lang native, filter, term)]]]; nested: ElasticSearchIllegalArgumentException[Native script [sp_rand] not found]; }
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:248)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.access$400(TransportSearchTypeAction.java:75)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:198)
    at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteFetch(SearchServiceTransportAction.java:227)
    at org.elasticsearch.action.search.type.TransportSearchQueryAndFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryAndFetchAction.java:71)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:192)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.access$000(TransportSearchTypeAction.java:75)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$2.run(TransportSearchTypeAction.java:169)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)


Paul_Loy · May 13, 2011, 1:38am

Looking through the code it's because it should be:

script.native.sp_rand.type: com.example.RandomOrderingNativeScriptFactory

The documentation is slightly wrong (has extraneous 's' chars and says
'type' rather than 'lang').
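
For reference, here is a sketch of the corrected setting written in nested YAML form. It assumes the "main settings" mentioned earlier are a node-level elasticsearch.yml file, which the thread does not actually name. Elasticsearch treats dotted and nested keys as equivalent, so this should be the same setting as the one-line form above. The factory class name is the one from the post.

```yaml
# Node-level settings (assumed here to be elasticsearch.yml).
# Registers the native script factory under the name "sp_rand".
script:
  native:
    sp_rand:
      type: com.example.RandomOrderingNativeScriptFactory
```

The query then refers to the script by its registered name with the language set to native, which matches the legible fragments of the failed request (script sp_rand, lang native).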


kimchy · May 13, 2011, 11:00am

Fixed the docs, thanks! Also, the AbstractFloatSearchScript will be simpler to use. If you want to share the script you use, then it might be further optimized (like using ThreadLocalRandom). How's the perf now?
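
To illustrate the ThreadLocalRandom suggestion, here is a minimal sketch. It assumes Java 7 or later, where `java.util.concurrent.ThreadLocalRandom` is part of the JDK. The class and method names are hypothetical stand-ins for the scoring body of a native script; this is not the actual script from the thread, and it leaves out the Elasticsearch script classes entirely.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the scoring logic of a native "random order" script.
final class RandomScoreSketch {
    private RandomScoreSketch() {}

    // Returns a pseudo-random score in [0.0, 1.0).
    // ThreadLocalRandom.current() returns a generator owned by the calling
    // thread, so concurrent search threads do not contend on one shared seed
    // the way they can with a single shared java.util.Random instance.
    static float randomScore() {
        return ThreadLocalRandom.current().nextFloat();
    }
}
```

In a real native script, a call like this would presumably sit inside the method that computes the score for each document.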

Paul_Loy · May 13, 2011, 6:32pm

Thanks Shay.

After reprofiling I don't see the script taking up any cpu cycles any more.
I have to verify that it's functionally correct, but looks like a massive
perf boost.

Now the next biggest CPU usage is in deserializing the source into a
map on search results - which is now around 30%.

We use source rather than fields as we've pushed into the index pretty much
everything we need to return to the client app. I think you've said
previously that it's more optimal to use the source rather than fields when
you require a large subset of fields to be returned. We have around 40
fields.

---- time passing by ----

We do throw away a percentage of search results based on what a user wants
to ignore so I have now optimized the deserialization process to only
deserialize when I want one (or more) of the fields out of the map. This has
massively reduced the deserialization overhead.
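
The lazy approach described above could look roughly like the following. This is only a sketch under assumptions: `LazySource` and its methods are hypothetical names, the parser is passed in as a plain function rather than any Elasticsearch class, and the wrapper is not thread-safe. It requires Java 8 or later for `java.util.function.Function`.

```java
import java.util.Map;
import java.util.function.Function;

// Hypothetical wrapper: keeps a hit's raw source bytes and parses them into a
// Map only on first field access, then reuses the parsed result.
final class LazySource {
    private final byte[] raw;
    private final Function<byte[], Map<String, Object>> parser;
    private Map<String, Object> parsed; // stays null until the first field access

    LazySource(byte[] raw, Function<byte[], Map<String, Object>> parser) {
        this.raw = raw;
        this.parser = parser;
    }

    // Parses at most once. Hits that are discarded before any field is read
    // never pay the deserialization cost.
    Object field(String name) {
        if (parsed == null) {
            parsed = parser.apply(raw);
        }
        return parsed.get(name);
    }

    // True once the source has actually been deserialized.
    boolean isParsed() {
        return parsed != null;
    }
}
```

Results that get thrown away based on information available without the source would never trigger the parser at all, which is where the saving comes from.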

I also just jump straight into a JsonXContent rather than using
XContentFactory.xContent(source).createParser(source). I'm wondering if
InternalSearchHit#sourceAsMap could do the same? Seems we already know it's
going to be Json so why waste cpu cycles sniffing the content?

So, all in all, the native scripts are very highly recommended!

Thanks.


kimchy · May 13, 2011, 6:53pm

Sniffing the type is very fast, there is no way that it's taking much time...

Paul_Loy · May 13, 2011, 8:27pm

Yeah, not saying it's taking much, but if we know it's JSON, why do it?


kimchy · May 13, 2011, 9:24pm

You can write your own code that converts it to a Map, but I really think that you are over-optimizing now. It simply traverses the first 20 chars; you are not going to save anything.
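
To give a feel for why a check like this is cheap, here is a hypothetical sketch of header sniffing. It is not the Elasticsearch implementation: the class name, the method, and the exact rule (look for an opening brace within a fixed window) are assumptions, and the 20-byte limit is taken only from the remark above.

```java
// Hypothetical illustration of header sniffing; NOT the Elasticsearch code.
final class ContentSniffSketch {
    // Assumed bound, taken from the "first 20 chars" remark in the thread.
    static final int HEADER_SCAN_LIMIT = 20;

    private ContentSniffSketch() {}

    // Returns true if a '{' appears within the first HEADER_SCAN_LIMIT bytes.
    // The loop runs at most HEADER_SCAN_LIMIT times regardless of input size.
    static boolean looksLikeJson(byte[] data) {
        int limit = Math.min(data.length, HEADER_SCAN_LIMIT);
        for (int i = 0; i < limit; i++) {
            if (data[i] == '{') {
                return true;
            }
        }
        return false;
    }
}
```

The work is bounded by a small constant per document, so it is negligible next to parsing 40 fields into a map.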