Performance issue with script scoring with fields having a large array


(nicolas) #1

I have documents with fields containing large arrays.

I would like to score according to the value of the nth element of such an
array, but I get very slow responses (5 s) with only 10K documents indexed.

My mapping:
document {
  id: value,
  field2: string,
  field3: [int_1, int_2, ..., int_10k]  <- large array of 10K integers
}

Assume I generated and indexed 10K documents, each with 1K random integer
values in the field 'field3'.

I then use the following search query:

GET /test/document/_search
{
  "query": {
    "function_score": {
      "script_score": {
        "script": "_source.field3[12] * _source.field3[11]"
      }
    }
  }
}

=> got 5000 ms

However, with plain Java objects and a simple loop:

  • for each document i
    score[i] = doc[i].field3[12] * doc[i].field3[11]
  • sort by score

=> got < 50 ms

ES is 100 times slower than a simple loop.

How can I get similar performance with ES?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/db53da70-4f75-4088-b9a6-2cde3caef062%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Radu Gheorghe) #2

Hello,

Using _source for scripts is typically slow, because ES has to go to each
stored document and extract fields from there. A faster approach is to use
something like doc['field3'].values[12], which will use the field data
cache (already loaded in memory, at least after the first run):
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-scripting.html#_document_fields

More details about field data can be found here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/index-modules-fielddata.htm
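
As a rough sketch of the suggestion above, the original query could be rewritten to read from field data instead of _source. This body would be sent to the same GET /test/document/_search endpoint as in the original post, and the field name field3 is taken from the mapping shown there. Two assumptions are worth checking before relying on it: every document needs at least 13 values in field3, and, as far as I know, field data does not necessarily return the values of a multi-valued field in their original array order (they are typically exposed sorted), so index 12 here may not be the same element as _source.field3[12].

```
{
  "query": {
    "function_score": {
      "script_score": {
        "script": "doc['field3'].values[12] * doc['field3'].values[11]"
      }
    }
  }
}
```

If the position in the original array matters, one possible workaround is to index the elements you score on as separate single-valued fields, so that the script can read them by name rather than by position.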

Best regards,
Radu

Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

In reply to the message from NM n.maisonneuve@gmail.com of Wed, Apr 30, 2014 at 12:27 PM.


(system) #3