I'm a bit curious about the effect of custom_score on performance and what I
might do to help the situation. Here are three different and very simple
cases.
First, a basic match_all wrapped in a custom_score query, with no additional
scripting functions. This returns in 20ms. Not so bad.
Second, the same query without the custom_score is considerably faster, which
makes the 20ms above seem horrible:
{
  "query": {
    "match_all": {}
  }
}
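To make the first two cases concrete, here they are as Python request bodies. This is a sketch that rests on an assumption. The body for the first case isn't given, so it is reconstructed using the 0.90-era custom_score syntax. Its script only returns `_score`, which matches the description of a wrapper with no additional scripting functions.

```python
import json

# Case 2: plain match_all, as in the query above.
MATCH_ALL = {"query": {"match_all": {}}}

# Case 1 (assumed form): the same match_all wrapped in a custom_score query
# whose script just returns the original score, i.e. no extra scripting logic.
CUSTOM_SCORE_MATCH_ALL = {
    "query": {
        "custom_score": {
            "query": {"match_all": {}},
            "script": "_score",
        }
    }
}

if __name__ == "__main__":
    # Print both bodies so they can be pasted into curl or any HTTP client.
    for name, body in [("custom_score", CUSTOM_SCORE_MATCH_ALL), ("match_all", MATCH_ALL)]:
        print(name, json.dumps(body))
```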
Finally, when I add a size and, much worse, a fields restriction, the
time jumps to 70ms. Without the custom score, the queries are consistently
around 2ms, regardless of size or fields.
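The third case can be sketched as the same body with the two extra parameters added. The helper name `with_size_and_fields` and the values `size=50` and `fields=["id"]` are hypothetical, because the post doesn't say which size or fields were used. The `id` field is the one discussed later in the thread.

```python
import copy


def with_size_and_fields(body, size, fields):
    """Return a copy of a search body with a result size and a fields restriction.

    size and fields are hypothetical example values, not the ones from the post.
    """
    new_body = copy.deepcopy(body)  # leave the caller's body untouched
    new_body["size"] = size
    new_body["fields"] = list(fields)
    return new_body


base = {"query": {"match_all": {}}}
case3 = with_size_and_fields(base, size=50, fields=["id"])
```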
Hi Justin,
the script gets executed for each document, so the more documents you get
back, the more time is spent there.
The fields directive could also hurt performance if the id field is not
stored. In that case the _source has to be parsed just to extract the id
field from it, whereas by default the whole _source is simply returned as is.
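The point about fields can be illustrated with a toy model. This is not Elasticsearch's actual code. A stored field can be read directly. If the requested field is not stored, the whole _source must be parsed just to pull one value out. The default path returns the raw _source without parsing it at all. The document layout and the `fetch` function are hypothetical.

```python
import json


def fetch(doc, fields=None):
    """Toy model of fetching one hit.

    doc has 'source' (the raw JSON string of _source) and 'stored'
    (fields that were stored separately). Returns (result, parsed_source)
    so the extra parsing work is visible.
    """
    if fields is None:
        # Default: hand back the raw _source untouched, no parsing needed.
        return doc["source"], False
    result = {}
    parsed = None
    for name in fields:
        if name in doc["stored"]:
            # Stored field: read it directly.
            result[name] = doc["stored"][name]
        else:
            # Not stored: parse the whole _source just to extract one field.
            if parsed is None:
                parsed = json.loads(doc["source"])
            result[name] = parsed.get(name)
    return result, parsed is not None


doc_unstored = {"source": '{"id": "42", "title": "a long document"}', "stored": {}}
doc_stored = {"source": '{"id": "42", "title": "a long document"}', "stored": {"id": "42"}}
```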
Have you tried making those changes separately? For example, increase only the
size and measure the time, then add only the fields directive and measure the
time.
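A harness for measuring one change at a time might look like the sketch below. It assumes a cluster reachable at a hypothetical `SEARCH_URL`, where the host, port and index name are placeholders. It reads the `took` field of each search response, which reports the server-side time in milliseconds. Taking the median over several runs smooths out warm-up and caching noise. The `send` parameter lets the harness be exercised without a live cluster. The size of 50 is hypothetical.

```python
import json
import statistics
import urllib.request

# Hypothetical endpoint: adjust host, port and index name to your setup.
SEARCH_URL = "http://localhost:9200/myindex/_search"


def es_search(body, url=SEARCH_URL):
    """POST a search body and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def median_took(body, send=es_search, runs=20):
    """Run the same search several times and return the median 'took' in ms."""
    return statistics.median(send(body)["took"] for _ in range(runs))


def one_change_at_a_time(base, send=es_search, runs=20):
    """Time the base body, then each change applied on its own, then both."""
    variants = {
        "base": base,
        "size only": {**base, "size": 50},  # hypothetical size
        "fields only": {**base, "fields": ["id"]},
        "size and fields": {**base, "size": 50, "fields": ["id"]},
    }
    return {name: median_took(body, send, runs) for name, body in variants.items()}
```

Calling `one_change_at_a_time({"query": {"match_all": {}}})` against a running cluster returns the median `took` for each variant. Passing a custom_score body as `base` gives the same breakdown for the scripted query.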
Also, this kind of test would be more meaningful with a real script and your
actual usage of elasticsearch. Are you planning to retrieve only the id field
from elasticsearch?
(In reply to Justin Treher's post of Wednesday, July 31, 2013 5:56:08 PM UTC+2.)
Hi Justin,
Part of my reply was incorrect, sorry about that. Thinking more about it,
it's not true that the size influences the time spent executing the script.
In a custom score query the script is executed to compute the score of all
matching documents, not only the ones that get returned (which you control
through the size parameter). Thus, if the number of documents that match the
query stays the same, the total time spent running the script should be
pretty much the same, since it runs on the same number of documents.
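The corrected point can be shown with a small simulation. This is a toy model, not Lucene's implementation. A score function stands in for the custom_score script and is called once per matching document. Only afterwards are the top `size` hits kept. As a result, the number of script calls depends on how many documents match, not on `size`. The `search` function and the document layout are hypothetical.

```python
import heapq


def search(docs, matches, score_script, size):
    """Toy search: score every matching doc, then keep the best `size` of them.

    Returns the top hits and how many times the score script ran.
    """
    calls = 0
    scored = []
    for doc in docs:
        if matches(doc):
            calls += 1  # the script runs for every matching document...
            scored.append((score_script(doc), doc["id"]))
    top = heapq.nlargest(size, scored)  # ...while size only trims the result list
    return top, calls


docs = [{"id": i, "_score": 1.0} for i in range(1000)]
match_all = lambda doc: True
base_score = lambda doc: doc["_score"]  # like a script that just returns _score
```

Running it with `size` set to 10 and then to 100 gives the same number of script calls, 1000, because all 1000 documents match.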
I'd also add that it's quite normal for a match_all query to be fast. I would
run tests with the real queries you are going to use.
(In reply to Luca Cavanna's message of Monday, August 5, 2013 3:10:23 PM UTC+2.)
Thank you for the feedback on how the cost of custom_score relates to the
number of documents. In all my testing, custom_score adds substantial time,
whether it's a match_all() or the full-blown complex query I send through.
It's completely relative. Again, I'm not even adding anything to the custom
score except the base _score parameter. I just don't understand the magnitude
of time that custom_score is adding (roughly 50ms).
The fields param doesn't seem to impact performance at all.
JT
(In reply to Luca Cavanna's message of Monday, August 5, 2013 9:23:15 AM UTC-4.)