Script filter slow query execution

vinamar · January 28, 2014, 7:27pm

I'm trying to use a script filter with a filtered query, as shown below. The query takes around 20 to 30 seconds to execute for 70K matching facet results. How can I speed up query execution?

{
  "timeout": 30000,
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "fields": {
      "srId": {
        "number_of_fragments": 0
      },
      "emails.emailBody": {
        "number_of_fragments": 0
      },
      "chatTextArray.text": {
        "number_of_fragments": 0
      }
    }
  },
  "query": {
    "custom_filters_score": {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "term": {
                  "domains.L1.domain.id": "1"
                }
              },
              {
                "range": {
                  "closeDt": {
                    "from": 1325404800000,
                    "to": 1390982400000
                  }
                }
              }
            ]
          },
          "query": {
            "query_string": {
              "query": "payment button",
              "default_operator": "AND",
              "fields": [
                [
                  "emails.emailBody",
                  "srId",
                  "chatTextArray.text"
                ]
              ]
            }
          }
        }
      },
      "filters": [
        {
          "filter": {
            "exists": {
              "field": "domains.L1.domain.id"
            }
          },
          "script": "int docscore=doc['domains.L1.domain.1.prob'].value*1.5;int expscore=pow(docscore,7);_score=_score*_source._boost*expscore"
        }
      ]
    }
  },
  "size": 10
}

Binh_Ly · January 28, 2014, 11:05pm

Vinoth,

I'd try to eliminate that "_source._boost" part from your script and see if
that makes any difference. If it does, store your doc boost in a field and
access it like:

doc["myboostvalue"].value

and see if that helps.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5fe43c96-d3e8-4cfa-8004-a7e7c0b665f8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

vinamar · January 29, 2014, 6:58pm

I tried removing _source._boost, and query performance improved significantly (about 30x).

Is there a reason why _source._boost would hurt performance this much?

At index time we compute the document boost and store it in the boost field itself.

Do we also need to store it in a separate field?

Thanks, Vinoth.

Binh_Ly · January 29, 2014, 7:22pm

Vinoth,

When you access _source.X in a script, the _source field is loaded per doc,
parsed, and then provided to the script. Depending on how many documents
you are hitting, it can be slow if there are a lot.

The document _boost should already be factored into the score, but if you
need to extract a numeric boost value and make it part of your script, try
using doc["boostfield"].value, instead of _source.boostfield. That way, the
values are loaded into memory and would perform better. Just be aware that
it will take up some memory.
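
As a rough illustration of the suggestion above, the "filters" entry from the original query could look like the fragment below once the _source lookup is replaced by a field-data lookup. This is only a sketch: "myboostvalue" is a hypothetical numeric field (the name is borrowed from the earlier reply) that would have to be indexed on every document, and the script assumes the original expression multiplied its terms together.

```json
{
  "filter": {
    "exists": {
      "field": "domains.L1.domain.id"
    }
  },
  "script": "int docscore=doc['domains.L1.domain.1.prob'].value*1.5;int expscore=pow(docscore,7);_score=_score*doc['myboostvalue'].value*expscore"
}
```

The only change from the original script is that doc['myboostvalue'].value replaces _source._boost, so the value is read from in-memory field data instead of loading and parsing _source for each matching document.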


vinamar · January 30, 2014, 8:22pm

Hi Binh,

We are setting _boost at index time per document. If that is already included in the _score, then we are fine.

Can you confirm whether the document _boost is included in the _score? I don't see it in the query explain output.

for doc _boost: 1.41
"value" : 0.14836232,
"description" : "score(doc=140,freq=22.0 = termFreq=22.0\n), product of:",
"details" : [ {
  "value" : 0.15365347,
  "description" : "queryWeight, product of:",
  "details" : [ {
    "value" : 3.2937443,
    "description" : "idf(docFreq=142048, maxDocs=1407987)"
  }, {
    "value" : 0.046650093,
    "description" : "queryNorm"
  } ]
}, {
  "value" : 0.9655644,
  "description" : "fieldWeight in 140, product of:",
  "details" : [ {
    "value" : 4.690416,
    "description" : "tf(freq=22.0), with freq of:",
    "details" : [ {
      "value" : 22.0,
      "description" : "termFreq=22.0"
    } ]
  }, {
    "value" : 3.2937443,
    "description" : "idf(docFreq=142048, maxDocs=1407987)"
  }, {
    "value" : 0.0625,
    "description" : "fieldNorm(doc=140)"

Binh_Ly · January 30, 2014, 8:41pm

Vinoth, I just got word that doc _boost will be deprecated in ES 1.0. It is
recommended that you start using the function_score query instead: store
your doc "boost" as a regular field and read it with doc["boost"].value
going forward:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/mapping-boost-field.html
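
For reference, a minimal sketch of what such a function_score query might look like. It assumes a numeric field named "boost" has been indexed on each document (as described above) and reuses the query text from the original post; the exact options available depend on your ES version, so check the function_score documentation for the release you run.

```json
{
  "query": {
    "function_score": {
      "query": {
        "query_string": {
          "query": "payment button",
          "default_operator": "AND"
        }
      },
      "script_score": {
        "script": "_score * doc['boost'].value"
      },
      "boost_mode": "replace"
    }
  }
}
```

Because the script already multiplies by _score, "boost_mode" is set to "replace" here so the query score is not applied a second time.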


vinamar · January 30, 2014, 10:05pm

Hi Binh,

Thanks, we will look at using the function_score query at a later point.

Switching to function_score to access this new boost field would require us to reprocess and reindex all documents.

We will migrate to the function_score query the next time we reindex (by then we will have the new numeric field representing the boost).

Does _score currently include the _boost factor? Is there any other way to include the _boost value without the script overhead?

FYI, we are on ES 0.90.7.

Thanks, Vinoth.

Binh_Ly · January 30, 2014, 10:25pm

Vinoth, If you did the _boost according to this link
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html),
then yes it is precomputed into the _score already.


Ivan · January 31, 2014, 4:01pm

Judging by the commits, the boost functionality was not removed, only
deprecated. That said, you really should move to query time boosting.

Index-time boosts are encoded inside the field norms. You should see a
difference inside each field norm. If you have omitted norms, then you will
not have any boosts on that field. The field norm is also lossy, since it
uses only 1 byte. These are some of the many reasons to switch away from
index-time document boosts.
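
To make this concrete, here is the arithmetic behind the explain output posted earlier in the thread, using only the numbers shown there. There is no separate boost line because, with Lucene's default similarity, an index-time boost is multiplied into the length normalization and stored as the single quantized fieldNorm value. How much of the 0.0625 below comes from the 1.41 boost cannot be recovered from the explain output alone.

```latex
\begin{aligned}
\text{queryWeight} &= \text{idf} \times \text{queryNorm}
  = 3.2937443 \times 0.046650093 \approx 0.15365347 \\
\text{tf} &= \sqrt{\text{termFreq}} = \sqrt{22} \approx 4.690416 \\
\text{fieldWeight} &= \text{tf} \times \text{idf} \times \text{fieldNorm}
  = 4.690416 \times 3.2937443 \times 0.0625 \approx 0.9655644 \\
\text{score} &= \text{queryWeight} \times \text{fieldWeight}
  = 0.15365347 \times 0.9655644 \approx 0.14836232
\end{aligned}
```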

Cheers,

Ivan

