Script filter slow query execution


(vinamar) #1

I'm trying to use a script filter with a filtered query, as shown below. The query takes around 20 to 30 seconds for 70K matching facet results. How can I speed up the query execution?

{
  "timeout": 30000,
  "highlight": {
    "pre_tags": [
      ""
    ],
    "post_tags": [
      ""
    ],
    "fields": {
      "srId": {
        "number_of_fragments": 0
      },
      "emails.emailBody": {
        "number_of_fragments": 0
      },
      "chatTextArray.text": {
        "number_of_fragments": 0
      }
    }
  },
  "query": {
    "custom_filters_score": {
      "query": {
        "filtered": {
          "filter": {
            "and": [
              {
                "term": {
                  "domains.L1.domain.id": "1"
                }
              },
              {
                "range": {
                  "closeDt": {
                    "from": 1325404800000,
                    "to": 1390982400000
                  }
                }
              }
            ]
          },
          "query": {
            "query_string": {
              "query": "payment button",
              "default_operator": "AND",
              "fields": [
                "emails.emailBody",
                "srId",
                "chatTextArray.text"
              ]
            }
          }
        }
      },
      "filters": [
        {
          "filter": {
            "exists": {
              "field": "domains.L1.domain.id"
            }
          },
          "script": "int docscore=doc['domains.L1.domain.1.prob'].value*1.5;int expscore=pow(docscore,7);_score=_score*_source._boost*expscore"
        }
      ]
    }
  },
  "size": 10
}


(Binh Ly) #2

Vinoth,

I'd try removing the "_source._boost" part from your script to see whether
that makes any difference. If it does, store your document boost in a
regular field and access it like this:

doc["myboostvalue"].value

and see if that helps.
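A minimal sketch of that change, using Python only to build the request body: it keeps the single entry in the "filters" array of custom_filters_score and swaps _source._boost for a doc[...] lookup. It assumes the boost has already been indexed into a numeric field named myboostvalue (the name from the reply above; hypothetical for your mapping). The script string mirrors the original MVEL script otherwise unchanged.

```python
import json

# Assumption: the per-document boost is indexed into a numeric field named
# "myboostvalue" (hypothetical name, taken from the suggestion above).
BOOST_FIELD = "myboostvalue"
PROB_FIELD = "domains.L1.domain.1.prob"


def boost_script(boost_field: str = BOOST_FIELD) -> str:
    """Original MVEL script, with _source._boost replaced by a doc[...] lookup."""
    return (
        f"int docscore=doc['{PROB_FIELD}'].value*1.5;"
        "int expscore=pow(docscore,7);"
        f"_score=_score*doc['{boost_field}'].value*expscore"
    )


# Drop-in replacement for the entry in "filters" of custom_filters_score.
filters_entry = {
    "filter": {"exists": {"field": "domains.L1.domain.id"}},
    "script": boost_script(),
}

print(json.dumps(filters_entry, indent=2))
```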

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5fe43c96-d3e8-4cfa-8004-a7e7c0b665f8%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(vinamar) #3

I tried removing _source._boost, and query performance improved significantly (about 30x).

Is there a reason why _source._boost hurts performance so much?

At index time we compute the document boost and store it in the _boost field itself.

Do we also need to store it in a separate field?

Thanks, Vinoth.


(Binh Ly) #4

Vinoth,

When you access _source.X in a script, the _source field is loaded for each
document, parsed, and then passed to the script. If your query matches a lot
of documents, this can be slow.

The document _boost should already be factored into the score, but if you
need to extract a numeric boost value and use it in your script, try
doc["boostfield"].value instead of _source.boostfield. That way, the values
are loaded into memory (field data) and perform better. Just be aware that
they take up some memory.
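A toy Python model of the difference described above (not Elasticsearch's actual implementation): the _source path parses a whole JSON document for every matching hit, while the doc[...] path reads from a flat per-field array kept in memory. The document contents and the "every other document matches" hit set are made up for illustration.

```python
import json
from typing import List

N_DOCS = 1000

# Toy model only: each document keeps its original JSON as a string,
# the way the _source field is stored.
boosts: List[float] = [1.0 + (i % 5) / 10 for i in range(N_DOCS)]
sources: List[str] = [
    json.dumps({"body": "lorem ipsum " * 50, "boost": b}) for b in boosts
]

parse_count = 0


def boost_from_source(doc_id: int) -> float:
    """Mimics _source.boost: fetch and parse the whole document on every hit."""
    global parse_count
    parse_count += 1
    return json.loads(sources[doc_id])["boost"]


# Mimics field data: one field's values held in a flat in-memory array,
# built once and reused by every query. It holds a value for every
# document, matched or not, which is the memory cost.
boost_column: List[float] = list(boosts)


def boost_from_field_data(doc_id: int) -> float:
    """Mimics doc["boost"].value: a single array lookup, no parsing."""
    return boost_column[doc_id]


matching = range(0, N_DOCS, 2)  # pretend every other document matched
from_source = [boost_from_source(d) for d in matching]
from_column = [boost_from_field_data(d) for d in matching]
print(f"parsed {parse_count} JSON documents to score {len(from_source)} hits")
```

Both paths return the same boosts, but scoring 500 hits costs 500 full JSON parses on the _source path and none on the array path. With 70K matching documents, as in the original question, that per-hit parsing dominates.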



(vinamar) #5

Hi Binh,

We are setting _boost per document at index time. If that is included as part of the _score, then we are fine.

Can you confirm whether the document _boost is included in the _score? I don't see it in the query explanation.

For a document with _boost 1.41:
"value" : 0.14836232,
"description" : "score(doc=140,freq=22.0 = termFreq=22.0\n), product of:",
"details" : [ {
  "value" : 0.15365347,
  "description" : "queryWeight, product of:",
  "details" : [ {
    "value" : 3.2937443,
    "description" : "idf(docFreq=142048, maxDocs=1407987)"
  }, {
    "value" : 0.046650093,
    "description" : "queryNorm"
  } ]
}, {
  "value" : 0.9655644,
  "description" : "fieldWeight in 140, product of:",
  "details" : [ {
    "value" : 4.690416,
    "description" : "tf(freq=22.0), with freq of:",
    "details" : [ {
      "value" : 22.0,
      "description" : "termFreq=22.0"
    } ]
  }, {
    "value" : 3.2937443,
    "description" : "idf(docFreq=142048, maxDocs=1407987)"
  }, {
    "value" : 0.0625,
    "description" : "fieldNorm(doc=140)"


(Binh Ly) #6

Vinoth, I just heard that the document _boost will be deprecated in ES 1.0.
The recommendation going forward is to use the function_score query instead:
store your document "boost" in a regular field and read it with the
doc["boost"].value method:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/master/mapping-boost-field.html
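A sketch of the original request rewritten as a function_score query, built in Python so the structure can be checked. Assumptions: a numeric field named boost holds the per-document boost (as suggested above), the ES 1.x-era filtered/and syntax still applies, and boost_mode "replace" is used because the script already multiplies by _score. Check the function_score documentation for your version before relying on the exact option names.

```python
import json

# Assumption: per-document boost stored in a numeric field named "boost".
BOOST_FIELD = "boost"

# Same scoring idea as the original custom_filters_score script, but the
# boost is read via doc[...] (field data) instead of _source._boost.
SCRIPT = (
    "int docscore = doc['domains.L1.domain.1.prob'].value * 1.5; "
    "int expscore = pow(docscore, 7); "
    f"_score * doc['{BOOST_FIELD}'].value * expscore"
)

query = {
    "size": 10,
    "query": {
        "function_score": {
            "query": {
                "filtered": {
                    "query": {
                        "query_string": {
                            "query": "payment button",
                            "default_operator": "AND",
                            "fields": ["emails.emailBody", "srId", "chatTextArray.text"],
                        }
                    },
                    "filter": {
                        "and": [
                            {"term": {"domains.L1.domain.id": "1"}},
                            {"range": {"closeDt": {"from": 1325404800000, "to": 1390982400000}}},
                        ]
                    },
                }
            },
            "functions": [
                {
                    "filter": {"exists": {"field": "domains.L1.domain.id"}},
                    "script_score": {"script": SCRIPT},
                }
            ],
            # The script already multiplies by _score, so use its result as-is.
            "boost_mode": "replace",
        }
    },
}

print(json.dumps(query, indent=2))
```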



(vinamar) #7

Hi Binh,

Thanks, we will look at using the function_score query at a later point.

Switching to function_score to access this new boost field will require us to reprocess and re-index all documents.

We will migrate to the function_score query the next time we re-index our documents (at that point we will have the new numeric field representing the boost).

Does _score currently include the _boost factor? Is there any other way to include the _boost value in the script function without this overhead?

FYI, we are on ES 0.90.7.

Thanks, Vinoth.


(Binh Ly) #8

Vinoth, if you set _boost as described at this link
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-boost-field.html),
then yes, it is already precomputed into the _score.
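The explain output for doc 140 earlier in the thread can be multiplied out by hand, which shows there is no separate factor for the 1.41 boost. A sketch in Python, assuming Lucene's DefaultSimilarity (classic TF-IDF), where tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); queryNorm depends on every clause of the query, so it is copied from the output rather than recomputed.

```python
import math

# Values copied from the explain output for doc 140 (_boost 1.41).
freq = 22.0
doc_freq, max_docs = 142048, 1407987
query_norm = 0.046650093  # depends on the whole query; taken as given
field_norm = 0.0625       # decoded 1-byte norm for this field of doc 140

# Assumed DefaultSimilarity (TF-IDF) formulas.
tf = math.sqrt(freq)                             # tf(freq=22.0)
idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # idf(docFreq, maxDocs)

query_weight = idf * query_norm        # "queryWeight, product of"
field_weight = tf * idf * field_norm   # "fieldWeight in 140, product of"
score = query_weight * field_weight

print(f"tf={tf:.6f} idf={idf:.7f}")
print(f"queryWeight={query_weight:.8f} fieldWeight={field_weight:.7f} score={score:.8f}")
```

The product reproduces the reported 0.14836232 from tf, idf, queryNorm and fieldNorm alone, so any index-time boost can only be hiding inside fieldNorm.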



(Ivan Brusic) #9

Judging by the commits, the boost functionality was not removed, only
deprecated. That said, you really should move to query-time boosting.

Index-time boosts are encoded inside the field norms, so you should see the
difference in each field norm (the fieldNorm value in the explain output). If
you have omitted norms on a field, that field will not carry any boost. The
field norm is also lossy, since it uses only 1 byte. These are some of the
many reasons to switch away from index-time document boosts.
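To illustrate why an index-time boost is hard to spot and why the 1-byte norm is lossy, here is a Python port of Lucene's SmallFloat "315" byte encoding (floatToByte315 / byte315ToFloat), combined with a DefaultSimilarity-style length norm of boost / sqrt(numTerms). It is a sketch written from Lucene's classic norm code, not something taken from this thread, and the 400-term field is hypothetical.

```python
import math
import struct


def float_to_byte315(f: float) -> int:
    """Sketch of Lucene SmallFloat.floatToByte315, returned as an unsigned byte."""
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    smallfloat = bits >> (24 - 3)  # keep sign, exponent and top mantissa bits
    fzero = (63 - 15) << 3
    if smallfloat <= fzero:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> 1
    if smallfloat >= fzero + 0x100:
        return 0xFF  # overflow maps to the largest value
    return smallfloat - fzero


def byte315_to_float(b: int) -> float:
    """Sketch of Lucene SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << (24 - 3)) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def stored_norm(boost: float, num_terms: int) -> float:
    """Length norm boost / sqrt(numTerms), squeezed into one byte and decoded."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))


# Hypothetical 400-term field, without and with a document boost of 1.41.
plain = stored_norm(1.0, 400)
boosted = stored_norm(1.41, 400)
print(plain, boosted, boosted / plain)
```

For that hypothetical field the unboosted norm is stored as 0.046875 and the boosted one as 0.0625, so the 1.41 boost survives only as a factor of about 1.33, and it never appears as its own line in the explain output.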

Cheers,

Ivan

(Replying to Binh Ly's message of Thu, Jan 30, 2014 at 2:25 PM.)


