Hi,
Efficiency question for you guys.
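I have a set of documents with a repeated (multi-valued) integer field. Here is
the mapping (the queries below refer to these fields as morgan_fprint and
morgan_fprint_size):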
{
"document": {
"properties": {
"id": {
"index": "no",
"type": "string"
},
"fprint": {
"postings_format": "bloom_pulsing",
"type": "long"
},
"fprint_size": {
"include_in_all": false,
"store": true,
"type": "integer"
}
}
}
}
My query is a set of fingerprints, and I would like the final score to be the
number of matching fingerprints in the document, normalized by the total number
of fingerprints in the document. This query retrieves the right set, but does
not normalize:
{ "query": {
"bool": {
"should" : [
{
"term" : { "morgan_fprint" : 632180975 }
},
{
"term" : { "morgan_fprint" : 1039876598 }
},
{
"term" : { "morgan_fprint" : 2246728737 }
},
{
"term" : { "morgan_fprint" : 2264700157 }
}
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
The advantage of this query is that it takes only ~50ms.
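To make the target concrete, the score I am after could be sketched in plain Python roughly like this. It only illustrates the arithmetic and is not Elasticsearch code: the function name and the example document are made up, and it assumes each document's fingerprints are distinct, so that the stored size field equals the length of the list.

```python
def normalized_match_score(query_fprints, doc_fprints):
    """Matching fingerprints divided by the document's fingerprint count.

    Hypothetical helper for illustration only; assumes doc_fprints
    contains no duplicates, so len(doc_fprints) plays the role of the
    stored size field.
    """
    if not doc_fprints:
        # An empty document cannot match anything; avoid dividing by zero.
        return 0.0
    query_set = set(query_fprints)
    matches = sum(1 for fp in doc_fprints if fp in query_set)
    return matches / len(doc_fprints)


# The four fingerprints from the query above.
query = [632180975, 1039876598, 2246728737, 2264700157]

# Made-up document: two of its five fingerprints appear in the query.
doc = [632180975, 2246728737, 555, 777, 999]

print(normalized_match_score(query, doc))  # 2 matches / 5 fingerprints
```

A document with few fingerprints that mostly match should rank above a large document with the same number of matches.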
This query does the normalization, but is significantly slower (e.g. 250ms):
{ "query": {
"custom_score": {
"script" : "(_score / doc.morgan_fprint_size.value)",
"query": {
"bool": {
"should" : [
{
"term" : { "morgan_fprint" : 632180975 }
},
{
"term" : { "morgan_fprint" : 1039876598 }
},
{
"term" : { "morgan_fprint" : 2246728737 }
},
{
"term" : { "morgan_fprint" : 2264700157 }
}
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
} }
====
I assume this is because of how morgan_fprint_size.value is retrieved from
disk during query execution. Is there a good way of structuring the index
so that I can get a query that is both fast and normalized?
I attempted to do an index-time, per-document boost (e.g. something like
"fprint": { "_value": 12345, "_value": 67890, "_boost": 0.5 }). However, I
got this error:
"You cannot set an index-time boost on an unindexed field, or one that
omits norms"
... So that didn't work.
Thanks,
Chris