We're currently relying on Elasticsearch's named queries support to compute a custom heuristic for every hit: given N "should" queries in a bool query, which of those queries did each hit match?
For every hit we read the matched_queries array field: the bigger the array is, the more "trustable" the result is.
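To make the setup concrete, here is a minimal sketch of the kind of request and post-processing described above. The field names (`title`, `author`), the query names and the sample response are made up for illustration; only `_name` and `matched_queries` come from Elasticsearch itself.

```python
def build_query(terms):
    """Build a bool query with one named "should" clause per entry.

    terms maps a query name to (field, text); all names are illustrative.
    """
    should = [
        {"match": {field: {"query": text, "_name": name}}}
        for name, (field, text) in terms.items()
    ]
    return {"query": {"bool": {"should": should}}}


def matched_counts(response):
    """Return (doc id, number of matched named queries) for every hit."""
    return [
        (hit["_id"], len(hit.get("matched_queries", [])))
        for hit in response["hits"]["hits"]
    ]


body = build_query({
    "by_title": ("title", "named queries"),
    "by_author": ("author", "franzoi"),
})

# Hand-written stand-in for a search response, not real output.
sample_response = {
    "hits": {"hits": [
        {"_id": "1", "_score": 1.2, "matched_queries": ["by_title", "by_author"]},
        {"_id": "2", "_score": 0.7, "matched_queries": ["by_title"]},
    ]}
}
counts = matched_counts(sample_response)
```

Here `counts` pairs each document id with the length of its `matched_queries` array, which is the "trust" signal described above.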
Except for one weird behaviour in corner cases, we're OK with this approach from a functional point of view. But we're facing non-functional problems: query time increases by one order of magnitude compared with named queries disabled, 300 ms when enabled versus 30 ms when disabled. Is this the expected behaviour?
Is this something related to the Elasticsearch version? We're currently using version 1.7 (for simpler compatibility with client connectors). Should we expect better performance, in the context of named queries, simply by upgrading to version 2.x?
I suspect that you are retrieving lots of documents per page?
Do you actually sort by the number of matched clauses? If so, you could just wrap each sub query in a constant_score query, and Elasticsearch will then do exactly that.
[quote="bhjfranzoi, post:1, topic:45703"]
Should we expect better performance, in the context of named queries, simply by upgrading to version 2.x?
[/quote]
No. Named queries work the same way in both versions.
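As a rough illustration of the constant_score suggestion in this reply, the sketch below wraps each "should" clause so that every matching clause contributes the same fixed score. The helper names and the sample clauses are hypothetical, and the key used for the inner clause (`query` here) is an assumption that should be checked against the documentation for the version in use.

```python
def wrap_constant(clause, inner_key="query", boost=1.0):
    """Wrap a clause so that it contributes a fixed score when it matches.

    inner_key is an assumption: check which key your version expects.
    """
    return {"constant_score": {inner_key: clause, "boost": boost}}


def constant_should(clauses):
    """Build a bool query whose "should" clauses all score the same."""
    return {"query": {"bool": {"should": [wrap_constant(c) for c in clauses]}}}


# Illustrative clauses; field names and texts are made up.
clauses = [
    {"match": {"title": "named queries"}},
    {"match": {"author": "franzoi"}},
]
body = constant_should(clauses)
```

With every clause scoring the same, ordering by `_score` should follow the number of matched clauses, at the cost of discarding the original relevance score.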
Nope, I don't think we can consider this a "lots of documents" scenario: the query times I mentioned are for responses of 40 hits at most. Usually it's around 10 to 15 hits per response.
We're currently trying to keep both pieces of "trust" data: how many queries matched (via the matched_queries attribute) and the original "score" attribute. This way, we can later decide whether a result is trustable by combining a custom metric (the ratio of matched queries to applied queries) with the ES score value. That's why we would prefer not to use constant_score, so as not to lose the second metric (score).
Anyway, we managed to set up an ES 2.2 environment, and from basic manual benchmarks the difference in query time is still present, even though ES 2.2 is faster in the worst scenario (100 ms instead of 300 ms).
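A minimal sketch of how the two signals could be kept side by side is shown below. The function names and both thresholds are hypothetical and only meant to show the shape of the combination, not the actual rule we apply.

```python
def trust_signals(hit, applied_queries):
    """Keep both signals for a hit: matched/applied ratio and the ES score."""
    matched = len(hit.get("matched_queries", []))
    ratio = matched / applied_queries if applied_queries else 0.0
    return {"id": hit["_id"], "match_ratio": ratio, "score": hit["_score"]}


def is_trustable(signals, min_ratio=0.5, min_score=1.0):
    """Hypothetical decision rule; both thresholds are placeholders."""
    return signals["match_ratio"] >= min_ratio and signals["score"] >= min_score


# Hand-written hit, not real output: 2 of 4 applied queries matched.
hit = {"_id": "1", "_score": 1.2, "matched_queries": ["by_title", "by_author"]}
signals = trust_signals(hit, applied_queries=4)
trusted = is_trustable(signals)
```

Keeping the ratio and the score as separate fields leaves the final decision rule open, which is the reason for not collapsing everything into a constant score.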
40 should be fine indeed. What kind of queries are you putting under your bool query (match, fuzzy, query_string)? Also, would you be able to run such searches with named queries in a loop and capture hot threads while these searches are running?
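One possible way to do this is sketched below: a background thread repeats the search while the main thread samples the nodes hot threads endpoint (`/_nodes/hot_threads`). The base URL, index name, helper names and sample count are assumptions for illustration, and the code only uses the Python standard library.

```python
import json
import threading
import urllib.request


def hot_threads_url(base_url, threads=3):
    """URL of the nodes hot threads endpoint."""
    return "{}/_nodes/hot_threads?threads={}".format(base_url.rstrip("/"), threads)


def search_url(base_url, index):
    """URL of the search endpoint for one index."""
    return "{}/{}/_search".format(base_url.rstrip("/"), index)


def run_searches(base_url, index, body, stop_event):
    """Send the same search over and over until stop_event is set."""
    data = json.dumps(body).encode("utf-8")
    while not stop_event.is_set():
        request = urllib.request.Request(
            search_url(base_url, index),
            data=data,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request).read()


def capture_hot_threads(base_url, index, body, samples=5):
    """Sample hot threads while searches run in a background thread."""
    stop = threading.Event()
    worker = threading.Thread(
        target=run_searches, args=(base_url, index, body, stop)
    )
    worker.start()
    try:
        return [
            urllib.request.urlopen(hot_threads_url(base_url)).read().decode("utf-8")
            for _ in range(samples)
        ]
    finally:
        stop.set()
        worker.join()
```

Calling something like `capture_hot_threads("http://localhost:9200", "my_index", body)` against a test node would return a list of plain-text dumps that can be pasted into the thread; the host and index name here are placeholders.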