Named queries slowing down searches

bhjfranzoi · March 29, 2016, 3:20pm

Hi all,

We're currently relying on Elasticsearch's named queries support to compute a custom heuristic for every hit: given N "should" queries in a bool query, which of those queries does each hit match?

For example, querying looks like this:

"bool" : {
  "should" : [
    {
      "bool" : {
        /* ... */,
        "_name" : "query one"
      }
    },
    {
      "bool" : {
        /* ... */,
        "_name" : "query two"
      }
    },
    {
      "bool" : {
        /* ... */,
        "_name" : "query three"
      }
    }
  ]
}
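The clause bodies are elided above. The following minimal Python sketch shows one way such a query could be assembled programmatically. It assumes every should clause is itself a bool query. The helper name `named_should_query` and the sample `match` clauses are hypothetical.

```python
def named_should_query(named_clauses):
    """Build a bool query whose should clauses are bool queries tagged with _name.

    named_clauses maps a query name to the body of a bool query
    (hypothetical helper; assumes every clause is a bool query).
    """
    should = []
    for name, bool_body in named_clauses.items():
        body = dict(bool_body)  # shallow copy so the caller's dict is not modified
        body["_name"] = name    # the name reported back in matched_queries
        should.append({"bool": body})
    return {"bool": {"should": should}}


# Hypothetical clause bodies standing in for the elided "/* ... */" parts.
query = named_should_query({
    "query one": {"must": [{"match": {"title": "foo"}}]},
    "query two": {"must": [{"match": {"body": "bar"}}]},
})
```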

For every hit we read the matched_queries array: the longer the array, the more "trustable" the result.
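The extraction step could look like the following minimal Python sketch. It works on an already-parsed search response (a plain dict). The function name and the sample response are hypothetical. It also assumes that a hit without a `matched_queries` key matched no named query.

```python
def matched_query_counts(response):
    """Map each hit's _id to the number of named queries it matched (hypothetical helper)."""
    counts = {}
    for hit in response["hits"]["hits"]:
        # matched_queries may be absent when no named query matched this hit
        counts[hit["_id"]] = len(hit.get("matched_queries", []))
    return counts


# Hypothetical, trimmed-down search response.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "1", "_score": 2.0, "matched_queries": ["query one", "query two"]},
            {"_id": "2", "_score": 0.5},
        ]
    }
}

print(matched_query_counts(sample_response))  # {'1': 2, '2': 0}
```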

Apart from one odd behaviour in some corner cases, the approach works for us from a functional point of view. The problem is non-functional: query time increases by an order of magnitude compared with named queries disabled, 300 ms when enabled versus 30 ms when disabled. Is this expected behaviour?
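One way to get the "disabled" variant for this kind of comparison is to strip every `_name` key from the same query, so the two runs differ only in named queries. The helper below is a hypothetical Python sketch. It assumes that no field in the index is itself called `_name`, because such a field name would also be removed.

```python
def strip_names(query):
    """Return a copy of a query DSL structure with every "_name" key removed.

    Hypothetical helper: assumes no index field is literally named "_name".
    The input is never mutated; new dicts and lists are built instead.
    """
    if isinstance(query, dict):
        return {k: strip_names(v) for k, v in query.items() if k != "_name"}
    if isinstance(query, list):
        return [strip_names(v) for v in query]
    return query  # scalars (strings, numbers, booleans) are returned unchanged


named = {"bool": {"should": [
    {"bool": {"must": [{"match": {"title": "foo"}}], "_name": "query one"}},
]}}
plain = strip_names(named)  # same query, named queries "disabled"
```

Running both `named` and `plain` against the same index and comparing the `took` values in the responses keeps everything else constant.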

Could this be related to the Elasticsearch version? We're currently using 1.7 (for easier compatibility with client connectors). Should we expect better performance with named queries simply by upgrading to 2.x?

Thanks for the support.

jpountz · March 31, 2016, 8:54am

I suspect that you are retrieving lots of documents per page?

Do you actually sort by the number of matched clauses? If so, you could simply wrap each sub-query in a constant_score query: the score then grows with the number of matching clauses, so Elasticsearch's default sort by score gives you that ordering.

bhjfranzoi wrote:
"Should we expect better performance with named queries simply by upgrading to 2.x?"

No. Named queries work the same way in both versions.
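The constant_score suggestion could be sketched in Python as below. This assumes ES 1.x syntax, where constant_score accepts a `query`. On 2.x the wrapped clause would go under `filter` instead, which is why the hypothetical `wrapper_key` parameter exists. The `_name` stays on the inner bool, so `matched_queries` should still be reported.

```python
def constant_score_should(clauses, wrapper_key="query"):
    """Wrap each should clause in constant_score so every match adds the same score.

    Hypothetical helper. wrapper_key="query" assumes ES 1.x syntax;
    pass wrapper_key="filter" for the 2.x form.
    """
    return {
        "bool": {
            "should": [
                {"constant_score": {wrapper_key: clause}} for clause in clauses
            ]
        }
    }


wrapped = constant_score_should([
    {"bool": {"must": [{"match": {"title": "foo"}}], "_name": "query one"}},
])
```

As the original poster points out in the reply below, this trades away the relevance score of each clause, since every match contributes the same constant amount.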

bhjfranzoi · March 31, 2016, 12:30pm

No, I don't think this is a "lots of documents" scenario: the query times I mentioned are for responses of at most 40 hits, and usually there are only around 10-15 hits per response.

We're trying to keep both "trust" signals: how many queries matched (from the matched_queries attribute) and the original _score. This way we can later decide whether a result is trustable by combining a custom metric (the ratio of matched queries to applied queries) with the ES score. That's why we'd prefer not to use constant_score: we would lose the second metric (the score).
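Here is a minimal Python sketch of keeping both signals side by side. The ratio follows the description above. The combination rule `is_trustable` and its thresholds are purely hypothetical placeholders, since the actual combination is not described. The sketch also assumes results are sorted by score, so `_score` is present on every hit.

```python
def trust_signals(response, applied_queries):
    """Return (_id, matched/applied ratio, _score) for every hit.

    applied_queries is the number of named should clauses in the request.
    """
    if applied_queries <= 0:
        raise ValueError("applied_queries must be positive")
    signals = []
    for hit in response["hits"]["hits"]:
        # matched_queries may be absent when no named query matched this hit
        matched = len(hit.get("matched_queries", []))
        signals.append((hit["_id"], matched / applied_queries, hit["_score"]))
    return signals


def is_trustable(ratio, score, min_ratio=0.5, min_score=1.0):
    """Hypothetical rule: both signals must clear hypothetical thresholds."""
    return ratio >= min_ratio and score >= min_score


response = {"hits": {"hits": [
    {"_id": "a", "_score": 2.5, "matched_queries": ["query one", "query two", "query three"]},
    {"_id": "c", "_score": 0.4},
]}}
signals = trust_signals(response, applied_queries=3)
```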

Anyway, we managed to set up an ES 2.2 environment, and basic manual benchmarks show that the query-time gap is still there, although ES 2.2 is faster in the worst case (100 ms instead of 300 ms).

jpountz · April 4, 2016, 6:53am

40 should be fine indeed. What kind of queries are you putting under your bool queries (match, fuzzy, query_string)? Also, would you be able to run these searches with named queries in a loop and capture hot threads while they are running?
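Here is a rough Python sketch of that suggestion using only the standard library. A few background threads send the same search in a loop while the script fetches the `_nodes/hot_threads` report. The node URL, index name, worker count and warm-up delay are all assumptions.

```python
import json
import threading
import time
import urllib.request

# Assumptions: a node reachable at this URL and a hypothetical index name.
ES_URL = "http://localhost:9200"
INDEX = "my_index"


def search_url(base, index):
    return f"{base}/{index}/_search"


def hot_threads_url(base):
    # GET /_nodes/hot_threads returns a plain-text report for every node.
    return f"{base}/_nodes/hot_threads"


def search_loop(body, stop, base, index):
    """Send the same search repeatedly until `stop` is set."""
    data = json.dumps(body).encode("utf-8")
    while not stop.is_set():
        req = urllib.request.Request(
            search_url(base, index),
            data=data,  # supplying data makes urllib send a POST
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()


def capture_hot_threads(body, workers=4, warmup_s=2.0, base=ES_URL, index=INDEX):
    """Run searches in background threads and grab hot threads while they run."""
    stop = threading.Event()
    threads = [
        threading.Thread(target=search_loop, args=(body, stop, base, index), daemon=True)
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    try:
        time.sleep(warmup_s)  # let the search load build up first
        with urllib.request.urlopen(hot_threads_url(base)) as resp:
            return resp.read().decode("utf-8")
    finally:
        stop.set()
        for t in threads:
            t.join()
```

For example, `print(capture_hot_threads({"query": named_bool_query}))` would print the report, where `named_bool_query` is a hypothetical variable holding the named bool query from the first post.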
