Query String Query Performance Issue when using ~ operator - ES 1.6


(Effi) #1

Hello,

When running a query like:

POST /my-index/_search
{
  "from" : 0,
  "size" : 5,
  "query" : {
    "query_string" : {
      "fields" : ["field1","field2",...,"fieldn"],
      "query" : "flight~",
      "use_dis_max" : true
    }
  },
  "highlight" : {
    "pre_tags" : [ "<span class=\"mark\">" ],
    "post_tags" : [ "</span>" ],
    "order" : "score",
    "encoder" : "html",
    "require_field_match" : true,
    "fields" : {
      "*" : {}
    }
  }
}

When query_string lists n fields (about 40 in my case) and the query contains the ~ operator (flight~ in the example), the request takes about 6 seconds.
If I run the same query (flight~) but remove the fields section from query_string, it takes about 10 milliseconds (practically no time).

Why does this happen?
Is there a way to avoid this performance penalty? I have to list specific fields in query_string in order to work around a highlight issue (see below).

Background:
I am using Elasticsearch 1.6.
The query runs on an index that contains 1000 documents.
I list specific fields in query_string in order to work around the following highlight issue:

Thanks,
Effi


(Effi) #2

Sorry, I originally wrote the two timings the wrong way round.
Fixed.


(Effi) #3

I opened an issue on GitHub, but it was closed with the comment "Please keep questions on the discuss forum".
This looks like a performance issue to me. Is there a way to understand why Elasticsearch thinks differently?


(Colin Goodheart-Smithe) #4

When you remove the fields section you are searching the _all field; with the fields section you are searching 40 separate fields. Searching and highlighting on 40 fields will take much longer than on one field. The query phase is going to take longer because there are more fields to search over (and given that you are using a fuzzy operator, the query needs to be expanded into the possible combinations on all 40 fields too), but the main time spent here will be on highlighting. Highlighting is an expensive process, so asking Elasticsearch to highlight over 40 fields is going to increase the work required a lot.
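
One way to check how the time splits between the query and the highlighting is to send the same request with and without the highlight section and compare the "took" values in the two responses. The Python sketch below only builds the two request bodies and does not contact a cluster. The names FIELDS and build_search_body are made up for illustration, and the two field names are placeholders because the real list is elided in the first post.

```python
import json

# Placeholder field list: the original post elides the real ~40 field names.
FIELDS = ["field1", "field2"]


def build_search_body(fields, with_highlight=True):
    """Build the search body from the first post, optionally without highlighting."""
    body = {
        "from": 0,
        "size": 5,
        "query": {
            "query_string": {
                "fields": list(fields),
                "query": "flight~",
                "use_dis_max": True,
            }
        },
    }
    if with_highlight:
        # Same highlight section as in the original request.
        body["highlight"] = {
            "pre_tags": ['<span class="mark">'],
            "post_tags": ["</span>"],
            "order": "score",
            "encoder": "html",
            "require_field_match": True,
            "fields": {"*": {}},
        }
    return body


# Variant A: the request as posted. Variant B: identical query, no highlighting.
full_body = build_search_body(FIELDS)
query_only_body = build_search_body(FIELDS, with_highlight=False)

print(json.dumps(query_only_body, indent=2))
```

Posting each body to /my-index/_search a few times and comparing the "took" values should give a rough idea of how much of the 6 seconds comes from highlighting rather than from the fuzzy query itself.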


(Colin Goodheart-Smithe) #5

There is also this issue, which mentions performance problems when combining fuzzy queries with the plain highlighter. It may be worth looking into enabling a different highlighter for your use case.
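
As a rough sketch of what switching highlighters could look like in 1.x: the fast vector highlighter ("fvh") needs "term_vector" set to "with_positions_offsets" in the field mapping, and a highlighter can be selected per field with "type" in the highlight section. The Python below only builds the mapping fragment and the highlight section. The helper names and field names are placeholders, and whether this actually removes the slowdown for fuzzy queries is an assumption that would have to be measured.

```python
import json

# Placeholder field list: the original post elides the real field names.
FIELDS = ["field1", "field2"]


def fvh_mapping(fields):
    """Mapping fragment that stores term vectors with positions and offsets,
    which the fast vector highlighter requires."""
    return {
        "properties": {
            name: {"type": "string", "term_vector": "with_positions_offsets"}
            for name in fields
        }
    }


def fvh_highlight(fields):
    """Highlight section from the first post, with each field listed
    explicitly and forced to the fast vector highlighter."""
    return {
        "pre_tags": ['<span class="mark">'],
        "post_tags": ["</span>"],
        "order": "score",
        "encoder": "html",
        "require_field_match": True,
        "fields": {name: {"type": "fvh"} for name in fields},
    }


mapping = fvh_mapping(FIELDS)
highlight = fvh_highlight(FIELDS)

print(json.dumps(mapping, indent=2))
print(json.dumps(highlight, indent=2))
```

Term vector settings apply at index time, so existing documents would most likely have to be reindexed before the new highlighter can be used on them.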


(Effi) #6

Thanks colings86.

Do you understand why I have to list specific fields in the query section in order to get highlights?
Why is it not enough to define the fields in the highlight section? Is this a highlighter bug?

Do you know of a highlighter that can handle a query containing both fields and text, and that will not mix results the way the default highlighter does when searching on the _all field?


(Colin Goodheart-Smithe) #7

I'm sorry but I don't follow your question. Could you explain a bit more?

