When query_string contains n fields (about 40 in my case) and the query uses the ~ operator (flight~ in the example), the search takes about 6 seconds.
If I run the same query (flight~) but remove the fields section from query_string, it takes about 10 milliseconds (practically no time).
Why does it happen?
Is there a way to avoid this performance penalty? (I have to specify the fields in query_string in order to work around a highlight issue, see below.)
Background:
I am using Elasticsearch 1.6.
The query runs on an index that contains 1000 documents.
I specify the fields in query_string in order to work around the following highlight issue:
I opened an issue on GitHub, but it was closed with the comment "Please keep questions on the discuss forum".
This looks like a performance issue to me. Is there a way to understand why Elasticsearch behaves so differently in the two cases?
When you remove the fields section you are searching the _all field; with the fields section you are searching 40 separate fields. Searching and highlighting on 40 fields will take much longer than on one field. The query phase takes longer because there are more fields to search over (and, given that you are using a fuzzy operator, the query also needs to be expanded into the possible term combinations on all 40 fields), but most of the time here will be spent on highlighting. Highlighting is an expensive process, so asking Elasticsearch to highlight over 40 fields increases the work required a lot.
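To make the comparison concrete, here is a minimal sketch of the two request bodies being discussed. The field names are hypothetical placeholders (the real 40 field names were not posted), and `build_search_body` is just an illustrative helper, not part of any client library.

```python
import json


def build_search_body(text, fields=None):
    """Return a search request body containing a single query_string query."""
    query_string = {"query": text}
    if fields:
        # An explicit list makes the query run against every listed field.
        query_string["fields"] = list(fields)
    # With no "fields" key, query_string uses its default field (_all by default).
    return {"query": {"query_string": query_string}}


# Hypothetical names standing in for the roughly 40 real fields.
FIELDS = ["field_%02d" % i for i in range(1, 41)]

multi_field_body = build_search_body("flight~", fields=FIELDS)
all_field_body = build_search_body("flight~")

if __name__ == "__main__":
    print(json.dumps(all_field_body, indent=2))
    print(len(multi_field_body["query"]["query_string"]["fields"]), "fields searched")
```

Sending both bodies to the same index and comparing the `took` value in each response, with and without a highlight section, should show how much of the 6 seconds comes from the query phase and how much from highlighting.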
There is also this issue, which mentions performance problems when fuzzy queries are combined with the plain highlighter. It may be worth trying a different highlighter for your use case.
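As a rough sketch of what switching highlighters could look like: the highlight section accepts a per-field `type` (`plain`, `postings` or `fvh`), and the two non-default highlighters need extra data in the mapping. The helper names and field names below are hypothetical, and whether either alternative is actually faster for fuzzy queries on your data is something to measure, not something this sketch verifies.

```python
def highlight_section(field_names, highlighter_type="fvh"):
    """Build a highlight section that forces one highlighter type on each field."""
    return {"fields": {name: {"type": highlighter_type} for name in field_names}}


def string_field_mapping(highlighter_type):
    """Return a string field mapping carrying what the chosen highlighter needs."""
    mapping = {"type": "string"}
    if highlighter_type == "fvh":
        # The fast vector highlighter reads term vectors with positions and offsets.
        mapping["term_vector"] = "with_positions_offsets"
    elif highlighter_type == "postings":
        # The postings highlighter reads offsets stored in the postings list.
        mapping["index_options"] = "offsets"
    return mapping


# Hypothetical field names; replace with the real ones.
fields = ["title", "description"]
highlight = highlight_section(fields, "fvh")
mappings = {name: string_field_mapping("fvh") for name in fields}
```

Note that these mapping settings cannot be added to an existing field, so trying this means reindexing the 1000 documents into an index created with the new mapping.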
Do you understand why I have to specify the list of fields in the query section in order to get correct highlighting?
Why is it not enough to define the fields in the highlight section? Is this a highlighter bug?
Do you know of a highlighter that handles a query containing both field-specific terms and free text, and that will not mix up the results the way the default highlighter does when searching on the _all field?