We are running an ES cluster with 4 nodes, 2 shards and 3 indexes. Each index contains about 60 GB of data.
I need to run a query with a complex filter but an empty query part (scoring is not needed at all).
The idea behind this query is to filter out documents that are not relevant. To do that, we apply different rules to the 'content' field, and these rules depend on the length of the content: a document with more content must contain more occurrences of the matching terms than a document with less content.
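The actual query is not included in the post, so the following is only a rough sketch of the shape described above, not the real query. It assumes a hypothetical precomputed `content_length` field and expresses each length-dependent rule as an ES 1.x `bool` filter that combines a `range` filter with a `query` filter wrapping a `match_phrase` with `slop`. Several such rules are ORed together inside a `filtered` query, so no scoring query is needed. The phrases, slop values and length buckets are made up for illustration.

```python
import json


def length_rule(min_len, max_len, phrase, slop):
    """One hypothetical rule: documents whose (assumed) content_length lies in
    [min_len, max_len) must contain `phrase` with at most `slop` positions of play."""
    return {
        "bool": {
            "must": [
                # Range filter on a precomputed length field (an assumption, not from the post).
                {"range": {"content_length": {"gte": min_len, "lt": max_len}}},
                # The `query` filter lets a phrase query with slop act as a filter.
                {"query": {"match_phrase": {"content": {"query": phrase, "slop": slop}}}},
            ]
        }
    }


def build_filtered_query(rules):
    """OR several length rules together in an ES 1.x `filtered` query with no scoring query."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {"bool": {"should": [length_rule(*r) for r in rules]}},
            }
        }
    }


if __name__ == "__main__":
    # Two hypothetical buckets: longer documents get a looser (higher-slop) rule.
    body = build_filtered_query([
        (0, 1000, "example phrase", 2),
        (1000, 10000, "example phrase", 10),
    ])
    print(json.dumps(body, indent=2))
```

With 5-10 such rules, every document must be checked against a phrase-with-slop match, which needs term positions and is typically far more expensive than the cached range filters.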
The problem is that these queries execute pretty slowly.
I'm wondering whether there are any performance tips or known limitations for range queries or slop.
Or do you guys see any other problems with this query?
Is there any way to determine what ES is actually doing to run the query?
How can I find the speed bottleneck?
A few more things to note:
1 - this is an example of one entry, but we typically combine 5-10 blocks that look like this
2 - when we have 5-10 blocks, queries take 3-5 seconds to execute
3 - looking at the explain output, the range filters appear to be cached
4 - running the queries multiple times does not seem to improve speed at all
5 - no extra IO or CPU load appears in our Marvel metrics
I didn't mean to imply that your suggestions are not useful.
Unfortunately, there is no way for us to upgrade to 2.2 in the near future,
and the profile would show how 2.2 executes the query, not how 1.3 structures it.
I'm hoping someone has an idea why these filters would be slow in 1.3,
or whether there is any way to backport the profile API to 1.3.