We are running an ES cluster with 4 nodes, 2 shards, and 3 indexes. Each index contains about 60 GB of data.
I need to run a query with a complex filter but an empty query part (scoring is not needed at all).
The idea behind this query is to filter out documents that are not relevant. To do that, we want to apply different rules to the 'content' field, and which rule applies depends on the length of the content: documents with more content should be required to have more occurrences than documents with less content.
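To make the rule concrete, here is a minimal Python sketch of the logic only. It is not the ES query itself; the thresholds, the names (`RULES`, `required_occurrences`, `is_relevant`), and the simple substring count are all made up for illustration. Each tuple plays the role of one length range paired with its own occurrence requirement.

```python
# Hypothetical rules: (min_length, max_length, min_occurrences).
# The numbers are invented for illustration and are not our real thresholds.
RULES = [
    (0, 1000, 1),      # short content: one occurrence is enough
    (1000, 5000, 2),   # medium content: at least two occurrences
    (5000, None, 3),   # long content: at least three occurrences
]


def required_occurrences(content_length, rules=RULES):
    """Return the minimum occurrence count for content of the given length."""
    for low, high, min_occ in rules:
        if content_length >= low and (high is None or content_length < high):
            return min_occ
    raise ValueError("no rule covers length %d" % content_length)


def is_relevant(content, term, rules=RULES):
    """Keep a document only if the term occurs often enough for its length."""
    # Case-insensitive, non-overlapping substring count (an assumption;
    # the real filter matches analyzed terms, not raw substrings).
    occurrences = content.lower().count(term.lower())
    return occurrences >= required_occurrences(len(content), rules)
```

In the real filter, the length check is the range part and the occurrence check is the part that uses slop.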
The problem is that these queries execute pretty slowly.
Are there any performance tips or known limitations for range queries or slop?
Or do you guys see any other problems with this query?
Is there any way to determine what ES is actually doing when it runs the query?
How can I find the speed bottleneck?
A few more things to note:
1 - This is an example of 1 entry, but we typically have 5-10 blocks that look like this all together.
2 - When we have 5-10 blocks, queries can take between 3 and 5 seconds to execute.
3 - Looking at the explain output, the range filters appear to be cached.
4 - Running the queries multiple times does not seem to improve speed at all.
5 - No extra IO load or CPU load appears in our Marvel metrics.