Summary
I'm upgrading from ES 1.5.2 to 5.2.2 and many performance characteristics have changed. Working through them, I noticed that wildcard queries especially stand out.
FTR: I'm aware how bad wildcard queries are, especially prefix wildcard queries. They're currently a business requirement. I'm constantly trying to improve the queries and I want to get rid of the wildcards altogether, but that's not yet possible.
My tests were conducted on my local dev VM:
- ES 1.5.2 on OpenJDK 1.7.0_121 with 2GB RAM and 1GB for ES, 2 CPUs
- ES 5.2.2 on Oracle JDK 1.8.0_121 with 3GB RAM and 2GB for ES, 3 CPUs; I increased the RAM on that machine because I realized the Java -Xmx parameter had been raised
Both versions were installed from the official ES package repositories on Ubuntu 14.04 LTS.
I tested with: siege -c <concurrency> -b -t 1m 'http://localhost:9200/alias/<Type>/_search POST <json payload>'
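For anyone who wants to reproduce the measurement without siege, the sketch below is a rough Python stand-in for the same idea: a fixed number of workers fire requests back to back, with no delay, until a deadline passes, and the completed calls are counted. It is not what I actually ran. The names run_benchmark and make_search_request are made up for illustration, and the Content-Type header is an assumption.

```python
import json
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_search_request(url, payload):
    """Return a zero-argument function that POSTs `payload` as JSON to `url`."""
    body = json.dumps(payload).encode("utf-8")

    def send():
        req = urllib.request.Request(
            url,
            data=body,
            headers={"Content-Type": "application/json"},  # assumed header
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response so the request fully completes

    return send


def run_benchmark(send_request, concurrency, duration_s):
    """Call send_request() from `concurrency` workers until duration_s elapses.

    Returns the total number of completed calls across all workers.
    """
    deadline = time.monotonic() + duration_s

    def worker():
        done = 0
        while time.monotonic() < deadline:
            send_request()
            done += 1
        return done

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(worker) for _ in range(concurrency)]
        return sum(f.result() for f in futures)


# Usage (needs a running node; fill in the <Type> placeholder and the payload):
#   send = make_search_request("http://localhost:9200/alias/<Type>/_search", payload)
#   print(run_benchmark(send, concurrency=10, duration_s=60))
```

Python threads add their own overhead, so absolute numbers from such a script would not be comparable to the siege figures in this post. Only the relative difference between the two ES versions would be meaningful.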
With 1.5.2
Query: { "query": { "filtered": { "query": { "bool": { "should": [ { "wildcard": { "comment.message": { "value": "*someterm*", "boost": 1 } } } ] } } } } }
- concurrency 1, requests per minute 20968
- concurrency 10, requests per minute 43165
- concurrency 50, requests per minute 42811
With 5.2.2
Query adapted: { "query": { "bool": { "filter": [ { "bool": { "should": [ { "wildcard": { "comment.message": { "value": "*someterm*", "boost": 1 } } } ] } } ] } } }
- concurrency 1, requests per minute 16759
- concurrency 10, requests per minute 23722
- concurrency 50, requests per minute 24472
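The adaptation from the 1.5.2 query to the 5.2.2 query can be sketched as a small rewrite function. This is only an illustration of the transformation shown in the two queries above. The function name filtered_to_bool_filter is hypothetical, and it handles only a top-level "filtered" query.

```python
import copy


def filtered_to_bool_filter(body):
    """Rewrite a 1.x {"query": {"filtered": {...}}} body into a bool query.

    Both the old "query" and "filter" parts are placed in the bool query's
    "filter" clause, mirroring the adapted query in this post. Note that
    filter context does not compute scores; putting the old "query" part
    under "must" instead would keep scoring.
    """
    filtered = body["query"]["filtered"]
    clauses = []
    if "query" in filtered:
        clauses.append(copy.deepcopy(filtered["query"]))
    if "filter" in filtered:
        clauses.append(copy.deepcopy(filtered["filter"]))

    # Keep any other top-level keys (size, from, ...) unchanged.
    new_body = {k: copy.deepcopy(v) for k, v in body.items() if k != "query"}
    new_body["query"] = {"bool": {"filter": clauses}}
    return new_body


old_query = {
    "query": {
        "filtered": {
            "query": {
                "bool": {
                    "should": [
                        {"wildcard": {"comment.message": {"value": "*someterm*", "boost": 1}}}
                    ]
                }
            }
        }
    }
}

new_query = filtered_to_bool_filter(old_query)
```

Applied to the 1.5.2 query, this yields the same structure as the adapted 5.2.2 query.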
General information
- I tested a few of our application queries on both versions (always adapting and in some cases even optimizing the queries), but I noticed that high concurrency always led to fewer requests per minute on ES 5.2.2
- Another bottleneck is queries involving parent/child models; testing these is complex and time-consuming and I don't have final results yet, but here too the wildcards seem to be a defining factor
- I also tested ES 1.5.2 with Oracle JDK 1.8.0_121 and the numbers were the same
- In practice I'm using wildcard matches on a few fields, not just one
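To put the siege numbers above side by side, here is a small sketch that computes the relative change in requests per minute at each concurrency level. The figures are copied from the two result lists in this post; the helper name relative_change is just for illustration.

```python
# concurrency -> (requests per minute on 1.5.2, requests per minute on 5.2.2)
RESULTS = {
    1: (20968, 16759),
    10: (43165, 23722),
    50: (42811, 24472),
}


def relative_change(old, new):
    """Percentage change from old to new (negative means a drop)."""
    return (new - old) / old * 100.0


for concurrency, (old_rpm, new_rpm) in sorted(RESULTS.items()):
    change = relative_change(old_rpm, new_rpm)
    print(f"concurrency {concurrency}: {old_rpm} -> {new_rpm} ({change:+.1f}%)")
```

By this calculation 5.2.2 is about 20% lower at concurrency 1, and about 45% and 43% lower at concurrency 10 and 50.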
Question
- Why are wildcards slower in my tests?
- What can I do to improve performance, especially regarding increased concurrency?