Performance degradation for wildcard queries from 1.5.2 to 5.2.2?

Summary

I'm working on upgrading from 1.5.2 to 5.2.2. Performance has changed in many places, and while working through them I noticed that wildcard queries stand out in particular.

FTR: I'm aware of how bad wildcard queries are, especially those with a leading wildcard. They are currently a business requirement. I'm constantly trying to improve the queries and would like to get rid of the wildcards, but that's not yet possible.

My tests were conducted on my local dev VM:

  • ES 1.5.2 on OpenJDK 1.7.0_121 with 2GB RAM and 1GB for ES, 2 CPUs
  • ES 5.2.2 on Oracle JDK 1.8.0_121 with 3GB RAM and 2GB for ES, 3 CPUs; I increased the RAM on that machine because I realized the Java -Xmx setting had been raised

Both versions were installed from the official ES package repositories on Ubuntu 14.04 LTS.

I tested with siege -c <concurrency> -b -t 1m 'http://localhost:9200/alias/<Type>/_search POST <json payload>'

With 1.5.2

Query: { "query": { "filtered": { "query": { "bool": { "should": [ { "wildcard": { "comment.message": { "value": "*someterm*", "boost": 1 } } } ] } } } } }

  • concurrency 1 , requests per minute 20968
  • concurrency 10 , requests per minute 43165
  • concurrency 50 , requests per minute 42811

With 5.2.2

Query adapted: { "query": { "bool": { "filter": [ { "bool": { "should": [ { "wildcard": { "comment.message": { "value": "*someterm*", "boost": 1 } } } ] } } ] } } }

  • concurrency 1 , requests per minute 16759
  • concurrency 10 , requests per minute 23722
  • concurrency 50 , requests per minute 24472
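
To put the two result lists side by side, here is a small sketch that computes the relative drop per concurrency level. The figures are copied from the lists above; the variable and function names are only for illustration.

```python
# Requests per minute reported above, keyed by siege concurrency.
es_1_5_2 = {1: 20968, 10: 43165, 50: 42811}
es_5_2_2 = {1: 16759, 10: 23722, 50: 24472}


def relative_drop(old, new):
    """Return the throughput drop from old to new as a percentage of old."""
    return (old - new) / old * 100


# Drop per concurrency level when going from 1.5.2 to 5.2.2.
drops = {c: relative_drop(es_1_5_2[c], es_5_2_2[c]) for c in es_1_5_2}

for concurrency, drop in drops.items():
    print(f"concurrency {concurrency}: {drop:.1f}% fewer requests per minute")
```

This works out to roughly 20% fewer requests per minute at concurrency 1, and roughly 45% and 43% fewer at concurrency 10 and 50.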

General information

  • I tested a few of our application queries on both versions (always adapted, and in some cases even optimized), but I noticed that high concurrency always led to fewer requests per minute on ES 5.2.2
  • Another bottleneck is queries involving parent/child models. Testing them is complex and time-consuming and I don't have complete results yet, but here too the wildcards seem to be a defining factor
  • I also tested ES 1.5.2 with Oracle JDK 1.8.0_121 and the numbers were the same
  • In practice I'm using wildcard matches on a few fields, not just one

Question

  • Why are wildcards slower in my tests?
  • What can I do to improve performance, especially regarding increased concurrency?

Don't use wildcards!

Specifically, queries starting with * must be avoided.

Not the answer I expected/hoped for, but yes, ultimately you're right.

I've rewritten everything that required wildcards to use ngrams. It bumped my index to 180% of its size, but the queries are so much faster, no doubt.
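
For anyone landing here later, a rough sketch of the idea, not my exact setup. The helper char_ngrams only illustrates what an ngram tokenizer does. The names trigram_tokenizer and trigram_analyzer, the gram size of 3, the type name "Type", the object mapping for comment.message, and the match query with operator "and" are all assumptions made for the example.

```python
import json


def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams of text."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


# Index time: every trigram of the message becomes an indexed term.
indexed_terms = char_ngrams("Contains someterm inside")
# Search time: the search string is split the same way, so no wildcard is needed.
query_terms = char_ngrams("someterm")
all_present = query_terms <= indexed_terms

# Hypothetical index body for ES 5.x (all names and sizes are assumptions).
index_body = {
    "settings": {
        "analysis": {
            "tokenizer": {
                "trigram_tokenizer": {"type": "ngram", "min_gram": 3, "max_gram": 3}
            },
            "analyzer": {
                "trigram_analyzer": {
                    "type": "custom",
                    "tokenizer": "trigram_tokenizer",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "Type": {
            "properties": {
                "comment": {
                    "properties": {
                        "message": {"type": "text", "analyzer": "trigram_analyzer"}
                    }
                }
            }
        }
    },
}

# Replacement for the "*someterm*" wildcard: require all trigrams of the term.
search_body = {
    "query": {
        "match": {"comment.message": {"query": "someterm", "operator": "and"}}
    }
}

print(all_present)
print(json.dumps(search_body))
```

Requiring all trigrams is only an approximation of a substring match, since the trigrams do not have to be adjacent in the document. The extra indexed terms are also where the index growth comes from.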

Thank you!
