Speeding up Elasticsearch regex filters/query optimization

abhijith_reddy · November 10, 2015, 6:08pm

We are currently using regex filters to support "contains" queries for an application. As expected, the performance is pretty abysmal, since the regex doesn't have a fixed leading prefix. Below is an example query that we are using:

[root@machine ~]# time curl -XGET 'localhost:9200/items_search/_count?routing=123&pretty' -d '{
   "query" : {
      "filtered" : {
         "filter" : {
            "bool" : {
               "must" : [
                  { "term"   : { "cat_id"             : "123" }},
                  { "term"   : { "availability"       : "in stock" }},
                  { "regexp" : { "category.lowercase" : ".*women .*" }}
               ]
            }
         }
      }
   }
}'
{
  "count" : 11323,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  }
}

real    0m41.518s
user    0m0.005s
sys     0m0.001s

If I change the regex to "women .*", the query returns within a couple of seconds. One thing to note is that the shard this query gets routed to is around 80 GB.
I understand that the right way to do this would be to use ngram analyzers on the fields that need "contains" matching, which would speed up search queries. However, we currently have over 3.5 billion documents in our index, and the number of queries that use regex is very small, so changing the analyzer for the field (currently the field is not analyzed) would hurt our indexing rate.
Are there any workarounds for this? Any pointers or resources would be much appreciated.

Thanks

warkolm · November 11, 2015, 6:44am

Regexp is slow in general, but here you are asking it to check every category.lowercase value for anything that contains the word women, which means scanning the entire field value for every document.
The women .* search is a little better, as it only has to check the start of the field.
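
To illustrate the difference, here is a minimal Python sketch. It assumes a simplified model in which the field's values sit in a sorted list; this is not how Elasticsearch is actually implemented, and the function names and sample values are made up for the example. With a fixed prefix, a binary search can jump straight to the matching range, whereas a pattern that starts with .* has to test every value.

```python
import re
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Find terms starting with `prefix` by jumping into the sorted list."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # terms sharing a prefix are contiguous, so stop early
        matches.append(term)
    return matches


def contains_matches(terms, needle):
    """Find terms containing `needle`; every term has to be tested."""
    pattern = re.compile(".*" + re.escape(needle) + ".*")
    return [term for term in terms if pattern.fullmatch(term)]


# Hypothetical sample values, not taken from the real index.
terms = sorted([
    "accessories > women shoes",
    "kids > women costumes",
    "men shoes",
    "women shoes",
    "women tops",
])

print(prefix_matches(terms, "women "))    # like the regex "women .*"
print(contains_matches(terms, "women "))  # like the regex ".*women .*"
```

The prefix lookup only touches the terms in the matching range, while the contains lookup does work proportional to the total number of terms.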

You are going to be better off creating a dedicated field that records the value you are after, so the query can match on that field instead of running a regex.
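
As a rough sketch of what that could look like, assuming the documents carry a category source field and using a hypothetical boolean field named category_has_women (neither the field name nor these helper functions come from your setup): compute the flag once at index time, then replace the regexp clause with a plain term filter.

```python
import json


def add_contains_flag(doc, source_field="category", needle="women ",
                      flag_field="category_has_women"):
    """Return a copy of `doc` with a boolean flag computed at index time.

    `source_field` and `flag_field` are hypothetical names for illustration.
    """
    enriched = dict(doc)
    value = doc.get(source_field, "")
    enriched[flag_field] = needle in value.lower()
    return enriched


def build_count_query(cat_id, flag_field="category_has_women"):
    """Build the same filtered query, with a term filter instead of regexp."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"cat_id": cat_id}},
                            {"term": {"availability": "in stock"}},
                            {"term": {flag_field: True}},
                        ]
                    }
                }
            }
        }
    }


doc = add_contains_flag({"cat_id": "123", "category": "Clothing > Women Shoes"})
print(json.dumps(doc, indent=2))
print(json.dumps(build_count_query("123"), indent=2))
```

Documents already in the index would need to be updated to carry the flag before the term filter returns them, so this trades one-time indexing work for much cheaper queries.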
