Minimise dataset for regexp query

hrathore · December 18, 2015, 9:49am

I am running a regexp query that starts with (.*), and it runs on the filetext field for a specific employeeId.
The searched text can be anywhere inside the token, so I am wrapping the regex with .* on both sides.

Index Size: 100GB
Mapping for Index is: employeeId, filetext
Search Type: Scan
Scroll=5m

{
  "query": {
    "filtered" : {
      "query" : { "regexp" : { "filetext" : ".*1234.*" }},
      "filter" : {
        "bool" : {
          "must": { "term" : { "employeeId" : 5725 }}
        }
      }
    }
  }
}

This query scans the full index instead of searching data specific to that employeeId (confirmed as the full disk is read, as per IO stats), and as it is a regex query, it's very slow.

How do I make sure that query runs only on specific employeeId, and not on complete dataset?
How do I speed this up?

cbuescher · December 18, 2015, 11:21am

Hi,

I assume this is Elasticsearch 1.7, since the filtered query was deprecated in 2.0. Have you tried moving the regexp query after the term query in the must section of the bool filter, like this? I haven't tested this with a large number of documents, but the must clauses should be executed in order, so the term filter should reduce the number of documents the regexp query runs on.

{
  "query" : {
    "filtered" : {
      "filter" : {
        "bool" : {
          "must": [
            { "term" : { "employeeId" : 5725 }},
            { "regexp" : { "filetext" : ".*1234.*" }}
          ]
        }
      }
    }
  }
}
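
For reference, since the filtered query is deprecated in 2.0: there the same request is normally written as a bool query with a filter clause. A minimal, untested sketch, reusing the field names and values from the question. Whether it reads less of the index than the 1.x form is not something I have measured.

```json
{
  "query": {
    "bool": {
      "must": { "regexp": { "filetext": ".*1234.*" } },
      "filter": { "term": { "employeeId": 5725 } }
    }
  }
}
```

The term clause sits in filter because it only needs to include or exclude documents and does not need to contribute to scoring.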

hrathore · December 22, 2015, 5:35am

@Christoph
I am using version 1.5.2.
I have tried the query you suggested, but it didn't make any difference. The new query took the same time as the earlier one and still searched the whole index instead of only the documents matching the employeeId filter.

Does this change in version 2.0?
