Terms aggregation with include ".*x.*" times out


#1

Hello everybody,

The goal I'm trying to achieve is to aggregate on all words which contain a specific character. Query example:

  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {"match_phrase": {"word.ngrams": "sex"}}
          ]
        }
      }
    }
  },
  "aggs": {
    "clusters": {
      "terms": {
        "field": "word",
        "include": ".*?.*", 
        "size": 100
      }
    }
  },
  "size": 0
}

It works fine if the include parameter contains regex in a form: ".*?" or "?.*", but when searching from both sides ".?.", the request tends to time out (error: Gateway Time-out). I guess too much resources are used for processing such a query.

Is there any other way to achieve this?

I also tried using regexp, as an aggregation, it does return the total amount of hits, but not a list of keywords that actually contain the searched character. Example also below:

  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "must": [
            {"match_phrase": {"word.ngrams": "sex"}}
          ]
        }
      }
    }
  },
  "aggs": {
    "regex_query": {
      "filter": {
        "regexp": {
          "word": {
            "value": ".*?.*"
          }
        }
      }
    }
  },
  "size": 0
}

(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.