too_many_clauses, but only with the ip field type and CIDR search values?

Hello,

We are running into a too_many_clauses failure when using a terms query inside a filter context, but only when the target field has type ip and when the search values are CIDRs.

Here's an example:

PUT /test-index
{
  "mappings": {
    "properties": {
      "addressIp": {"type": "ip"},
      "addressString": {"type": "keyword"}
    }
  }
}

# this query executes successfully because the target field is of type keyword
GET /test-index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "terms": {
          "addressString": [
            "15.234.110.68/32",
            "233.30.77.1/32",
            "37.13.158.110/32",
            # ... plus over 1024 additional CIDR addresses generated from a random IP generator
          ]
        }
      }
    }
  }
}

# this query fails with too_many_clauses because the target field is of type ip
GET /test-index/_search
{
  "query": {
    "bool": {
      "must_not": {
        "terms": {
          "addressIp": [
            "15.234.110.68/32",
            "233.30.77.1/32",
            "37.13.158.110/32",
            # ... plus over 1024 additional CIDR addresses generated from a random IP generator
          ]
        }
      }
    }
  }
}

As you can see, too_many_clauses is hit for the field of type ip but not for the field of type keyword, even though the same CIDR values are passed in both cases. I found a GitHub issue, closed in 2017, that is quite similar to my case. The important difference is that it tested plain IP addresses rather than CIDRs: Execution of terms query in filter context returns too_many_clauses exception for ip fileds type. · Issue #25667 · elastic/elasticsearch · GitHub
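In case it helps with reproducing this, here is a rough Python sketch of how the elided value list in the queries above could be generated. The helper names (`random_cidrs`, `build_search_body`) and the count of 1100 are only illustrative assumptions, not the exact generator I used; the resulting JSON body can be pasted into the `GET /test-index/_search` request.

```python
import ipaddress
import json
import random


def random_cidrs(count, seed=0):
    """Return `count` random IPv4 addresses as /32 CIDR strings.

    Hypothetical helper: a fixed seed keeps the list reproducible.
    """
    rng = random.Random(seed)
    return [f"{ipaddress.IPv4Address(rng.getrandbits(32))}/32" for _ in range(count)]


def build_search_body(field, cidrs):
    """Build the same bool/must_not/terms body as in the example queries."""
    return {"query": {"bool": {"must_not": {"terms": {field: cidrs}}}}}


if __name__ == "__main__":
    # 1100 is an arbitrary count above 1024, chosen only for illustration.
    body = build_search_body("addressIp", random_cidrs(1100))
    print(json.dumps(body, indent=2))
```

Swapping "addressIp" for "addressString" gives the body for the keyword-field query.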

Would anyone be able to provide insight into why this happens with CIDR parameters only? I'm wondering whether it's an unavoidable implementation detail or a bug that can be fixed.

I'm unable to get past this limitation by increasing max clause count because I believe max clause count is not a user-modifiable setting in clusters managed by elastic.co.

It would be possible to partition this query and pass in fewer than 1024 values each time, but since my goal is to exclude documents with must_not, my client code would have to combine the result sets by taking their set intersection. That's an unclean solution I would like to avoid, if possible.
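To make the partitioning workaround concrete, this is roughly what the client code would look like. It is only a sketch: `search_ids` is a hypothetical callable that runs a search body against the cluster and returns the matching document IDs (paging through all hits is left out), and the chunk size of 1000 is an arbitrary value below 1024.

```python
MAX_VALUES_PER_QUERY = 1000  # assumption: any size below the 1024 limit


def chunked(values, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def build_must_not_query(field, cidrs):
    """Search body that excludes documents matching any of `cidrs`."""
    return {"query": {"bool": {"must_not": {"terms": {field: cidrs}}}}}


def ids_excluding_all(field, cidrs, search_ids, size=MAX_VALUES_PER_QUERY):
    """Return IDs of documents that match none of `cidrs`.

    Each chunk is sent as its own must_not query. A document survives only
    if every query returns it, so the partial results are intersected.
    `search_ids` is a hypothetical callable: body dict -> iterable of IDs.
    """
    if not cidrs:
        raise ValueError("cidrs must not be empty")
    result = None
    for chunk in chunked(cidrs, size):
        ids = set(search_ids(build_must_not_query(field, chunk)))
        result = ids if result is None else result & ids
    return result
```

Every chunk means another round trip, and each partial result has to be held in memory on the client, which is why I would rather not go this way.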

We have tried this with Elasticsearch 7.12.1 and 7.5.1.

Thanks in advance for any input!
