Boolean parameters in token filters are stringified, breaking elasticsearch_dsl analysis comparison

Heya, I'm not sure if this is a bug in Elasticsearch itself or in the elasticsearch_dsl Python library, so I'm posting here before filing an issue on GitHub.

If you set up a token filter with a boolean parameter, that parameter gets stringified:

PUT test
{
  "settings": {
    "analysis": {
      "filter": {
        "test": {
          "ignore_case": true,
          "type": "stop",
          "stopwords": [
            "h",
            "n",
            "t"
          ]
        }
      },
      "analyzer": {
        "test": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "test"
          ]
        }
      }
    }
  }
}

GET test/_settings
{
  "test" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "provided_name" : "test",
        "creation_date" : "1609856872062",
        "analysis" : {
          "filter" : {
            "test" : {
              "ignore_case" : "true",
              "type" : "stop",
              "stopwords" : [
                "h",
                "n",
                "t"
              ]
            }
          },
          "analyzer" : {
            "test" : {
              "filter" : [
                "test"
              ],
              "type" : "custom",
              "tokenizer" : "standard"
            }
          }
        },
        "number_of_replicas" : "1",
        "uuid" : "ebu4gTVqRbW6dEgleJCECA",
        "version" : {
          "created" : "7100099"
        }
      }
    }
  }
}

Notice that "ignore_case": true in the request comes back as "ignore_case" : "true" in the response.

The filter still works as intended; however, elasticsearch_dsl trips over this type change:

from elasticsearch_dsl import Document, analyzer, token_filter, Text


class ExampleDocument(Document):
    class Index:
        name = "test2"
        using = "es7_default"

    test = Text(
        analyzer=analyzer(
            "test",
            tokenizer="standard",
            filter=[
                token_filter("test", ignore_case=True, type="stop", stopwords=["h", "n", "t"]),
            ],
        )
    )

Running ExampleDocument.init() the first time succeeds, but running it again raises:

/vendor/elasticsearch_dsl/index.py in save(self, using)
    320                     for k in analysis[section]
    321                 ):
--> 322                     raise IllegalOperation(
    323                         "You cannot update analysis configuration on an open index, "
    324                         "you need to close index %s first." % self._name

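This error is raised because the check a few lines above compares the analysis settings already on the index with the ones generated from the document, and the two no longer match. For ignore_case: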

existing_analysis.get(section, {}).get(k, None) == "true"
analysis[section][k] == True
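
For anyone who wants to see the mismatch without a cluster, here is a minimal sketch in plain Python. The two dicts are hand-copied from the GET response and the token_filter call above; the variable names are mine and are not taken from elasticsearch_dsl.

```python
# Filter definition as returned by GET test/_settings:
# the boolean has come back as the string "true".
existing_filter = {
    "ignore_case": "true",
    "type": "stop",
    "stopwords": ["h", "n", "t"],
}

# Filter definition as built on the Python side from
# token_filter("test", ignore_case=True, ...).
local_filter = {
    "ignore_case": True,
    "type": "stop",
    "stopwords": ["h", "n", "t"],
}

# A plain inequality check sees "true" and True as different values,
# so the two definitions are reported as different.
mismatch = existing_filter != local_filter
print(mismatch)
```

Everything except ignore_case is identical, so that one type difference is enough to make the definitions compare as unequal.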

So there's an easy workaround: changing ignore_case=True to ignore_case="true". However, it seems like:

  • either elasticsearch_dsl should handle this (de)serialization properly, or
  • elasticsearch itself shouldn't change the type of the boolean parameter
