Cannot get the Length Token Filter to work in a custom analyzer

I am setting up a filter within my index:

  "min-length-token":{
                "type": "length",
                "min": 2,
   },

Then I add this filter to my analyzer like this:

"extend-standard" : {
              "filter" : [
                "lowercase", "min-length-token"
              ],
              "tokenizer" : "standard"
            },

But when I call the _analyze API with

{
  "analyzer": "extend-standard",
  "text": "a&b"
}

It still returns "a" and "b", which have length 1 only.

Can you share a fully reproducible example? I tried this in the _analyze API and it works:

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "length",
      "min": 2
    }
  ],
  "text": "a&b"
}

Also, please share your Elasticsearch version.

Thanks!


Thanks for your reply, here is a full sample:

POST t_index/_close

PUT t_index/_settings
{
  "analysis":{
    "filter":{
      "min-length-token":{
        "type": "length",
        "min": 2,
        "max": 512
      }
    },
    "analyzer":{
      "t-standard":{
        "filter":["min-length-token"],
        "type":"standard"
      }
    }
  }
}

POST t_index/_open

GET t_index/_analyze
{
  "analyzer": "t-standard",
  "text": "a&b"
}

The response:

{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "b",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

Try this:

PUT t_index
{
  "settings": {
    "analysis": {
      "filter": {
        "min-length-token": {
          "type": "length",
          "min": 2,
          "max": 512
        }
      },
      "analyzer": {
        "t-standard": {
          "filter": [
            "min-length-token"
          ],
          "tokenizer" : "standard",
          "type": "custom"
        }
      }
    }
  }
}

GET t_index/_analyze
{
  "analyzer": "t-standard",
  "text": "a&b"
}

Using "standard" as the analyzer type seems to ignore all the other settings. I will take a deeper look and potentially open an issue.

Thanks!


I opened https://github.com/elastic/elasticsearch/issues/50356 for further discussion if you are interested.


Thanks for the workaround; I also agree it is an issue. I got it working with the standard tokenizer, but I can't get the english one to work:

"t-standard":{
        "filter":["min-length-token"],
        "type":"english",
        "tokenizer": "standard"
      }
