Cannot get the Length Token Filter to work in a custom analyzer

I am setting up a filter within my index:

  "min-length-token":{
                "type": "length",
                "min": 2,
   },

Then I add this filter to my analyzer like this:

"extend-standard" : {
              "filter" : [
                "lowercase", "min-length-token"
              ],
              "tokenizer" : "standard"
            },

But when I call the _analyze API with

{
  "analyzer": "extend-standard",
  "text": "a&b"
}

It still returns "a" and "b", which have length 1 only.

Can you share a fully reproducible example? I tried this in the _analyze API and it works:

GET _analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "length",
      "min": 2
    }
  ],
  "text": "a&b"
}

Also, please share your Elasticsearch version.

Thanks!


Thanks for your reply, here is a full sample:

POST t_index/_close

PUT t_index/_settings
{
  "analysis":{
    "filter":{
      "min-length-token":{
        "type": "length",
        "min": 2,
        "max": 512
      }
    },
    "analyzer":{
      "t-standard":{
        "filter":["min-length-token"],
        "type":"standard"
      }
    }
  }
}

POST t_index/_open

GET t_index/_analyze
{
  "analyzer": "t-standard",
  "text": "a&b"
}

The response:

{
  "tokens" : [
    {
      "token" : "a",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "b",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

Try this:

PUT t_index
{
  "settings": {
    "analysis": {
      "filter": {
        "min-length-token": {
          "type": "length",
          "min": 2,
          "max": 512
        }
      },
      "analyzer": {
        "t-standard": {
          "filter": [
            "min-length-token"
          ],
          "tokenizer" : "standard",
          "type": "custom"
        }
      }
    }
  }
}

GET t_index/_analyze
{
  "analyzer": "t-standard",
  "text": "a&b"
}

Using "standard" as the analyzer type seems to ignore all the other settings. I will take a deeper look and potentially open an issue.

Thanks!


I opened https://github.com/elastic/elasticsearch/issues/50356 for further discussion if you are interested.


Thanks for the workaround; I also agree it is an issue. I got it working with the standard tokenizer, but I can't get the english one to work:

"t-standard":{
        "filter":["min-length-token"],
        "type":"english",
        "tokenizer": "standard"
      }
