Analyzer conditional token filter with regular expression

Wonder_Garance · January 25, 2023, 3:36pm

Hello, in an analyzer's conditional token filter, I use a Painless script with a regular expression. If a token contains only letters and hyphens, the compound word in the token should be split; otherwise it should be left unchanged.

But my script below doesn't work. If I remove the conditional token filter, I get several words. What's wrong? Thanks.

GET /test/_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "condition",
      "filter": ["word_delimiter_graph"],
      "script": {
        "lang": "painless",
        "source": "token.toString() ==~ /^[A-Za-z-]+$/"
      }
    }
  ],
  "explain": true,
  "text": "WORD-IS-SPLITTED"
}

In the response, the word is not split:

"tokenfilters" : [
      {
        "name" : "__anonymous__condition",
        "tokens" : [
          {
            "token" : "WORD-IS-SPLITTED",
            "start_offset" : 0,
            "end_offset" : 16,
            "type" : "word",
            "position" : 0,
            "bytes" : "[42 4f 49 2d 49 53 2d 47 50 45]",
            "keyword" : false,
            "positionLength" : 1,
            "termFrequency" : 1
          }
        ]
      }
    ]

RabBit_BR · January 25, 2023, 4:22pm

Hi @Wonder_Garance

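Replace token.toString() with token.getTerm().toString(). In the condition script, token is a token object rather than the term text, so token.toString() most likely does not return the term, and the regular expression never matches it.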

GET /test/_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "condition",
      "filter": ["word_delimiter_graph"],
      "script": {
        "lang": "painless",
        "source": "token.getTerm().toString() ==~ /^[A-Za-z-]+$/"
      }
    }
  ],
  "text": "WORD-IS-SPLITTED"
}
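
To illustrate why the original predicate probably never matched, here is a minimal Java sketch (Painless follows Java semantics closely). The Token class below is a hypothetical stand-in for the script's token object, not the real Elasticsearch class; it assumes that object does not override toString(), which is consistent with the behavior seen above but is not confirmed here.

```java
import java.util.regex.Pattern;

class TokenRegexDemo {
    // Hypothetical stand-in for the script's `token` object (an assumption):
    // it wraps the term text and deliberately does not override toString().
    static final class Token {
        private final CharSequence term;

        Token(CharSequence term) {
            this.term = term;
        }

        CharSequence getTerm() {
            return term;
        }
    }

    // Same pattern as in the script: letters and hyphens only.
    static final Pattern LETTERS_AND_HYPHENS = Pattern.compile("^[A-Za-z-]+$");

    // Painless `==~` requires the whole string to match, like Matcher.matches().
    static boolean matches(String s) {
        return LETTERS_AND_HYPHENS.matcher(s).matches();
    }

    public static void main(String[] args) {
        Token token = new Token("WORD-IS-SPLITTED");

        // Default Object.toString() returns "<class name>@<hex hash>",
        // which contains '@', so the pattern cannot match.
        System.out.println(matches(token.toString()));           // false

        // The term text itself consists of letters and hyphens only.
        System.out.println(matches(token.getTerm().toString())); // true
    }
}
```

With the predicate evaluating to false for every token, the condition filter passes tokens through without applying word_delimiter_graph, which matches the unsplit output in the question.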

Wonder_Garance · January 25, 2023, 4:29pm

Hi @RabBit_BR
It works, thank you!

