Pattern_replace char filter regex

I think the issue is the use of the standard tokenizer, which removes the hyphen before the char_filter gets the chance.

Instead, you could use something like the whitespace tokenizer:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": [
            "my_char_filter"
          ]
        }
      } ,
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern":"(\\w+)-(\\w+)",
          "replacement": "$1$2"
        }
      }
    }
  }
}
1 Like