pattern_replace char_filter for removing dots from a number

Hi everyone,

I am using Elasticsearch version 5.5.2.
I am using the GET /_analyze API to test a pattern_replace char filter.
I just need to remove all dots from a number. This is what I am trying:

GET /_analyze
{
     "char_filter":[{"type": "pattern_replace", "pattern":"[.]+","replacement":""}],
     "text":"22400.6545421.54541.6545"
}

The response is

{
    "tokens": [
        {
            "token": "22400.6545421.54541.6545",
            "start_offset": 1,
            "end_offset": 25,
            "type": "<NUM>",
            "position": 0
        }
    ]
}

The regex seems to be correct, according to:
http://rubular.com/
http://www.regexplanet.com/advanced/java/index.html
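
As a further sanity check, the same replacement can be reproduced outside Elasticsearch with an equivalent regex substitution (a Python sketch; the pattern syntax is the same as the Java-style regex used by pattern_replace here):

```python
import re

# "[.]+" matches one or more literal dots; replacing with "" strips them all,
# which is exactly what the pattern_replace char filter should do.
text = "22400.6545421.54541.6545"
result = re.sub(r"[.]+", "", text)
print(result)  # 224006545421545416545
```

So the pattern itself does remove every dot; the problem must be in how the request is built or processed.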

Do I need to specify a tokenizer? If I do, I get:

GET /_analyze
{
     "tokenizer":"standard",
     "char_filter":[{"type": "pattern_replace", "pattern":"[.]+","replacement":""}],
     "text":"22400.6545421.54541.6545"
}

Exception

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[uaWbxRq][127.0.0.1:9300][indices:admin/analyze[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to find global token filter under [[{\"type\": \"pattern_replace\"]"
    },
    "status": 400
}

What am I doing wrong? Why aren't the dots being removed?

Thanks a lot,

Guilherme

try

GET /_analyze
{
  "char_filter": [
    {
      "type": "pattern_replace",
      "pattern": "\\.",
      "replacement": ""
    }
  ],
  "tokenizer": "keyword", 
  "text": "22400.6545421.54541.6545"
}

The docs state in a warning (I suppose that's what you hit):

tokenizer is mandatory when using char_filter or filter. If it is not set, the filters are ignored and the default analyzer is used.

--Alex

Thank you, @spinscale, but as I said before, if I set the tokenizer parameter I get a 400 exception:

{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[uaWbxRq][127.0.0.1:9300][indices:admin/analyze[s]]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "failed to find global token filter under [[{\"type\": \"pattern_replace\"]"
    },
    "status": 400
} 

I'm using Postman as my client to test the requests.

I suppose I'm passing the parameters the wrong way. Is that it?

Why do I get this?

"type": "illegal_argument_exception",
"reason": "failed to find global token filter under [[{\"type\": \"pattern_replace\"]"
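
One thing worth ruling out, since the error message shows the char_filter JSON cut off at a comma: make sure the client sends the whole object as a raw JSON body rather than as URL parameters (some clients also silently drop the body of a GET request; _analyze accepts POST as well). A minimal curl sketch, assuming a local node on the default HTTP port:

```shell
# Send the analyze request as a raw JSON body via POST, which avoids
# clients that drop or mangle the body of a GET request.
curl -s -H 'Content-Type: application/json' -X POST 'http://localhost:9200/_analyze' -d '
{
  "tokenizer": "keyword",
  "char_filter": [
    { "type": "pattern_replace", "pattern": "[.]+", "replacement": "" }
  ],
  "text": "22400.6545421.54541.6545"
}'
```

If this returns a single token without dots, the filter is fine and the 400 comes from how Postman was submitting the request.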

Thank you very much,

Guilherme

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.