Hello, in an analyzer I use a conditional token filter with a Painless script containing a regular expression. If a token contains only letters and hyphens, the compound word in the token should be split; otherwise, the token should be left unchanged.
But my script below doesn't work: the word is never split. If I remove the conditional token filter, I get several words. What's wrong? Thanks.
GET /test/_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "condition",
      "filter": ["word_delimiter_graph"],
      "script": {
        "lang": "painless",
        "source": "token.toString() ==~ /^[A-Za-z-]+$/"
      }
    }
  ],
  "explain": true,
  "text": "WORD-IS-SPLITTED"
}
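For reference, the behavior the condition is meant to produce can be sketched in plain Java. Painless regexes use Java's java.util.regex patterns, and `==~` requires the whole string to match, like Matcher.matches(). This is a simplified, hypothetical model: `splitOnHyphens` only splits on hyphens and is NOT `word_delimiter_graph`, which also splits on case changes, letter/digit boundaries and other delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ConditionalSplitSketch {
    // Same pattern as the Painless script: only ASCII letters and hyphens.
    private static final Pattern LETTERS_AND_HYPHENS = Pattern.compile("^[A-Za-z-]+$");

    // Hypothetical stand-in for word_delimiter_graph: splits on hyphens only
    // and drops empty parts.
    static List<String> splitOnHyphens(String term) {
        List<String> parts = new ArrayList<>();
        for (String part : term.split("-")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Applies the wrapped "filter" only when the predicate matches the term text,
    // mirroring the intent of the condition filter.
    static List<String> conditionalSplit(String term) {
        if (LETTERS_AND_HYPHENS.matcher(term).matches()) {
            return splitOnHyphens(term);
        }
        return List.of(term); // predicate false: the token passes through unchanged
    }

    public static void main(String[] args) {
        System.out.println(conditionalSplit("WORD-IS-SPLITTED")); // [WORD, IS, SPLITTED]
        System.out.println(conditionalSplit("WORD-2"));           // [WORD-2]
    }
}
```

The point of the sketch is that the regular expression itself accepts "WORD-IS-SPLITTED", provided it is applied to the term text.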
In the response, the word is not split:
"tokenfilters" : [
  {
    "name" : "__anonymous__condition",
    "tokens" : [
      {
        "token" : "WORD-IS-SPLITTED",
        "start_offset" : 0,
        "end_offset" : 16,
        "type" : "word",
        "position" : 0,
        "bytes" : "[57 4f 52 44 2d 49 53 2d 53 50 4c 49 54 54 45 44]",
        "keyword" : false,
        "positionLength" : 1,
        "termFrequency" : 1
      }
    ]
  }
]
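A likely explanation, stated as an assumption rather than verified against the Elasticsearch source: in a condition filter script, `token` is an object, not a string. The Elasticsearch condition filter documentation reads the term with `token.getTerm()` (Painless also accepts the shorthand `token.term`). If that object's class does not override `toString()`, then `token.toString()` returns Java's default `ClassName@hexHashCode` text, which can never match `/^[A-Za-z-]+$/` because of the `@`. The predicate would then always be false and `word_delimiter_graph` would be skipped, which agrees with the unsplit token above. The Java sketch below models this with a hypothetical `Token` class; under that assumption, a script source such as `token.term.toString() ==~ /^[A-Za-z-]+$/` would be the thing to try (untested here).

```java
import java.util.regex.Pattern;

public class ToStringSketch {
    // Hypothetical class standing in for the object bound to `token` in the script:
    // it exposes the term through a getter but does NOT override toString().
    static class Token {
        private final CharSequence term;

        Token(CharSequence term) {
            this.term = term;
        }

        CharSequence getTerm() {
            return term;
        }
    }

    private static final Pattern LETTERS_AND_HYPHENS = Pattern.compile("^[A-Za-z-]+$");

    // Whole-string match, like Painless's ==~ operator.
    static boolean matches(CharSequence text) {
        return LETTERS_AND_HYPHENS.matcher(text).matches();
    }

    public static void main(String[] args) {
        Token token = new Token("WORD-IS-SPLITTED");
        // Default Object.toString() is getClass().getName() + "@" + hex hash code,
        // e.g. "ToStringSketch$Token@1b6d3586": the '@' (and the '$' of the nested
        // class name) fall outside [A-Za-z-], so the match always fails.
        System.out.println(token.toString() + " -> " + matches(token.toString()));
        // Matching against the term text instead succeeds.
        System.out.println(token.getTerm() + " -> " + matches(token.getTerm()));
    }
}
```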