I'm trying to design a token filter that will split words, but will not
split on dollar signs ("$"). (I'm actually trying to do something more
complicated, but I am stuck on this step).
I'm attempting to use a word_delimiter filter to do this. The "type_table"
item in the JSON configuration does not seem to be honored by
Elasticsearch. Below is how I am creating the index:
curl -X POST http://localhost:9200/venues -d '{
  "mappings": {
    "venue": {
      "properties": {
        "id": {
          "type": "string"
        },
        "name": {
          "type": "string",
          "analyzer": "name_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "name_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "split_words_except_dollar"
          ]
        }
      },
      "filter": {
        "split_words_except_dollar": {
          "type": "word_delimiter",
          "type_table": {
            "$": "ALPHANUM"
          }
        }
      }
    }
  }
}'
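For reference, here is how the stored analysis settings can be inspected to confirm that the filter definition was accepted as written (this assumes the index was created on the same local node):

```shell
# Dump the index settings; the filter definition should appear under
# settings -> analysis -> filter -> split_words_except_dollar
curl -XGET 'localhost:9200/venues/_settings?pretty=true'
```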
(Note: I have also tried type_table as an array of objects, rather than
a single object, with the same result.)
I am testing it with the following call:
curl -XGET 'localhost:9200/venues/_analyze?pretty=true&analyzer=name_analyzer' -d 'ke$ha'
I expect the analyzer to emit a single token, "ke$ha", but instead it
emits two tokens: "ke" and "ha".
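The filter can also be exercised in isolation, bypassing the custom analyzer, by naming the tokenizer and filter directly in the _analyze call. (The filters parameter here is based on the older _analyze API; the exact parameter name may vary by Elasticsearch version.)

```shell
# Run the keyword tokenizer plus only the custom filter against the input,
# so any difference from the analyzer's output points at analyzer wiring
# rather than the filter itself.
curl -XGET 'localhost:9200/venues/_analyze?tokenizer=keyword&filters=split_words_except_dollar&pretty=true' -d 'ke$ha'
```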
What am I doing wrong here? I can't find an example anywhere of a
custom type_table, or of its correct syntax.
Thank you,
Jacob