Thanks for the quick suggestions,
I tried this approach on my side, but it didn't work for me:
curl -X PUT 'http://localhost:9200/admin/?pretty=true' -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "artist_analyzer" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase",
                      "artist_metaphone", "asciifolding", "synonym"]
        }
      },
      "filter" : {
        "artist_metaphone" : {
          "type" : "phonetic",
          "encoder" : "metaphone",
          "replace" : false
        },
        "synonym" : {
          "type" : "synonym",
          "synonyms" : [
            "kesha => ke$ha",
            "!!! => !!! (chk chk chk)"
          ]
        }
      }
    }
  }
}
'
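The `standard` tokenizer configured above splits text on characters such as `$` and `!` before any token filter runs, so `Ke$ha` reaches the synonym filter as two tokens and `!!!` produces no token at all. The snippet below is only a crude shell approximation for intuition: it treats every run of non-alphanumeric characters as a delimiter and lowercases the pieces, whereas the real tokenizer follows Unicode word-boundary rules (UAX #29). The function name `approx_tokens` is hypothetical.

```shell
#!/bin/sh
# Crude stand-in for the standard tokenizer plus the lowercase filter:
# every run of non-alphanumeric characters becomes a line break, each
# piece is lowercased, and empty lines are dropped.
# This is NOT the real UAX #29 algorithm, just an illustration.
approx_tokens() {
  printf '%s\n' "$1" \
    | tr -cs '[:alnum:]' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sed '/^$/d'
}

approx_tokens 'Ke$ha'   # prints "ke" and "ha" on separate lines
approx_tokens '!!!'     # prints nothing: no token survives
```

Because neither `kesha` nor `!!!` ever appears as a single token on the `Ke$ha` side, rules such as `kesha => ke$ha` have nothing to match against.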
Am I doing something wrong?
Any suggestions will be highly appreciated.
On Thursday, October 31, 2013 4:16:54 PM UTC+5:30, Jörg Prante wrote:
The names you are looking for are named entities. Each entity can have
variant spellings, such as Kesha and Ke$ha.
Libraries solve this problem with name authority files. For example,
the authority record for Kesha has the identifier 81878968; under this URL,
dereferenced to a URI, you can find the variant names, even at an
international scope.
To detect such named entities, a special token filter would be required.
More advanced solutions use a named entity recognizer (NER) backed by a large knowledge base.
The standard tokenizer treats '$' and '!!!' as word-delimiting
characters. If it is feasible, you can create synonyms for all the words you
want to treat as exceptions and set up a synonym filter.
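A minimal sketch of that suggestion applied to the settings in the question, assuming a recent Elasticsearch release with the analysis-phonetic plugin installed. The `whitespace` tokenizer replaces `standard` so that `ke$ha` survives as one token, the synonym filter (renamed to the hypothetical `artist_synonym`) runs before the phonetic filter so its rules are applied to the plain words, and the `standard` token filter is dropped because later releases removed it. Only the `kesha` rule is shown; other exceptions would follow the same pattern.

```shell
# Sketch only: recreate the index with a tokenizer that keeps "$" and "!"
# inside tokens, then map the plain spelling onto the stylized one.
curl -X PUT 'http://localhost:9200/admin?pretty=true' \
  -H 'Content-Type: application/json' -d '
{
  "settings" : {
    "analysis" : {
      "analyzer" : {
        "artist_analyzer" : {
          "tokenizer" : "whitespace",
          "filter" : ["lowercase", "artist_synonym",
                      "asciifolding", "artist_metaphone"]
        }
      },
      "filter" : {
        "artist_synonym" : {
          "type" : "synonym",
          "synonyms" : ["kesha => ke$ha"]
        },
        "artist_metaphone" : {
          "type" : "phonetic",
          "encoder" : "metaphone",
          "replace" : false
        }
      }
    }
  }
}'
```

The PUT fails if an index named `admin` already exists, so that index would need to be deleted (or a different name used) first. The trade-off of `whitespace` is that punctuation stays attached, so `Ke$ha,` is indexed as `ke$ha,`; a `mapping` char filter or a `pattern` tokenizer are alternatives if that matters.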
Jörg
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.