Instead of migrating my Elasticsearch 5.6 indices to 6.0, I wanted to recreate them. But somehow the behaviour of the synonym filter has changed.
The symptom is a message like "Invalid synonym rule at line 3" when creating the index, and I cannot see what is wrong with line 3:
launisch,launische,launischem,launischen,launischer,launisches
abend-make-up,abend-make-ups
ueberich,ueber-ich,ueber-ichs,ueberichs
ehrliebend,ehrliebende,ehrliebendem,ehrliebenden,ehrliebender,ehrliebendes
Of course it is not just this line; my synonym list is quite big, and I would have to fix it by trial and error.
So I would prefer to know what is different in 6.0. I have seen that the parameters "tokenizer" and "ignore_case" are deprecated, but I did not use them in 5.6 anyway.
Added later: I experimented a bit more. I think I will drop the entries containing "-", because they are handled as two words after my tokenization. But then I ran into the next trap: it looks like the previous version did not care (or complain) about inconsistencies, i.e. the same synonym appearing in different lines with a different first entry.
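To find such entries without trial and error, the list could be pre-checked offline. The following is only a rough sketch: `split_like_tokenizer` and `check_synonym_lines` are made-up helper names, and splitting on non-word characters is merely an approximation of what the standard tokenizer does, not its real behaviour.

```python
import re
from collections import defaultdict


def split_like_tokenizer(term):
    # Assumption: approximate the tokenizer by splitting on any
    # non-word character (so "ueber-ich" becomes two tokens).
    return [t for t in re.split(r"\W+", term.lower()) if t]


def check_synonym_lines(lines):
    """Return (multi_token, repeated) for a list of synonym rule lines.

    multi_token: (line_no, term) pairs whose term splits into several tokens.
    repeated:    term -> sorted line numbers, for terms on more than one line.
    """
    multi_token = []
    seen = defaultdict(set)
    for line_no, line in enumerate(lines, start=1):
        for term in (t.strip() for t in line.split(",")):
            if not term:
                continue
            if len(split_like_tokenizer(term)) > 1:
                multi_token.append((line_no, term))
            seen[term.lower()].add(line_no)
    repeated = {t: sorted(n) for t, n in seen.items() if len(n) > 1}
    return multi_token, repeated


lines = [
    "launisch,launische,launischem,launischen,launischer,launisches",
    "abend-make-up,abend-make-ups",
    "ueberich,ueber-ich,ueber-ichs,ueberichs",
    "ehrliebend,ehrliebende,ehrliebendem,ehrliebenden,ehrliebender,ehrliebendes",
]
multi_token, repeated = check_synonym_lines(lines)
print(multi_token)
print(repeated)
```

For the four sample lines this flags the hyphenated entries on lines 2 and 3 and reports no term that occurs on more than one line. Whether these are exactly the entries 6.0 rejects is an assumption I have not verified.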
Added later: I found out that removing the stopword filter, which contains terms that are also present in the synonym list, lets the create-index call run without error. I find that behaviour strange; I thought the filters in a chain work independently on whatever is presented at their input.
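If the overlap between stopwords and synonyms really is the trigger, the affected lines could be listed up front. This is a minimal sketch under that assumption; `terms_hit_by_stopwords` is a hypothetical helper, and the stopwords and the first synonym line in the example are invented for illustration, since my real lists are not shown here.

```python
def terms_hit_by_stopwords(synonym_lines, stopwords):
    """Return (line_no, term) for every synonym term that is also a stopword."""
    stop = {w.lower() for w in stopwords}
    hits = []
    for line_no, line in enumerate(synonym_lines, start=1):
        for term in (t.strip().lower() for t in line.split(",")):
            if term and term in stop:
                hits.append((line_no, term))
    return hits


# Invented example data, not taken from the real configuration.
example_stopwords = ["der", "die", "das"]
example_synonyms = [
    "der,dem,den",
    "launisch,launische,launischem",
]
hits = terms_hit_by_stopwords(example_synonyms, example_stopwords)
print(hits)
```

Each reported pair names a synonym line that contains a term the stopword filter would remove, so those would be the lines to inspect first.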
Does anybody have experience in this area?
Any hint is appreciated very much.
Thanks, regards, Jürg
My analyzer, which uses the synonym filter, looks like this:
"expand": {
"type": "custom",
"char_filter": [
"komischeZeichen"
],
"filter": [
"lowercase",
"morph"
],
"tokenizer": "standard"
}
My filter "morph":
"morph": {
"expand": true,
"type": "synonym",
"synonyms" : [
"launisch,launische,launischem,launischen,launischer,launisches",
"abend-make-up,abend-make-ups",
"ueberich,ueber-ich,ueber-ichs,ueberichs",
"ehrliebend,ehrliebende,ehrliebendem,ehrliebenden,ehrliebender,ehrliebendes"
]
}
The char filter:
"komischeZeichen": {
"type": "mapping",
"mappings": [
"'=>,", "'=>,", "´=>,", "`=>,", "’=>,", "Œ=>OE", "œ=>oe", "¡=>i", "À=>A", "Á=>A", "Â=>A", "Ã=>A", "Ä=>Ae", "Å=>A", "Æ=>AE", "Ç=>C", "È=>E", "É=>E", "Ê=>E", "Ë=>E", "Ì=>I", "Í=>I", "Î=>I", "Ï=>IIII", "Ð=>D", "Ñ=>N", "Ò=>O", "Ó=>O", "Ô=>O", "Õ=>O", "Ö=>Oe", "Ù=>U", "Ú=>U", "Û=>U", "Ü=>Ue", "Ý=>Y", "ß=>ss", "à=>a", "á=>a", "â=>a", "ã=>a", "ä=>ae", "å=>a", "æ=>ae", "ç=>c", "è=>e", "é=>e", "ê=>e", "ë=>e", "ì=>i", "í=>i", "î=>i", "ï=>iiii", "ð=>d", "ñ=>n", "ò=>o", "ó=>o", "ô=>o", "õ=>o", "ö=>oe", "ù=>u", "ú=>u", "û=>u", "ü=>ue", "ý=>y", "ÿ=>y", "Ā=>A", "ā=>a", "Ă=>A", "ă=>a", "Ą=>A", "ą=>a", "Ć=>C", "ć=>c", "Ĉ=>C", "ĉ=>c", "Ċ=>C", "ċ=>c", "Č=>C", "č=>c", "Ď=>D", "ď=>d", "Đ=>D", "đ=>d", "Ē=>E", "ē=>e", "Ĕ=>E", "ĕ=>e", "Ė=>E", "ė=>e", "Ę=>E", "ę=>e", "Ě=>E", "ě=>e", "Ĝ=>G", "ĝ=>g", "Ğ=>G", "ğ=>g", "Ġ=>G", "ġ=>g", "Ģ=>G", "ģ=>g", "Ĥ=>H", "ĥ=>h", "Ħ=>H", "ħ=>h", "Ĩ=>I", "ĩ=>i", "Ī=>I", "ī=>i", "Ĭ=>I", "ĭ=>i", "Į=>I", "į=>i", "İ=>I", "ı=>i", "IJ=>IJ", "ij=>ij", "Ĵ=>J", "ĵ=>j", "Ķ=>K", "ķ=>k", "ĸ=>K", "Ĺ=>L", "ĺ=>l", "Ļ=>L", "ļ=>l", "Ľ=>L", "ľ=>l", "Ŀ=>L", "ŀ=>l", "Ł=>L", "ł=>l", "Ń=>N", "ń=>n", "Ņ=>N", "ņ=>n", "Ň=>N", "ň=>n", "ʼn=>n", "Ŋ=>N", "ŋ=>n", "Ō=>O", "ō=>o", "Ŏ=>O", "ŏ=>o", "Ő=>O", "ő=>o", "Ŕ=>R", "ŕ=>r", "Ŗ=>R", "ŗ=>r", "Ř=>R", "ř=>r", "Ś=>S", "ś=>s", "Ŝ=>S", "ŝ=>s", "Ş=>S", "ş=>s", "Š=>S", "š=>s", "Ţ=>T", "ţ=>t", "Ť=>T", "ť=>t", "Ŧ=>T", "ŧ=>t", "Ũ=>U", "ũ=>u", "Ū=>U", "ū=>u", "Ŭ=>U", "ŭ=>u", "Ů=>U", "ů=>u", "Ű=>U", "ű=>u", "Ų=>U", "ų=>u", "Ŵ=>W", "ŵ=>w", "Ŷ=>Y", "ŷ=>y", "Ÿ=>Y", "Ź=>Z", "ź=>z", "Ż=>Z", "ż=>z", "Ž=>Z", "ž=>z", "Þ=>th", "Ø=>O", "þ=>Th", "ø=>o"
]
}