Jan Ramón is indexed as JanRamón

Hi all,
I noticed that Jan Ramón (and values with diacritics in general) is indexed as JanRamón, with the space omitted. How do I avoid this?
This is the filter I used:

"filter": {
  "my_ascii_folding": {
    "type": "asciifolding",
    "preserve_original": "true"
  }
},

I use the whitespace tokenizer.

-Thanks

Can you please provide a script to recreate this? That would make it a lot easier to see what is happening.

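Hi @Christian_Dahlqvist,
I was able to figure out what the problem was.
I used a script to decode base64 data and then index it into Elasticsearch. The script removed all the whitespace, and that was the problem. The analyzer was not at fault: the space was already gone before the text reached Elasticsearch.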
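
For anyone hitting the same symptom, here is a minimal sketch of the kind of fix involved. The original script was not posted, so the language (Python) and the names decode_payload and approximate_analysis are assumptions made for illustration. The second function is only a rough stand-in for the whitespace tokenizer plus asciifolding with preserve_original, built on Unicode normalization; it is not how Elasticsearch implements the filter, and the token order is illustrative.

```python
import base64
import unicodedata


def decode_payload(encoded):
    """Decode a base64 payload to text without touching interior spaces."""
    text = base64.b64decode(encoded).decode("utf-8")
    # Trim only leading/trailing whitespace. Something like
    # "".join(text.split()) would also delete the space between words,
    # turning "Jan Ramón" into "JanRamón" before it is ever indexed.
    return text.strip()


def approximate_analysis(text):
    """Rough stand-in for whitespace tokenizer + asciifolding (preserve_original)."""
    tokens = []
    for token in text.split():  # split on runs of whitespace
        decomposed = unicodedata.normalize("NFKD", token)
        # Drop combining marks to approximate ASCII folding of diacritics.
        folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
        tokens.append(folded)
        if folded != token:
            tokens.append(token)  # keep the original form alongside the folded one
    return tokens


# Hypothetical input: a base64-encoded name with a trailing newline.
encoded = base64.b64encode("Jan Ram\u00f3n\n".encode("utf-8")).decode("ascii")
name = decode_payload(encoded)
print(approximate_analysis(name))
```

With the interior space preserved, the name splits into two tokens instead of one. To check what Elasticsearch itself produces for a given string, the _analyze API can be run against the index's analyzer.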
