Filtering special characters in search results

Hello everyone,

We currently use ES to index documents with the mapper-attachments plugin.
Some of the files we index contain special characters, such as symbols
(e.g. a cellphone symbol) inserted in Word.
In our case, these special characters have nothing to do with language; they
are purely graphical symbols from Word.

As a result, the search results return things like:
\n� 000 123 456 \n

Since the indexed files are base64-encoded and stored directly in ES, with no
other copy kept, I don't think we should filter the files before storing them
in ES (otherwise we could not retrieve the document exactly as it was before
indexing).

Maybe we should instead clean the search results by removing these
unreadable characters.
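
To make the idea concrete, here is a rough client-side sketch in Python of
what such a cleanup could look like. It is only an illustration: the function
name clean_snippet and the choice of which Unicode categories to drop are my
own assumptions, not something provided by ES or the mapper-attachments
plugin. It removes the replacement character (U+FFFD) plus control,
private-use, surrogate and unassigned code points, and collapses whitespace.

```python
import unicodedata

# Assumption: these Unicode general categories count as "unreadable" here.
# Cc = control, Co = private use (often used by Word symbol fonts),
# Cs = surrogate, Cn = unassigned.
DROP_CATEGORIES = {"Cc", "Co", "Cs", "Cn"}
REPLACEMENT_CHAR = "\ufffd"  # what a decoder emits for undecodable input


def clean_snippet(text: str) -> str:
    """Return text without unreadable characters (hypothetical helper)."""
    kept = []
    for ch in text:
        if ch in ("\n", "\r", "\t"):
            # Keep line breaks and tabs as plain spaces so words stay apart.
            kept.append(" ")
        elif ch == REPLACEMENT_CHAR or unicodedata.category(ch) in DROP_CATEGORIES:
            # Drop the character entirely.
            continue
        else:
            kept.append(ch)
    # Collapse runs of whitespace and trim both ends.
    return " ".join("".join(kept).split())


print(clean_snippet("\n\ufffd 000 123 456 \n"))  # prints: 000 123 456
```

This only touches the text shown to the user; the stored base64 document is
left unchanged, so the original file can still be retrieved as it was.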

Do you have any ideas?

Thank you very much.
