How to transform indexed fields that appear in UTF-8 to unicode?

Hi,

We're using Bro IDS with Elasticsearch and Kafka. However, some indexed fields are in Arabic, but they appear in a UTF-8 format like this:

قسÙ
التعيين - إدارة الإختيار والتعيين

Is there a way for Elasticsearch to transform these fields, once indexed, into readable Unicode Arabic?

Thanks

This sounds like an encoding issue outside of Elasticsearch. Elasticsearch expects its input to be correctly encoded, and the JSON spec requires JSON to be encoded in UTF-8. It's possible to encode Arabic text just fine in UTF-8, but the system that's putting data into Elasticsearch isn't doing so. For instance, I can create a document that looks like this (encoded as UTF-8):

{"id":"12345","content":"الحاسوب"}
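As a rough illustration of what "encoded as UTF-8" means on the producer side, here is a minimal Python sketch that serialises the document above to UTF-8 bytes and checks that it round-trips. The variable names are made up for the example, and it says nothing about how Bro or Kafka actually serialise their output.

```python
import json

# The example document from above, held as ordinary Unicode strings.
doc = {"id": "12345", "content": "الحاسوب"}

# ensure_ascii=False keeps the Arabic characters as-is instead of \uXXXX
# escapes; .encode("utf-8") produces the bytes that would go on the wire.
body = json.dumps(doc, ensure_ascii=False).encode("utf-8")

# Decoding with the same encoding recovers the original text unchanged.
round_tripped = json.loads(body.decode("utf-8"))
```

If every hop between the producer and Elasticsearch decodes and re-encodes with UTF-8 like this, the Arabic text should arrive intact.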

In general it's not possible to fix badly-encoded strings (and I don't recognise the encoding bug here so can't say if this particular one is fixable). You're better off avoiding the problem by correctly encoding the data on the way in.
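To illustrate why such strings are only sometimes repairable, here is a small Python sketch of one common way text gets garbled: UTF-8 bytes being decoded with a single-byte Western encoding somewhere in the pipeline. The choice of cp1252 is purely an assumption for the example, not a diagnosis of the data above, and `garble` and `try_repair` are hypothetical helper names.

```python
def garble(text: str) -> str:
    # Simulate the bug: UTF-8 bytes misread as cp1252 (assumed encoding).
    return text.encode("utf-8").decode("cp1252")


def try_repair(garbled: str):
    # Reverse the assumed mistake. Returns None when bytes were lost along
    # the way, in which case the original text cannot be recovered.
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return None


original = "التعيين"
repaired = try_repair(garble(original))  # lossless case: reversible

# Lossy case: a byte with no cp1252 mapping was replaced during decoding,
# so the information needed to undo the damage is gone.
lossy = "ف".encode("utf-8").decode("cp1252", errors="replace")
unrecoverable = try_repair(lossy)
```

The reversal only works when the wrong decoding step was lossless and you know which encoding was involved, which is why fixing the encoding at the source is the safer route.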
