How to transform indexed fields that appear in UTF-8 to unicode?


(abuameen azzam) #1

Hi,

We're using Bro IDS with Elasticsearch and Kafka. However, some indexed fields are in Arabic, but they appear in a UTF-8 format like this:

قسÙ
التعيين - إدارة الإختيار والتعيين

Is there a way for Elasticsearch to transform these fields, once indexed, into readable Unicode Arabic?

Thanks


(David Turner) #2

This sounds like an encoding issue outside of Elasticsearch. Elasticsearch expects its input to be correctly encoded, and the JSON spec requires JSON to be encoded in UTF-8. Arabic text can be encoded in UTF-8 just fine, but the system that's putting data into Elasticsearch isn't doing so. For instance, I can create a document that looks like this (encoded as UTF-8):

{"id":"12345","content":"الحاسوب"}
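As an illustration (not part of the original reply), here is a minimal Python sketch, assuming the producer feeding Kafka/Elasticsearch could be written in Python; the helper name `encode_document` is hypothetical. It serialises the example document as UTF-8 JSON, then shows that decoding those same bytes with the wrong codec (Latin-1 here) produces the kind of `Ø`/`Ù` garbage seen in the question.

```python
import json


def encode_document(doc: dict) -> bytes:
    """Serialise a document as JSON text and encode that text as UTF-8 bytes."""
    # ensure_ascii=False keeps the Arabic characters literally in the JSON text
    # (instead of \uXXXX escapes); .encode("utf-8") then yields the bytes
    # that should be sent to Elasticsearch.
    return json.dumps(doc, ensure_ascii=False).encode("utf-8")


doc = {"id": "12345", "content": "الحاسوب"}
body = encode_document(doc)

# Decoding the bytes as UTF-8 gives back the original Arabic text.
round_trip = json.loads(body.decode("utf-8"))["content"]

# Decoding the same bytes as Latin-1 (a common mistake in a producer or
# consumer) produces mojibake containing characters such as "Ø" and "Ù".
garbled = body.decode("latin-1")
```

Leaving `ensure_ascii` at its default would also be fine, since `\uXXXX` escapes are valid JSON; what matters is that the bytes actually sent are UTF-8 and that every stage in the pipeline decodes them as UTF-8.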

In general it's not possible to fix badly-encoded strings (and I don't recognise the encoding bug here, so I can't say whether this particular one is fixable). You're better off avoiding the problem by encoding the data correctly on the way in.
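If the damage happens to be the common case of UTF-8 bytes being decoded as Windows-1252 or Latin-1, it can sometimes be reversed after the fact. Below is a hedged Python sketch, assuming that is the bug (which may not be true of the sample in the question) and that no bytes were dropped or replaced along the way. `try_fix_mojibake` is a hypothetical helper, not an Elasticsearch feature.

```python
from typing import Optional


def try_fix_mojibake(text: str) -> Optional[str]:
    """Try to undo one specific bug: UTF-8 bytes that were decoded as
    Windows-1252 or Latin-1 instead of UTF-8.

    Returns the repaired string, or None if the round trip fails, which
    means the text was not (only) damaged in that way.
    """
    for wrong_codec in ("cp1252", "latin-1"):
        try:
            # Re-encoding with the codec that was wrongly used recovers the
            # original bytes; decoding them as UTF-8 recovers the text.
            return text.encode(wrong_codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # not this codec; try the next one
    return None
```

Correctly encoded Arabic text also returns `None`, because it cannot be encoded as Latin-1 or Windows-1252, so this is only worth calling on values already suspected to be garbled. Fixing the producer remains the better option.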

