This sounds like an encoding issue outside of Elasticsearch. Elasticsearch expects its input to be correctly encoded, and the JSON spec requires JSON to be encoded in UTF-8. Arabic text can be encoded in UTF-8 just fine, but the system that's putting data into Elasticsearch isn't doing so. For instance, I can create a document that looks like this (encoded as UTF-8):
{"id":"12345","content":"الحاسوب"}
In general it's not possible to fix badly-encoded strings (and I don't recognise the encoding bug here so can't say if this particular one is fixable). You're better off avoiding the problem by correctly encoding the data on the way in.
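To make "correctly encoding the data on the way in" concrete, here's a minimal Python sketch using only the standard library. It isn't the code from your system (I don't know what that looks like). The variable names are made up for illustration, and the Latin-1 mix-up is just one common way this kind of bug arises, not a diagnosis of yours.

```python
import json

# The example document from above.
doc = {"id": "12345", "content": "الحاسوب"}

# Serialise to JSON text, then encode explicitly as UTF-8 bytes.
# These bytes are what should be sent as the request body.
body = json.dumps(doc, ensure_ascii=False).encode("utf-8")

# Decoding with the same encoding gives the original document back.
round_tripped = json.loads(body.decode("utf-8"))

# Illustrative failure 1 (an assumption, not your actual bug):
# something on the way in decodes the UTF-8 bytes as Latin-1.
# The result is mojibake, but no bytes were lost, so it can be undone.
garbled = body.decode("latin-1")
repaired = garbled.encode("latin-1").decode("utf-8")

# Illustrative failure 2: a lossy conversion replaces every character
# the target charset can't represent. Nothing can recover the text now.
lossy = doc["content"].encode("ascii", errors="replace")
```

If you can't control the charset at every hop, note that `json.dumps` with its default `ensure_ascii=True` writes non-ASCII characters as `\uXXXX` escapes, so the body is plain ASCII and still valid JSON.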