First of all, sorry for the title; I don't know what I should call this (a byte string?).
I have data that looks like this:
b'{"id": "2", "words": ["\\u0633\\u0647\\u0627\\u0645\\u062f\\u0627\\u0631\\u06cc", "\\u0634\\u0631\\u0648\\u0639"], "content": "#\\u0648\\u0644\\u0633\\u0627\\u067e\\u0627"}'
The data comes from NSQ, and that is all I've been told (!). I don't know why it's in this form (they said it's because of the NSQ output).
Logstash config:
input { stdin {} }

filter {
  bytes {
    source => "message"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mydata"
  }
}
Analyzer in Kibana:
PUT /text2
{
  "settings": {
    "analysis": {
      "char_filter": {
        "zero_width_spaces": {
          "type": "mapping",
          "mappings": [ "\\u200C=>\\u0020" ]
        }
      },
      "filter": {
        "persian_stop": {
          "type": "stop",
          "stopwords": "_persian_"
        }
      },
      "analyzer": {
        "rebuilt_persian": {
          "tokenizer": "standard",
          "char_filter": [ "zero_width_spaces" ],
          "filter": [
            "asciifolding",
            "lowercase",
            "decimal_digit",
            "arabic_normalization",
            "persian_normalization",
            "persian_stop"
          ]
        }
      }
    }
  }
}
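As a side note, the analyzer itself can be tested in isolation from the Dev Tools console (the sample text here is just the first word from the data above, decoded):

```
POST /text2/_analyze
{
  "analyzer": "rebuilt_persian",
  "text": "سهامداری"
}
```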
Unfortunately, the "\u0633..." escapes never get decoded into the Persian alphabet.
But if instead I set the filter like this:
filter {
  json {
    source => "message"
  }
}
and change the input to this:
{"id": "2", "words": ["\u0633\u0647\u0627\u0645\u062f\u0627\u0631\u06cc", "\u0634\u0631\u0648\u0639"], "content": "#\u0648\u0644\u0633\u0627\u067e\u0627"}
it works fine.
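Since the json filter clearly decodes the \uXXXX escapes once the input is plain JSON, one direction I can think of is stripping the Python-style b'...' wrapper and collapsing the doubled backslashes with mutate/gsub before applying the json filter. This is an untested sketch, and I'm not sure I have the gsub backslash escaping right:

```
filter {
  mutate {
    gsub => [
      # assumes every message is wrapped exactly like b'...'
      "message", "^b'", "",
      "message", "'$", "",
      # collapse \\uXXXX into \uXXXX so json can decode it
      # (the escaping here may need adjusting)
      "message", "\\\\u", "\\u"
    ]
  }
  json {
    source => "message"
  }
}
```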
I don't know what I'm doing wrong, and again, sorry if I sound like a total newbie.