Issue with posting JSON data to Elasticsearch via Flume


(deepakas) #1

I am using Flume to post data to Elasticsearch. When the data is XML it
works fine, but when the data is JSON, incorrect data is loaded into the
message field. I checked the Flume code, and it looks like the issue may be
with the Elasticsearch class XContentBuilder.

Instead of the JSON message, the message data is stored as
"org.elasticsearch.common.xcontent.XContentBuilder@32f4122e"

I tried using the DynamicSerializer and
ElasticSearchIndexRequestBuilderFactory, but had no luck.

Here is a sample document as indexed in Elasticsearch when the loaded data is
JSON.

{
  "_index": "test_flume-2014-04-07",
  "_type": "logs",
  "_id": "M9E-33RQTy2kA6QhW6mSUw",
  "_score": null,
  "_source": {
    "@message": "org.elasticsearch.common.xcontent.XContentBuilder@58bf76d2",
    "@timestamp": "2014-04-07T09:49:26.490Z",
    "@fields": {
      "timestamp": "1396864166490"
    }
  },
  "sort": [
    1396864166490,
    1396864166490
  ]
}

--
Deepak Subhramanian

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CA%2BUubig41Nk%3DDkXK3H5WL8ceKHrEp%2B8%2BtQvFDDTq9Lz0Sa5ohA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Brian Yoder) #2

Deepak,

The output you see is most likely the result of calling the
XContentBuilder.toString method.

The output you want (the JSON string) is obtained by calling the
XContentBuilder.string method instead.
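
The ClassName@hash text in the original post matches what Java's default Object.toString() returns when a class does not override it. A minimal sketch of the difference, using a hypothetical stand-in class (SketchBuilder is an assumption for illustration, not the Elasticsearch API):

```java
// Hypothetical stand-in for a builder class; NOT the Elasticsearch API.
// It deliberately does not override toString().
final class SketchBuilder {
    private final StringBuilder json = new StringBuilder();

    // Appends one string field; values are not escaped in this sketch.
    SketchBuilder field(String name, String value) {
        json.append(json.length() == 0 ? "{" : ",");
        json.append('"').append(name).append("\":\"").append(value).append('"');
        return this;
    }

    // Explicit accessor for the built JSON text.
    String string() {
        return json.length() == 0 ? "{}" : json + "}";
    }
}

class ToStringDemo {
    public static void main(String[] args) {
        SketchBuilder b = new SketchBuilder().field("id", "00001");
        // Default Object.toString(): class name, '@', hex hash code.
        System.out.println(String.valueOf(b));
        // Explicit accessor: the JSON text itself.
        System.out.println(b.string());
    }
}
```

Any code path that concatenates the builder into a string, or passes it where a String is expected via String.valueOf, ends up with the first form.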

Brian



(esbium) #3

Deepak, any update on whether or not the suggestion worked for you?


(saad373) #4

Hi,

Can you please suggest where to add this code? We have set up the ES sink in flume.conf and data is pushed to Elasticsearch.


(deepakas) #5

I was on vacation. I am looking into it now. Thanks.


(deepakas) #6

The temporary fix helped. I can now see the JSON message instead of "org.elasticsearch.common.xcontent.XContentBuilder@32f4122e".
There is also a JIRA on the issue: https://issues.apache.org/jira/browse/FLUME-2126

When I load JSON data from Hive, the JSON fields are automatically split into separate columns. For some reason the ES sink doesn't load the data the same way. I am not sure if I am setting the correct type. In the Hive table I have to set the parameter es.input.json to true. Is there a similar setting for the ES sink? I set some headers to application/json, but no luck. Here is the data I am getting in Kibana.

{
  "_index": "test-2014-05-08",
  "_type": "parsed_logs",
  "_id": "7qSBgRx-Q_GLaCDWARs_Cg",
  "_score": null,
  "_source": {
    "@message": "{"action":{"id":"00001"}}",
    "@timestamp": "2014-05-08T16:48:44.180Z",
    "@type": "application/json",
    "@fields": {
      "_attachment_mimetype": "application/json",
      "timestamp": "1399567724180",
      "_type": "application/json",
      "type": "application/json"
    }
  },
  "sort": [
    1399567724180
  ]
}
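
In the document above, the @message value is a single JSON string rather than a nested object, which may be why the fields are not split out the way they are from Hive. A minimal sketch of the two shapes, using plain string building and hypothetical helper names (MessageEmbedding, asString and asObject are assumptions for illustration, not Flume or Elasticsearch APIs):

```java
final class MessageEmbedding {
    // Escapes backslashes and double quotes so text can sit inside a JSON string.
    static String quote(String text) {
        return "\"" + text.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body stored as one opaque string: the index sees a single text value.
    static String asString(String body) {
        return "{\"@message\":" + quote(body) + "}";
    }

    // Body embedded as raw JSON: the index sees nested fields such as action.id.
    // Assumes body is already valid JSON; this sketch does no validation.
    static String asObject(String body) {
        return "{\"@message\":" + body + "}";
    }

    public static void main(String[] args) {
        String body = "{\"action\":{\"id\":\"00001\"}}";
        System.out.println(asString(body)); // {"@message":"{\"action\":{\"id\":\"00001\"}}"}
        System.out.println(asObject(body)); // {"@message":{"action":{"id":"00001"}}}
    }
}
```

Only the second shape gives Elasticsearch separate fields to map; whether the ES sink can be configured to produce it is the open question here.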

