Logstash: JSON parse failure when parsing JSON data received from web services like Twitter/Facebook

I am receiving web service data (say, from Twitter), logging it to a file, and then I need to send that data to Logstash so it can be indexed into Elasticsearch.

I am using the config below, and it gives a `_jsonparsefailure` with the following exception:

JSON parse failure. Falling back to plain-text {:error=>#<LogStash::Json::ParserError: Unexpected character (':' (code 58)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
My Logstash config file looks like this:

input
{
    file
    {
        path => ["/mnt/volume2/ELK_Prashant/at/events.json"]
        codec => json
        type => json
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
output
{
    stdout { codec => rubydebug }
}

The data in events.json follows the format described at https://dev.twitter.com/rest/reference/get/search/tweets, with a sample shown below:

events.json

[{
	"coordinates": null,
	"favorited": false,
	"truncated": false,
	"created_at": "Mon Sep 24 03:35:21 +0000 2012",
	"id_str": "250075927172759552",
	"entities": {
		"urls": [

		],
		"hashtags": [{
			"text": "freebandnames",
			"indices": [
				20,
				34
			]
		}],
		"user_mentions": [

		]
	},
	"in_reply_to_user_id_str": null,
	"contributors": null,
	"text": "Aggressive Ponytail #freebandnames",
	"metadata": {
		"iso_language_code": "en",
		"result_type": "recent"
	},
	"retweet_count": 0,
	"in_reply_to_status_id_str": null,
	"id": 250075927172759552,
	"geo": null,
	"retweeted": false,
	"in_reply_to_user_id": null,
	"place": null,
	"user": {
		"profile_sidebar_fill_color": "DDEEF6",
		"profile_sidebar_border_color": "C0DEED",
		"profile_background_tile": false,
		"name": "Sean Cummings",
		"profile_image_url": "http://a0.twimg.com/profile_images/2359746665/1v6zfgqo8g0d3mk7ii5s_normal.jpeg",
		"created_at": "Mon Apr 26 06:01:55 +0000 2010",
		"location": "LA, CA",
		"follow_request_sent": null,
		"profile_link_color": "0084B4",
		"is_translator": false,
		"id_str": "137238150",
		"entities": {
			"url": {
				"urls": [{
					"expanded_url": null,
					"url": "",
					"indices": [
						0,
						0
					]
				}]
			},
			"description": {
				"urls": [

				]
			}
		},
		"default_profile": true,
		"contributors_enabled": false,
		"favourites_count": 0,
		"url": null,
		"profile_image_url_https": "https://si0.twimg.com/profile_images/2359746665/1v6zfgqo8g0d3mk7ii5s_normal.jpeg",
		"utc_offset": -28800,
		"id": 137238150,
		"profile_use_background_image": true,
		"listed_count": 2,
		"profile_text_color": "333333",
		"lang": "en",
		"followers_count": 70,
		"protected": false,
		"notifications": null,
		"profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png",
		"profile_background_color": "C0DEED",
		"verified": false,
		"geo_enabled": true,
		"time_zone": "Pacific Time (US & Canada)",
		"description": "Born 330 Live 310",
		"default_profile_image": false,
		"profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png",
		"statuses_count": 579,
		"friends_count": 110,
		"following": null,
		"show_all_inline_media": false,
		"screen_name": "sean_cummings"
	},
	"in_reply_to_screen_name": null,
	"in_reply_to_status_id": null
}]

If you paste that JSON snippet into e.g. http://jsonlint.com it'll tell you that it isn't valid JSON.


@magnusbaeck: Actually, I pasted only part of the JSON and missed a few brackets.

I have updated the question with the proper JSON input message I want to parse.

A shorter message that also fails looks like this:
    [{
    	"location": "LA, CA",
    	"follow_request_sent": null,
    	"profile_link_color": "0084B4",
    	"is_translator": false,
    	"id_str": "137238150",
    	"entities": {
    		"url": {
    			"urls": [{
    				"expanded_url": null,
    				"url": ""
    			}]
    		}
    	}
    }]

Logstash isn't capable of parsing that kind of JSON without help. You'll have to use a multiline codec to join the lines of each JSON structure into a single event and then use a json filter to parse it. It's probably safe for you to assume that `[{` is what each event begins with, in which case what you want to express in the multiline codec is "unless the current line begins with `[{`, join it with the previous line".
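A sketch of that approach, adapted to your file input (the pattern and `auto_flush_interval` are assumptions to adjust for your data; since the parsed document is an array, you may also need a split filter afterwards):

```
input
{
    file
    {
        path => ["/mnt/volume2/ELK_Prashant/at/events.json"]
        codec => multiline
        {
            # Any line NOT beginning with "[{" is joined to the previous line
            pattern => "^\[\{"
            negate => true
            what => "previous"
            auto_flush_interval => 2
        }
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
filter
{
    json
    {
        source => "message"
    }
}
```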

But most web service (Twitter, Facebook) responses do not always start with `[{` for each document; rather, the response is an array. For example, if I have doc1 and doc2, the web service API will return:

[
{
doc1
},
{
doc2
}
]

So, any clue how we can handle this? Or is there a tutorial I can refer to?

Do you really need to write the JSON response in pretty-printed form? If you could write each JSON object on a single line then things would be so much easier. If that's impossible, consider some other kind of delimiter for the JSON objects. Perhaps a blank line?

Actually, we are not formatting and writing the response through our own code; rather, we are accessing a third-party API. For example, take the Twitter API
https://dev.twitter.com/rest/reference/get/geo/search
which returns its response in pretty-printed form.

So, isn't there a way or a plugin that supports this, or do we have to write the parsing code ourselves?

Yes, I understand that you're using an API, but you still have the option of serializing that JSON response any way you like. Deserializing the response and serializing it back without pretty-printing enabled will do. It's probably also safe to just delete all newline characters found in the response string.
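A minimal sketch of that idea, assuming the raw API response is held in a string: deserialize the array and re-serialize each document as one compact line (newline-delimited JSON), which a `json` codec can then consume line by line. The function name is hypothetical.

```python
import json

def flatten_response(raw: str) -> str:
    """Deserialize a pretty-printed JSON array and re-serialize
    each element as one compact line (newline-delimited JSON)."""
    docs = json.loads(raw)
    return "\n".join(json.dumps(doc, separators=(",", ":")) for doc in docs)

# Example with a pretty-printed two-document array:
raw = """[
{
    "id": 1
},
{
    "id": 2
}
]"""
print(flatten_response(raw))
```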

And don't forget that I mentioned the option of using e.g. a blank line to separate each JSON object.

Yeah, I see that...
We will have a look at that and will try changing the code to deserialize and re-serialize the response.

Thanks for your help and time ... :slight_smile:

From what I can see, you also seem to have an issue with the `url` field in your messages. At one level it contains an object, and within this it contains a string, which would cause a mapping conflict in Elasticsearch. You will therefore probably need to do some processing before indexing it.
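One way to sidestep such a conflict, sketched with a mutate filter (the field paths and the new name are assumptions based on the sample document):

```
filter
{
    mutate
    {
        # Rename the object-valued field so it cannot clash with
        # string-valued "url" fields elsewhere in the document
        rename => { "[user][entities][url]" => "[user][entities][url_details]" }
    }
}
```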