Logstash: Parsing fails for JSON data received from web services such as Twitter or Facebook

I receive web service data (for example from Twitter) and log it to a file. I then need to send that data to Logstash so that it can be indexed into Elasticsearch.

I am using the config below, and it gives a jsonparsefailure with the following exception:

JSON parse failure. Falling back to plain-text {:error=>#<LogStash::Json::ParserError: Unexpected character (':' (code 58)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')

My Logstash config file looks like this:

input
{
    file
    {
        path => ["/mnt/volume2/ELK_Prashant/at/events.json"]
        codec => json
        type => json
        start_position => "beginning"
        sincedb_path => "/dev/null"
    }
}
output
{
    stdout { codec => rubydebug }
}

The data in events.json follows the format described in the Twitter API reference (API reference index | Docs | Twitter Developer Platform). A sample is shown below:

events.json

[{
	"coordinates": null,
	"favorited": false,
	"truncated": false,
	"created_at": "Mon Sep 24 03:35:21 +0000 2012",
	"id_str": "250075927172759552",
	"entities": {
		"urls": [

		],
		"hashtags": [{
			"text": "freebandnames",
			"indices": [
				20,
				34
			]
		}],
		"user_mentions": [

		]
	},
	"in_reply_to_user_id_str": null,
	"contributors": null,
	"text": "Aggressive Ponytail #freebandnames",
	"metadata": {
		"iso_language_code": "en",
		"result_type": "recent"
	},
	"retweet_count": 0,
	"in_reply_to_status_id_str": null,
	"id": 250075927172759552,
	"geo": null,
	"retweeted": false,
	"in_reply_to_user_id": null,
	"place": null,
	"user": {
		"profile_sidebar_fill_color": "DDEEF6",
		"profile_sidebar_border_color": "C0DEED",
		"profile_background_tile": false,
		"name": "Sean Cummings",
		"profile_image_url": "http://a0.twimg.com/profile_images/2359746665/1v6zfgqo8g0d3mk7ii5s_normal.jpeg",
		"created_at": "Mon Apr 26 06:01:55 +0000 2010",
		"location": "LA, CA",
		"follow_request_sent": null,
		"profile_link_color": "0084B4",
		"is_translator": false,
		"id_str": "137238150",
		"entities": {
			"url": {
				"urls": [{
					"expanded_url": null,
					"url": "",
					"indices": [
						0,
						0
					]
				}]
			},
			"description": {
				"urls": [

				]
			}
		},
		"default_profile": true,
		"contributors_enabled": false,
		"favourites_count": 0,
		"url": null,
		"profile_image_url_https": "https://si0.twimg.com/profile_images/2359746665/1v6zfgqo8g0d3mk7ii5s_normal.jpeg",
		"utc_offset": -28800,
		"id": 137238150,
		"profile_use_background_image": true,
		"listed_count": 2,
		"profile_text_color": "333333",
		"lang": "en",
		"followers_count": 70,
		"protected": false,
		"notifications": null,
		"profile_background_image_url_https": "https://si0.twimg.com/images/themes/theme1/bg.png",
		"profile_background_color": "C0DEED",
		"verified": false,
		"geo_enabled": true,
		"time_zone": "Pacific Time (US & Canada)",
		"description": "Born 330 Live 310",
		"default_profile_image": false,
		"profile_background_image_url": "http://a0.twimg.com/images/themes/theme1/bg.png",
		"statuses_count": 579,
		"friends_count": 110,
		"following": null,
		"show_all_inline_media": false,
		"screen_name": "sean_cummings"
	},
	"in_reply_to_screen_name": null,
	"in_reply_to_status_id": null
}]

If you paste that JSON snippet into e.g. http://jsonlint.com it'll tell you that it isn't valid JSON.

@magnusbaeck: Actually, I had pasted only part of the JSON and missed a few brackets.

I have updated the question with the proper JSON input message that I want to parse.

Even a short message like the following fails:
    [{
    	"location": "LA, CA",
    	"follow_request_sent": null,
    	"profile_link_color": "0084B4",
    	"is_translator": false,
    	"id_str": "137238150",
    	"entities": {
    		"url": {
    			"urls": [{
    				"expanded_url": null,
    				"url": ""
    			}]
    		}
    	}
    }]

Logstash isn't capable of parsing this kind of JSON without help. You'll have to use a multiline codec to join the lines of each JSON structure into a single event and then use a json filter to parse it. It's probably safe for you to assume that [{ is what each event begins with, in which case what you want to express in the multiline codec is "unless the current line begins with [{, join it with the previous line".
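
A rough sketch of that approach, reusing the file input from the question. Treat it as a starting point rather than a tested config. The `tweets` target name and the `auto_flush_interval` value are assumptions made for illustration, and `auto_flush_interval` may not be available in older versions of the multiline codec. Because each joined event is a JSON array rather than an object, the json filter here writes into a target field, and a split filter then turns each array element into its own event.

```
input {
    file {
        path => ["/mnt/volume2/ELK_Prashant/at/events.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => multiline {
            # Lines that do NOT begin with "[{" are appended to the previous line.
            pattern => "^\[\{"
            negate => true
            what => "previous"
            # Assumed value: flush the last buffered event after 2 seconds without new lines.
            auto_flush_interval => 2
        }
    }
}
filter {
    json {
        source => "message"
        # Hypothetical field name; the parsed array is stored here.
        target => "tweets"
    }
    # Emit one event per element of the parsed array.
    split {
        field => "tweets"
    }
}
output {
    stdout { codec => rubydebug }
}
```

With this layout the tweet fields end up under `tweets` (for example `[tweets][text]`) rather than at the root of the event.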

But most web service responses (Twitter, Facebook) do not start with [{ for every document. Rather, the response is a single array. For example, if I have doc1 and doc2, the web service API returns:

[
{
doc1
},
{
doc2
}
]

Any clue how we can handle this, or is there a tutorial I can refer to?

Do you really need to write the JSON response in pretty-printed form? If you could write each JSON object on a single line then things would be so much easier. If that's impossible, consider some other kind of delimiter for the JSON objects. Perhaps a blank line?

Actually, we are not formatting the response in our own code. We are calling a third-party API; take the Twitter API as an example:
https://dev.twitter.com/rest/reference/get/geo/search
Calling it returns the response as formatted output.

So is there no option or plugin that supports this, or do we have to write code ourselves to parse it?

Yes, I understand that you're using an API but you still have the option of serializing that JSON response any way you like. Deserializing the response and serializing it back without prettyprinting enabled will do. It's probably also safe to just delete all newline characters found in the response string.

Don't forget that I also mentioned the option of using e.g. a blank line to separate each JSON object.
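
For the single-line option, a minimal sketch of what the file and config could look like. It assumes the producer rewrites each API response onto one line before appending it to the file. As far as I know, the json codec emits one event per element when the root of a line is a JSON array, but verify that against your Logstash version.

```
# events.json then holds one complete response per line, for example (abbreviated):
# [{"id_str": "250075927172759552", "text": "Aggressive Ponytail #freebandnames"}]

input {
    file {
        path => ["/mnt/volume2/ELK_Prashant/at/events.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        # Each line is parsed on its own, so no multiline handling is needed.
        codec => json
    }
}
output {
    stdout { codec => rubydebug }
}
```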

Yeah, I see that...
We will look into that and try changing our code so that it deserializes the response and serializes it again.

Thanks for your help and time.

From what I can see you also seem to have an issue with the url field in your messages. At one level it contains an object, and within that object it contains a string, which would cause a mapping conflict in Elasticsearch. You will therefore probably need to do some processing before indexing it.
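
One possible form of that processing is to rename the object-valued field before indexing. The sketch below assumes the tweet fields sit at the root of the event and that `[user][entities][url]` is the field causing trouble. The new name `url_entity` is made up for illustration, and whether a rename is enough depends on your Elasticsearch version and mapping.

```
filter {
    mutate {
        # Hypothetical new name; the object formerly at [user][entities][url] moves here.
        rename => { "[user][entities][url]" => "[user][entities][url_entity]" }
    }
}
```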