How to parse JSON inside a text line

Hi,

I have the following log, which looks like this:
1570519737247 I access {"date":"2019-10-08T09:28:57.247","rootTitle":"title1","rootModel":"model1","dcTitle":"[1a]"}
1570519737247 I access {"date":"2019-10-08T09:28:57.247","rootTitle":"title2","rootModel":"model2","dcTitle":"[1b]"}
etc...

You can see that at the beginning there is plain text, followed by valid JSON.

I want to parse every line into a JSON object that would look like this:

{
	"beginningText": "1570519737247 I access",
	"message": {
		"date":"2019-10-08T09:28:57.247",
		"rootTitle":"title1",
		"rootModel":"model1",
		"dcTitle":"[1a]"
		}
}

My config:

input {
   tcp {
    port  => 5044
    codec => json
  }
}
filter {
    grok {
		break_on_match => false
		match =>  [ "message", "TID: \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} \[%{JAVACLASS:java_class}\] \(%{GREEDYDATA:thread}\) - (?<log_message>(.|\r|\n)*)"]
    }
	date{
		match => [ "timestamp_from_log", "ISO8601"]
	}  
}

output {
  stdout {}  
  elasticsearch {
	hosts => ["*************"]
	sniffing => true
	manage_template => false
	index => "statistics-%{+YYYY.MM.dd}"
  }
}

But it ends in a parsing error:
[ERROR][logstash.codecs.json ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('W' (code 87)): Expected space separating root-level values
at [Source: (String)"2W\u0000\u0000\u0000\u00012C\u0000\u0000\u0002\xE7x^\xA4\x93_k\xDCF\u0014\xC5݇~\x89>\r\u0017\u0002\t՟\x91v\xBD\xBB\x9E\x97&\r.I\x89\xE3\xB0q[\xA8e\xC2\xF5\xE8J\u001Av4\xA3\xCE\\x99l\xC4~\xF7\xA2ņڴ%\x90\u0017......

and in kibana I see values like this: 2W\u0000\u0000\u0000\u00012C\u0000\u0000\u0002\xD9x^\xA4SK\x8B\xDCF\u0010\xDE........

Please can anybody help me with this? Thank you.

What are you using to send data to logstash?

Filebeat

Do not use a tcp input, use a beats input.
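
For example, a minimal beats input looks like this (a sketch; 5044 is the conventional Beats port, adjust it to whatever your Filebeat output is configured to send to):

```conf
input {
  beats {
    port => 5044
  }
}
```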

Thanks, that worked, but now it just prints the whole line unparsed. I would like to have it like this:

{
	"beginningText": "1570519737247",
	"message": {
		"date":"2019-10-08T09:28:57.247",
		"rootTitle":"title1",
		"rootModel":"model1",
		"dcTitle":"[1a]"
		}
}

Can you help with the grok part?

Use a json filter.

I tried this configuration:

filter {	
	mutate {
		rename => ["host", "server"]
		convert => {"server" => "string"}
	}

	json {
		source => "message"	
		skip_on_invalid_json => true
 	}
}

But it just prints the whole message on one line. I want to have this format:

{
	"beginningText": "1570519737247",
	"message": {
		"date":"2019-10-08T09:28:57.247",
		"rootTitle":"title1",
		"rootModel":"model1",
		"dcTitle":"[1a]"
		}
}

and I want to skip the words "I access". Also, the documentation says the filter takes a field which contains JSON, and in my case I don't have such a field. Can you help?

Can you change the output from stdout {} to

stdout { codec => rubydebug }

and post what an event looks like please. Or if you are using Kibana click on an event and copy from the JSON tab.

Here is an event from Kibana:

{
  "_index": "statistics-2019.10.09",
  "_type": "doc",
  "_id": "**************",
  "_version": 1,
  "_score": null,
  "_source": {
    "server": {
      "os": {
        "family": "redhat",
        "codename": "Core",
        "name": "CentOS Linux",
        "version": "7 (Core)",
        "platform": "centos"
      },
      "containerized": true,
      "name": "************",
      "id": "**************",
      "architecture": "x86_64"
    },
    "beat": {
      "name": "******************",
      "hostname": "***************",
      "version": "6.6.2"
    },
    "@version": "1",
    "prospector": {
      "type": "log"
    },
    "message": "1570651770510 I access {\"date\":\"2019-10-09T22:09:30.51\",\"rootTitle\":\"some text\",\"rootModel\":\"monograph\",\"dcTitle\":\"[1a]\",\"pid\":\"uuid:0909324234\",\"pids_path\":[\"/uuid:0909324234/uuid:0909324234\"],\"rootPid\":\"uuid:0909324234\",\"models_path\":[\"/title/page\"],\"remoteAddr\":\"127.0.0.1\",\"username\":\"not_logged\"}",
    "log": {
      "file": {
        "path": "/home/***********/apache-tomcat-7.0.93/logs/statistics.2019-10-09.log"
      }
    },
    "fields": {
      "log_file": "statistics"
    },
    "input": {
      "type": "log"
    },
    "@timestamp": "2019-10-09T20:09:37.914Z",
    "offset": 8980,
    "tags": [
      "stats",
      "beats_input_codec_plain_applied"
    ],
    "source": "/home/*********/apache-tomcat-7.0.93/logs/statistics.2019-10-09.log"
  },
  "fields": {
    "@timestamp": [
      "2019-10-09T20:09:37.914Z"
    ]
  },
  "sort": [
    1570651777914
  ]
}

You can use mutate+gsub to remove everything before the opening {

    mutate { gsub => [ "message", "^[^{]+", "" ] }
    json { source => "message" }

will get you

    "rootPid" => "uuid:0909324234",
  "rootTitle" => "some text",
  "rootModel" => "monograph",
       "date" => "2019-10-09T22:09:30.51",
 "remoteAddr" => "127.0.0.1",
    "dcTitle" => "[1a]",

etc.
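
If you also want to keep the leading text in its own field, a variation (just a sketch; the field names `beginningText` and `json_part` are illustrative) is to grok the line apart first and parse only the JSON portion:

```conf
filter {
  grok {
    # "1570519737247 I access {...}" -> epoch millis, two words, then JSON
    match => { "message" => "^%{NUMBER:beginningText} %{WORD} %{WORD} %{GREEDYDATA:json_part}" }
  }
  json {
    source => "json_part"
    remove_field => [ "json_part" ]
  }
}
```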

Thanks, now every value is on its own line when I click on an event in Kibana, but the "message" field is still not formatted the way JSON should be:

I want:

{
	"date":"2019-10-08T09:28:57.247",
	"rootTitle":"title1",
	"rootModel":"model1",
	"dcTitle":"[1a]"
     etc....
	}

Is it possible to do that, so that when I select only the message field in Kibana I will see it formatted?

The json filter I showed should be parsing that [message] field into those other fields. It will not modify the message field. If you click on an event in Kibana and switch to the JSON tab do you see them?

Yes, I see them, but I also need the message field itself formatted.

Does adding target => "message" to the json filter fix it?

It gives me an error in Kibana:

Kibana log:

{"type":"error","@timestamp":"2019-10-10T07:32:25Z","tags":[],"pid":11247,"level":"error","error":{"message":"[parsing_exception] 
[match_phrase] unknown token [START_OBJECT] after [query], with { line=1 & col=549 }","name":"Error","stack":"[parsing_exception] [match_phrase]
 unknown token [START_OBJECT] after [query], with { line=1 & col=549 } :: {\"path\":\"/_msearch\",\"query\":{\"rest_total_hits_as_int\":\"true\",
 \"ignore_throttled\":\"true\"},\"body\":\"{\\\"index\\\":\\\"statistic*\\\",\\\"ignore_unavailable\\\":true,\\\"preference\\\":1570692736705}\\
 n{\\\"version\\\":true,\\\"size\\\":500,\\\"sort\\\":[{\\\"@timestamp\\\":{\\\"order\\\":\\\"desc\\\",\\\"unmapped_type\\\":\\\"boolean\\\"}}],\\\"_source\\\":{\\\"excludes\\\":[]},\\\"aggs\\\":{\\\"2\\\":{\\\"date_histogram\\\":{\\\"field\\\":\\\"@timestamp\\\",\\\"interval\\\":\\\"3h\\\",\\\"time_zone\\\":\\\"Europe/Berlin\\\",\\\"min_doc_count\\\":1}}},\\\"stored_fields\\\":[\\\"*\\\"],\\\"script_fields\\\":{},\\\"docvalue_fields\\\":[{\\\"field\\\":\\\"@timestamp\\\",\\\"format\\\":\\\"date_time\\\"}],\\\"query\\\":{\\\"bool\\\":{\\\"must\\\":[{\\\"match_all\\\":{}},{\\\"range\\\":{\\\"@timestamp\\\
 ":{\\\"gte\\\":1570312800000,\\\"lte\\\":1570917599999,\\\"format\\\":\\\"epoch_millis\\\"}}}],\\\"filter\\\":[],\\\"should\\\":[],\\\"must_not\\\":[{\\\"match_phrase\\\":{\\\"message\\\":{\\\"query\\\":{\\\"date\\\":\\\"2019-10-10T09:29:40.201\\\",\\\"dcTitle\\\":\\\"[1a]\\\",\\\"models_path\\\":[\\\"/monograph/page\\\"],\\\"pid\\\":\\\"uuid:22c40550-27c1-11e3-b79f-5ef3fc9bb22f\\\",\\\"pids_path\\\":[\\\"/uuid:0174cfe0-1526-11e3-bc65-005056827e51/uuid:22c40550-27c1-11e3-b79f-5ef3fc9bb22f\\\"],\\\"remoteAddr\\\":\\\"127.0.0.1\\\",\\\"rootModel\\\":\\\"monograph\\\",\\\"rootPid\\\":\\\"uuid:0174cfe0-1526-11e3-bc65-005056827e51\\\",
 \\\"rootTitle\\\":\\\"Stalin - horor 20. stolet?\\\",\\\"username\\\":\\\"not_logged\\\"}}}}]}},\\\"highlight\\\":{\\\"pre_tags\\\":[\\\"@kibana-highlighted-field@\\\"],\\\"post_tags\\\":[\\\"@/kibana-highlighted-field@\\\"],\\\"fields\\\":{\\\"*\\\":{}},\\\"fragment_size\\\":2147483647},\\\"timeout\\\":\\\"30000ms\\\"}\\n\",\"statusCode\":400,\"response\":\"{\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"parsing_exception\\\",\\\"reason\\\":\\\"[match_phrase] unknown token [START_OBJECT] after [query]\\\",\\\"line\\\":1,\\\"col\\\":549}],\\\"type\\\":\\\"parsing_exception\\\",\\\"reason\\\":\\\"[match_phrase] unknown token [START_OBJECT] after [query]\\\",\\\"line\\\":1,\\\"col\\\":549},\\\"status\\\":400}\"}\n    at respond (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:308:15)\n    at checkRespForFailure (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:267:7)\n    at HttpConnector.<anonymous> (/usr/share/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:166:7)\n    at IncomingMessage.wrapper (/usr/share/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4935:19)\n    at IncomingMessage.emit (events.js:194:15)\n    at endReadableNT (_stream_readable.js:1103:12)\n    at process._
 tickCallback (internal/process/next_tick.js:63:19)"},"url":{"protocol":null,"slashes":null,"auth":null,"host":null,"port":null,"hostname":null,"hash":null,"search":"?rest_total_hits_as_int=true&ignore_throttled=true","query":{"rest_total_hits_as_int":"true","ignore_throttled":"true"},"pathname":"/elasticsearch/_msearch","path":"/elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true","href":"/elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true"},"message":"[parsing_exception] [match_phrase] unknown token [START_OBJECT] after [query], with { line=1 & col=549 }"}

Ufff, I returned to the previous setting without target => "message" and it still gives me this error, and the whole of Kibana is broken. I also deleted the index and restarted the ELK stack, but that doesn't help. Please help.

This is weird: I deleted that one index and it still gives me this error, and the other indexes are not available either. I restarted multiple times; it looks like the error is cached or something...

OK, I resolved this by deleting that filter :D, but can you still help with that format?
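
If the goal is structured fields without overwriting [message] (which, once mapped as text in Elasticsearch, does not mix well with object values), one sketch is to parse the JSON into a separate field instead. The target name `message_parsed` here is just an example:

```conf
filter {
  mutate { gsub => [ "message", "^[^{]+", "" ] }
  json {
    source => "message"
    target => "message_parsed"
    skip_on_invalid_json => true
  }
}
```

In Kibana you could then select message_parsed.date, message_parsed.rootTitle, and so on as individual fields, while [message] keeps its original text mapping.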