How to parse JSON inside a text line

Hi,

I have the following log, which looks like this:
1570519737247 I access {"date":"2019-10-08T09:28:57.247","rootTitle":"title1","rootModel":"model1","dcTitle":"[1a]"}
1570519737247 I access {"date":"2019-10-08T09:28:57.247","rootTitle":"title2","rootModel":"model2","dcTitle":"[1b]"}
etc...

You can see that at the beginning there is plain text, followed by valid JSON.

I want to parse every line into a JSON object that would look like this:

{
	"beginningText": "1570519737247 I access",
	"message": {
		"date":"2019-10-08T09:28:57.247",
		"rootTitle":"title1",
		"rootModel":"model1",
		"dcTitle":"[1a]"
		}
}

My config:

input {
   tcp {
    port  => 5044
    codec => json
  }
}
filter {
    grok {
		break_on_match => false
		match =>  [ "message", "TID: \[%{TIMESTAMP_ISO8601:timestamp}\] %{LOGLEVEL:level} \[%{JAVACLASS:java_class}\] \(%{GREEDYDATA:thread}\) - (?<log_message>(.|\r|\n)*)"]
    }
	date{
		match => [ "timestamp_from_log", "ISO8601"]
	}  
}

output {
  stdout {}  
  elasticsearch {
	hosts => ["*************"]
	sniffing => true
	manage_template => false
	index => "statistics-%{+YYYY.MM.dd}"
  }
}

But it ends in a parsing error:
[ERROR][logstash.codecs.json ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character ('W' (code 87)): Expected space separating root-level values
at [Source: (String)"2W\u0000\u0000\u0000\u00012C\u0000\u0000\u0002\xE7x^\xA4\x93_k\xDCF\u0014\xC5݇~\x89>\r\u0017\u0002\t՟\x91v\xBD\xBB\x9E\x97&\r.I\x89\xE3\xB0q[\xA8e\xC2\xF5\xE8J\u001Av4\xA3\xCE\\x99l\xC4~\xF7\xA2ņڴ%\x90\u0017......

and in kibana I see values like this: 2W\u0000\u0000\u0000\u00012C\u0000\u0000\u0002\xD9x^\xA4SK\x8B\xDCF\u0010\xDE........

Please can anybody help me with this? Thank you.

What are you using to send data to logstash?

Filebeat

Do not use a tcp input, use a beats input.
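
For example, a minimal beats input looks like this (a sketch; 5044 is the conventional Beats port, adjust it to whatever your Filebeat output is configured to send to):

```conf
input {
  beats {
    port => 5044
  }
}
```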

Thanks, that worked, but now it just prints the whole line unparsed. I would like to have it like this:

{
	"beginningText": "1570519737247",
	"message": {
		"date":"2019-10-08T09:28:57.247",
		"rootTitle":"title1",
		"rootModel":"model1",
		"dcTitle":"[1a]"
		}
}

Can you help with the grok part?

Use a json filter.

I tried this configuration:

filter {	
	mutate {
		rename => ["host", "server"]
		convert => {"server" => "string"}
	}

	json {
		source => "message"	
		skip_on_invalid_json => true
 	}
}

But it just prints the whole message on one line. I want to have this format:

{
	"beginningText": "1570519737247",
	"message": {
		"date":"2019-10-08T09:28:57.247",
		"rootTitle":"title1",
		"rootModel":"model1",
		"dcTitle":"[1a]"
		}
}

and I want to skip the words "I access". Also, the documentation says the filter takes a field which contains JSON, and in my case I don't have such a field. Can you help?

Can you change the output from stdout {} to

stdout { codec => rubydebug }

and post what an event looks like please. Or if you are using Kibana click on an event and copy from the JSON tab.

Here is an event from Kibana:

{
  "_index": "statistics-2019.10.09",
  "_type": "doc",
  "_id": "**************",
  "_version": 1,
  "_score": null,
  "_source": {
    "server": {
      "os": {
        "family": "redhat",
        "codename": "Core",
        "name": "CentOS Linux",
        "version": "7 (Core)",
        "platform": "centos"
      },
      "containerized": true,
      "name": "************",
      "id": "**************",
      "architecture": "x86_64"
    },
    "beat": {
      "name": "******************",
      "hostname": "***************",
      "version": "6.6.2"
    },
    "@version": "1",
    "prospector": {
      "type": "log"
    },
    "message": "1570651770510 I access {\"date\":\"2019-10-09T22:09:30.51\",\"rootTitle\":\"some text\",\"rootModel\":\"monograph\",\"dcTitle\":\"[1a]\",\"pid\":\"uuid:0909324234\",\"pids_path\":[\"/uuid:0909324234/uuid:0909324234\"],\"rootPid\":\"uuid:0909324234\",\"models_path\":[\"/title/page\"],\"remoteAddr\":\"127.0.0.1\",\"username\":\"not_logged\"}",
    "log": {
      "file": {
        "path": "/home/***********/apache-tomcat-7.0.93/logs/statistics.2019-10-09.log"
      }
    },
    "fields": {
      "log_file": "statistics"
    },
    "input": {
      "type": "log"
    },
    "@timestamp": "2019-10-09T20:09:37.914Z",
    "offset": 8980,
    "tags": [
      "stats",
      "beats_input_codec_plain_applied"
    ],
    "source": "/home/*********/apache-tomcat-7.0.93/logs/statistics.2019-10-09.log"
  },
  "fields": {
    "@timestamp": [
      "2019-10-09T20:09:37.914Z"
    ]
  },
  "sort": [
    1570651777914
  ]
}

You can use mutate+gsub to remove everything before the opening {

    mutate { gsub => [ "message", "^[^{]+", "" ] }
    json { source => "message" }

will get you

    "rootPid" => "uuid:0909324234",
  "rootTitle" => "some text",
  "rootModel" => "monograph",
       "date" => "2019-10-09T22:09:30.51",
 "remoteAddr" => "127.0.0.1",
    "dcTitle" => "[1a]",

etc.
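
If you also want to keep the leading text in its own field, a variation (just a sketch; the field names `beginningText` and `json_part` are illustrative) is to grok the line apart first and parse only the JSON portion:

```conf
filter {
  grok {
    # "1570519737247 I access {...}" -> epoch millis, two words, then JSON
    match => { "message" => "^%{NUMBER:beginningText} %{WORD} %{WORD} %{GREEDYDATA:json_part}" }
  }
  json {
    source => "json_part"
    remove_field => [ "json_part" ]
  }
}
```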

Thanks, now every value is on its own line when I click on an event in Kibana, but the "message" field is still not formatted the way JSON should be:

I want:

{
	"date":"2019-10-08T09:28:57.247",
	"rootTitle":"title1",
	"rootModel":"model1",
	"dcTitle":"[1a]"
     etc....
	}

Is it possible to do that, so that when I select only the message field in Kibana I will see it formatted?

The json filter I showed should be parsing that [message] field into those other fields. It will not modify the message field. If you click on an event in Kibana and switch to the JSON tab do you see them?

Yes, I see them, but I also need the message field itself formatted.

Does adding target => "message" to the json filter fix it?

It gives me an error in Kibana:

Kibana log:

{"type":"error","@timestamp":"2019-10-10T07:32:25Z","tags":[],"pid":11247,"level":"error","error":{"message":"[parsing_exception] 
[match_phrase] unknown token [START_OBJECT] after [query], with { line=1 & col=549 }","name":"Error","stack":"[parsing_exception] [match_phrase]
 unknown token [START_OBJECT] after [query], with { line=1 & col=549 } :: {\"path\":\"/_msearch\",\"query\":{\"rest_total_hits_as_int\":\"true\",
 \"ignore_throttled\":\"true\"},\"body\":\"{\\\"index\\\":\\\"statistic*\\\",\\\"ignore_unavailable\\\":true,\\\"preference\\\":1570692736705}\\
 n{\\\"version\\\":true,\\\"size\\\":500,\\\"sort\\\":[{\\\"@timestamp\\\":{\\\"order\\\":\\\"desc\\\",\\\"unmapped_type\\\":\\\"boolean\\\"}}],\\\"_source\\\":{\\\"excludes\\\":[]},\\\"aggs\\\":{\\\"2\\\":{\\\"date_histogram\\\":{\\\"field\\\":\\\"@timestamp\\\",\\\"interval\\\":\\\"3h\\\",\\\"time_zone\\\":\\\"Europe/Berlin\\\",\\\"min_doc_count\\\":1}}},\\\"stored_fields\\\":[\\\"*\\\"],\\\"script_fields\\\":{},\\\"docvalue_fields\\\":[{\\\"field\\\":\\\"@timestamp\\\",\\\"format\\\":\\\"date_time\\\"}],\\\"query\\\":{\\\"bool\\\":{\\\"must\\\":[{\\\"match_all\\\":{}},{\\\"range\\\":{\\\"@timestamp\\\
 ":{\\\"gte\\\":1570312800000,\\\"lte\\\":1570917599999,\\\"format\\\":\\\"epoch_millis\\\"}}}],\\\"filter\\\":[],\\\"should\\\":[],\\\"must_not\\\":[{\\\"match_phrase\\\":{\\\"message\\\":{\\\"query\\\":{\\\"date\\\":\\\"2019-10-10T09:29:40.201\\\",\\\"dcTitle\\\":\\\"[1a]\\\",\\\"models_path\\\":[\\\"/monograph/page\\\"],\\\"pid\\\":\\\"uuid:22c40550-27c1-11e3-b79f-5ef3fc9bb22f\\\",\\\"pids_path\\\":[\\\"/uuid:0174cfe0-1526-11e3-bc65-005056827e51/uuid:22c40550-27c1-11e3-b79f-5ef3fc9bb22f\\\"],\\\"remoteAddr\\\":\\\"127.0.0.1\\\",\\\"rootModel\\\":\\\"monograph\\\",\\\"rootPid\\\":\\\"uuid:0174cfe0-1526-11e3-bc65-005056827e51\\\",
 \\\"rootTitle\\\":\\\"Stalin - horor 20. stolet?\\\",\\\"username\\\":\\\"not_logged\\\"}}}}]}},\\\"highlight\\\":{\\\"pre_tags\\\":[\\\"@kibana-highlighted-field@\\\"],\\\"post_tags\\\":[\\\"@/kibana-highlighted-field@\\\"],\\\"fields\\\":{\\\"*\\\":{}},\\\"fragment_size\\\":2147483647},\\\"timeout\\\":\\\"30000ms\\\"}\\n\",\"statusCode\":400,\"response\":\"{\\\"error\\\":{\\\"root_cause\\\":[{\\\"type\\\":\\\"parsing_exception\\\",\\\"reason\\\":\\\"[match_phrase] unknown token [START_OBJECT] after [query]\\\",\\\"line\\\":1,\\\"col\\\":549}],\\\"type\\\":\\\"parsing_exception\\\",\\\"reason\\\":\\\"[match_phrase] unknown token [START_OBJECT] after [query]\\\",\\\"line\\\":1,\\\"col\\\":549},\\\"status\\\":400}\"}\n    at respond (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:308:15)\n    at checkRespForFailure (/usr/share/kibana/node_modules/elasticsearch/src/lib/transport.js:267:7)\n    at HttpConnector.<anonymous> (/usr/share/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:166:7)\n    at IncomingMessage.wrapper (/usr/share/kibana/node_modules/elasticsearch/node_modules/lodash/lodash.js:4935:19)\n    at IncomingMessage.emit (events.js:194:15)\n    at endReadableNT (_stream_readable.js:1103:12)\n    at process._
 tickCallback (internal/process/next_tick.js:63:19)"},"url":{"protocol":null,"slashes":null,"auth":null,"host":null,"port":null,"hostname":null,"hash":null,"search":"?rest_total_hits_as_int=true&ignore_throttled=true","query":{"rest_total_hits_as_int":"true","ignore_throttled":"true"},"pathname":"/elasticsearch/_msearch","path":"/elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true","href":"/elasticsearch/_msearch?rest_total_hits_as_int=true&ignore_throttled=true"},"message":"[parsing_exception] [match_phrase] unknown token [START_OBJECT] after [query], with { line=1 & col=549 }"}

Ufff, I returned to the previous setting without target => "message" and it still gives me this error, and the whole of Kibana is broken. I also deleted the index and restarted the ELK stack, but that doesn't help. Please help.

This is weird: I deleted that one index and it still gives me this error, and the other indexes are not available either. I restarted multiple times; it looks like the error is cached or something...

OK, I resolved this by deleting that filter :D, but can you still help with that format?
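
If the goal is structured fields without overwriting [message] (which, once mapped as text in Elasticsearch, does not mix well with object values), one sketch is to parse the JSON into a separate field instead. The target name `message_parsed` here is just an example:

```conf
filter {
  mutate { gsub => [ "message", "^[^{]+", "" ] }
  json {
    source => "message"
    target => "message_parsed"
    skip_on_invalid_json => true
  }
}
```

In Kibana you could then select message_parsed.date, message_parsed.rootTitle, and so on as individual fields, while [message] keeps its original text mapping.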