How to parse the multiline json file through logstash

sharath3185 · September 19, 2016, 7:57am

Hi,

I have a JSON file in the following multiline format:

[
{
"status": "fail",
"executiontime": 1117,
"errormsg": "dummy error1",
"testname": "test1",
"errorcode": 0,
"signalcode": 0
},
{
"status": "pass",
"executiontime": 1111,
"errormsg": "Dummy error2",
"testname": "test2",
"errorcode": 0,
"signalcode": 0
},
{
"status": "fail",
"executiontime": 1155,
"errormsg": "Dummy error3",
"testname": "test3",
"errorcode": 0,
"signalcode": 0
}
]

I am using a grok pattern to extract the fields and index them into Elasticsearch.
My config file looks something like this:

# An input plugin enables a specific source of events to be read by Logstash.
input
{
    file
    {
        codec => multiline {
            pattern => "^\s\s\s\s}"
            negate => true
            what => previous
            max_lines => 20000
        }
        path => ["path/to//abc.json"]
        start_position => "beginning"
        sincedb_path => "/dev/null"
        type => "test"
        ignore_older => 0
    }
}

filter
{
    if [type] == "test"
    {
        grok
        {
            match => [
                'message' , '%{GREEDYDATA}"status": "%{GREEDYDATA:status}", \r\n\s+"executiontime": %{GREEDYDATA:exectime}, \r\n\s+"errormsg": "%{GREEDYDATA:error}", \r\n\s+"testname": "%{GREEDYDATA:testname}", \r\n\s+"errorcode": %{GREEDYDATA:errorcode}, \r\n\s+"signalcode": %{GREEDYDATA:signalcode}\r%{GREEDYDATA}'
            ]
        }
        if "_jsonparsefailure" in [tags]
        {
            drop {}
        }

        if "_grokparsefailure" in [tags]
        {
            drop {}
        }
        else
        {
            mutate
            {
                gsub => ["message", "\r\n", ""]
                remove_field => ["message", "@version", "path", "host", "tags"]
            }
        }
        # Convert the captured strings to integers.
        # (event['field'] is the Logstash 2.x event API; 5.x and later use event.get / event.set.)
        ruby
        {
            code => "
                event['exectime'] = event['exectime'].to_i;
                event['signalcode'] = event['signalcode'].to_i;
                event['errorcode'] = event['errorcode'].to_i;
            "
        }
    }
}

output
{
    if [type] == "test"
    {
        stdout
        {
            codec => rubydebug
        }
    }
}

This works fine with the above pattern.
But the fields in the JSON may not be generated in the same order every time.
For example, "errorcode" and "signalcode" can appear at the top and "testname" in third place, as below:

{
"errorcode": 0,
"signalcode": 0,
"testname": "test1",
"status": "pass",
"executiontime": 1111,
"errormsg": "StaleElementReferenceException"
}

In this case the grok pattern in my config file above will not work.
Is there any way that I can handle the above condition?

Looking for help ASAP.

eperry · September 19, 2016, 12:25pm

Look at the json filter

https://www.elastic.co/guide/en/logstash/current/plugins-filters-json.html
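
As a rough sketch of how the json filter could apply here: it reads keys by name, so the order of the keys inside each object stops mattering. This assumes the whole file has already been joined into a single event (for example by a multiline codec), so that "message" holds the complete JSON array. The field name "records" is only an illustrative choice and does not come from the original post. As far as I know, the filter needs a target when the top-level JSON value is an array rather than an object.

```
filter {
  json {
    # Parse the JSON text stored in the "message" field.
    source => "message"
    # "records" is a hypothetical field name. A target is set because the
    # top-level value in this file is an array, not an object.
    target => "records"
  }
}
```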

sharath3185 · September 19, 2016, 12:28pm

I initially tried the json filter, but it did not work for me with multiline JSON.

eperry · September 19, 2016, 12:40pm

How about splitting the field into multiple documents?

https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html

Or at least use mutate's split option, after which you can work on each piece of the document:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-split
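
A minimal sketch of the split filter for this case, assuming an earlier filter has already parsed the file into an array stored in a field (called "records" here purely for illustration). The filter emits one event per array element, which is what would produce one document per test result.

```
filter {
  split {
    # "records" is a hypothetical field holding the parsed JSON array.
    # Each element of the array becomes its own event.
    field => "records"
  }
}
```

After the split, each event carries a single object in "records", so a value can be referenced as [records][testname].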

sharath3185 · September 20, 2016, 8:41am

What I am looking for is this:

My result.json file has the following content:

[
{
"program id" : "1",
"id" : "aaa",
"status" : "PASSED",
"PauseTime" : "0",
"testname" : "test1",
"last update" : "2016-09-16 20:11:56",
"start" : "2016-09-16 14:06:08",
"status id" : "2"
},
{
"program id" : "2",
"id" : "bbb",
"status" : "PASSED",
"PauseTime" : "0",
"last update" : "2016-09-16 20:13:32",
"start" : "2016-09-16 20:13:08",
"status id" : "2",
"testname" : "test2"
}
]

Notice that the keys and values are not in the same order. In the first object, testname is the fifth field, but in the second it is the last field.

If I use a grok pattern like the one in my post above, it will not work, because the key/value pairs are not in fixed positions.

I want to index this data into a type in Elasticsearch as 2 separate documents, with the above fields present in each document. Something like:

index:test,
type: testdata,
_id: 1,
_source: {"testname":"test1", "program id" : "1", "id" : "aaa","status" : "PASSED","last update" : "2016-09-16 20:11:56", "start" : "2016-09-16 20:13:08"}

index:test,
type:testdata,
_id:2,
_source: {"testname":"test2", "program id" : "2", "id" : "bbb","status" : "PASSED","last update" : "2016-09-16 20:13:32", "start" : "2016-09-16 20:13:08"}

How do I achieve this?

Mahdy_S · November 21, 2016, 9:34am

To solve the multiline problem you can use the multiline codec:
https://www.elastic.co/guide/en/logstash/current/plugins-codecs-multiline.html

What you need to do is define the quotation mark as belonging to the same line (in the pattern field). The codec places all the fields on a single line and starts a new line only when a line begins with a character that is not in the pattern field.
From your data I assume you want to break the line only at the opening curly bracket {, so I would put the quotation mark and the closing curly bracket in the pattern field. I believe you need to place the multiline codec before the json filter.
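
The idea above could be sketched roughly as follows. This is an untested outline under several assumptions: the objects are flat (no nested braces), each key sits on its own line, and the path is a placeholder. Lines that start with a quotation mark or a closing curly bracket are appended to the previous event, so a new event starts at each opening bracket. The auto_flush_interval setting is there because, as far as I know, the codec otherwise holds the last event until another line arrives.

```
input {
  file {
    path => ["/path/to/result.json"]   # placeholder path
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => multiline {
      # A line starting with a quote or "}" continues the previous event.
      pattern => '^\s*["}]'
      what => "previous"
      # Flush the pending event after 2 seconds without new lines.
      auto_flush_interval => 2
    }
  }
}

filter {
  # The bare "[" and "]" lines end up as events of their own; discard them.
  if [message] =~ /\A\s*[\[\]]\s*\z/ {
    drop {}
  }
  # Strip the comma that follows each object's closing bracket.
  mutate { gsub => ["message", ",\s*\z", ""] }
  # Each event now holds one JSON object, whatever order its keys are in.
  json { source => "message" }
}
```

With this layout every object becomes its own event with top-level fields, so no target or split step is needed.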

txisme · November 21, 2016, 12:55pm

Why not use a grok filter to capture both the field name and the value, and then store the value in Elasticsearch depending on the name that precedes it? Something like:

%{WORD:field1}%{WORD:value1}

if [field1] == "status" {
    mutate { add_field => { "status" => "%{value1}" } }
}
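
One way to make that idea order-independent, sketched here as a variation rather than exactly what is described above: give grok one small pattern per field, each keyed on the field name that precedes the value, and set break_on_match to false so that every pattern is tried. This assumes the multiline codec has already joined one object into a single event, and that the string values contain no escaped quotation marks. The capture names are taken from the config in the first post.

```
filter {
  grok {
    # Try every pattern instead of stopping at the first one that matches.
    break_on_match => false
    match => {
      "message" => [
        '"status"\s*:\s*"%{DATA:status}"',
        '"executiontime"\s*:\s*%{NUMBER:exectime:int}',
        '"errormsg"\s*:\s*"%{DATA:error}"',
        '"testname"\s*:\s*"%{DATA:testname}"',
        '"errorcode"\s*:\s*%{NUMBER:errorcode:int}',
        '"signalcode"\s*:\s*%{NUMBER:signalcode:int}'
      ]
    }
  }
}
```

Grok patterns are not anchored, so each one finds its key wherever it appears in the event. The :int suffix converts the numeric captures, which would make a separate to_i step unnecessary.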
