Another Logstash Json Filter

Hello,

Sorry to add another topic on this subject, but even after reading many posts about it I haven't managed to solve my issue. Time to learn from the community :slight_smile:

Here is an example of JSON input that I can't parse:

{
    "Meta Data": {
        "1. Information": "Informations (label1, label2, label3, label4)",
        "2. Parameter1": "Value1",
        "3. Parameter2": "Value2",
         "5. Last Refreshed": "2019-04-12 16:45:00",
        "6. Time Zone": "GMT+8"
    },
    "Time Series": {
        "2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-10": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        }

I tried many different things, such as the json_lines codec, different multiline patterns, renaming fields one by one, ...

Thanks for your help,
Regards,
Mark

What have you tried, and what don't you like about the result?

I have tried many different things I found in other topics but there must be something I do not understand.

Here is an example:

input {
	file {
		type => "json" # I also tried json_lines
		path => "/home/user/data/test.json"
		codec => multiline {
           pattern => "^\{" #I tried different things here
           negate => "true"
           what => "previous"
		}
		start_position => "beginning"
		sincedb_path => "/dev/null"
	}
}
filter {
  json { source => "Time Series" }
  date {
    match => [ "[Time Series][*]", "yyyy-MM-dd" ] # I tried many different things here but I don't really know what to put, as there is no label
    target => "timestamp"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "my_index"
  }
  stdout {
     codec => rubydebug
  }
}

# I have tried using more or fewer parameters, tried a grok filter, and tried ruby (but I'm not an expert at all).

I do not get any result; it prints long lines full of errors (e.g. a JSON parse error).
Sometimes I just do not get the right information into the index.

Thanks for your help,
Regards,
Mark

Do you want to consume the entire file as a single event?

I would like to get this information labeled with the corresponding timestamp.

"2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },

@Timestamp = 2019-04-12 - Label1 = 1.0000
@Timestamp = 2019-04-12 - Label2 = 1.1111
@Timestamp = 2019-04-12 - Label3 = 1.2222

The goal of this is to chart the evolution of label1, label2 and label3 as 3 different plots :slight_smile:

Thanks,
Mark

Do you want to consume the entire file as a single event, or are there multiple JSON objects in the file?

The JSON that you show is not valid JSON (it is missing a trailing }, and you cannot have a , immediately before a }). Is your JSON actually valid, or do you need to mutate it before parsing it?

Hello,

We can consume it as a single event; there is only one type of object in the file.

My JSON file is supposed to be correct.
Here is a correct extract of it (in the previous extract I forgot to remove one "," and to close with a }):

"Time Series": {
        "2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        }
}

Sorry for the imprecise information.

Thanks for help,
Mark

To consume the entire file as a single event I would use a multiline codec with a pattern that never matches:

 multiline { pattern => "^Spalanzani" what => "previous" negate => true auto_flush_interval => 1 }

The first issue we need to fix is the trailing comma after "3. label3". We can do that with a gsub in a mutate filter. Once that is done we can use a json filter to parse the message.

    mutate { gsub => [ "message", ",\s+}", "}" ] }
    json { source => "message" target => "someField" remove_field => [ "message" ] }
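The effect of that gsub can be checked in plain Ruby. This is just a sketch; the one-line `message` string below is a hypothetical stand-in for what the multiline codec would hand to the filter, with the same trailing-comma problem:

```ruby
require 'json'

# Hypothetical single-line message with trailing commas before "}" --
# invalid JSON until they are removed.
message = '{"Time Series": {"2019-04-12": {"1. label1": "1.0000", }, ' \
          '"2019-04-11": {"1. label1": "1.0000", } } }'

# Same substitution as the mutate gsub above: a comma followed by
# whitespace and a closing brace becomes just the brace.
cleaned = message.gsub(/,\s+\}/, '}')

parsed = JSON.parse(cleaned)
puts parsed['Time Series'].keys.inspect
# => ["2019-04-12", "2019-04-11"]
```

With the commas gone, JSON.parse (and therefore the json filter) accepts the document.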

It's hard to do much with a hash of hashes, but we can use a ruby filter to convert it to an array of pairs

    ruby { code => 'event.set("timeSeries", event.get("[someField][Time Series]").to_a)' }

gets us

"timeSeries" => [
    [0] [
        [0] "2019-04-10",
        [1] {
            "2. label2" => "1.1111",
            "3. label3" => "1.2222",
            "1. label1" => "1.0000"
        }
    ],

etc. We can use a split filter on that

 split { field => "timeSeries" }

which gets us three events instead of one. These look like this:

"timeSeries" => [
    [0] "2019-04-11",
    [1] {
        "2. label2" => "1.1111",
        "3. label3" => "1.2222",
        "1. label1" => "1.0000"
    }
],

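The `.to_a` conversion and the split can be sketched in plain Ruby (the `time_series` hash below is a cut-down sample, not the full data):

```ruby
# Hash#to_a turns each date key into a [key, value] pair, which is
# exactly what the split filter needs to fan out into separate events.
time_series = {
  '2019-04-12' => { '1. label1' => '1.0000' },
  '2019-04-11' => { '1. label1' => '1.0000' },
}

pairs = time_series.to_a
# pairs[0] is ["2019-04-12", {"1. label1"=>"1.0000"}]

# split-like fan-out: one event per array element
events = pairs.map { |pair| { 'timeSeries' => pair } }
puts events.length
# => 2
```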
Using

    mutate { rename => { "[someField][Meta Data]" => "metadata" } remove_field => "someField" }
    date { match => [ "[timeSeries][0]", "YYYY-MM-dd" ] }
    ruby { code => 'event.get("[timeSeries][1]").each { |k, v| event.set(k,v) }' remove_field => [ "timeSeries" ] }

we can change this into

{
    "2. label2" => "1.1111",
    "@timestamp" => 2019-04-12T04:00:00.000Z,
    "tags" => [
        [0] "multiline"
    ],
    "metadata" => {
        "6. Time Zone" => "GMT+8",
        "5. Last Refreshed" => "2019-04-12 16:45:00",
        "3. Parameter2" => "Value2",
        "2. Parameter1" => "Value1",
        "1. Information" => "Informations (label1, label2, label3, label4)"
    },
    "3. label3" => "1.2222",
    "1. label1" => "1.0000"
}

which may not be exactly what you want, but hopefully gives you some ideas to try.
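The final flattening step can also be tried in plain Ruby before putting it in the filter. The `event` hash below is a hypothetical stand-in for one split event:

```ruby
# One split event: timeSeries is a [date, values] pair.
event = { 'timeSeries' => ['2019-04-12',
                           { '1. label1' => '1.0000',
                             '2. label2' => '1.1111' }] }

# Copy each label/value up to the top level, then drop the pair --
# the same thing the ruby filter's code + remove_field do.
event['timeSeries'][1].each { |k, v| event[k] = v }
event.delete('timeSeries')

puts event.keys.sort.inspect
# => ["1. label1", "2. label2"]
```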


Thanks a lot for your answer. I should be able to figure it out with those inputs :slight_smile:

I will keep you posted and close the topic as soon as I succeed!

To consume the whole file content as a single message field, you can also use read mode and set an impossible delimiter, e.g. øåø. The delimiter is never found, so the whole file is read and, at EOF, the event is created from the buffered content. This way you don't need a multiline codec.
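A minimal sketch of that input, assuming the same file path as above (note that in read mode `file_completed_action` defaults to deleting the file, so it is set to log here):

```
input {
  file {
    path => "/home/user/data/test.json"
    mode => "read"
    delimiter => "øåø"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/completed.log"
    sincedb_path => "/dev/null"
  }
}
```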
