Another Logstash Json Filter

Hello,

Sorry to add another topic on this subject, but even after reading many posts about it I haven't managed to solve my issue. Time to learn from the community :slight_smile:

Here is an example of JSON input that I can't parse:

{
    "Meta Data": {
        "1. Information": "Informations (label1, label2, label3, label4)",
        "2. Parameter1": "Value1",
        "3. Parameter2": "Value2",
         "5. Last Refreshed": "2019-04-12 16:45:00",
        "6. Time Zone": "GMT+8"
    },
    "Time Series": {
        "2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-10": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        }

I tried many different things, such as the json_lines codec, different multiline patterns, renaming fields one by one, ...

Thanks for your help,
Regards,
Mark

What have you tried, and what don't you like about the result?

I have tried many different things I found in other topics but there must be something I do not understand.

Here is an example:

input {
	file {
		type => "json" # I also tried json_lines
		path => "/home/user/data/test.json"
		codec => multiline {
           pattern => "^\{" #I tried different things here
           negate => "true"
           what => "previous"
		}
		start_position => "beginning"
		sincedb_path => "/dev/null"
	}
}
filter {
  json { source => "Time Series" }
  date {
    match => [ "[Time Series][*]", "yyyy-MM-dd" ] # I tried many different things here but I don't really know what to put, as there is no label
    target => "timestamp"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "my_index"
  }
  stdout {
     codec => rubydebug
  }
}

# I have tried using more or fewer parameters, tried a grok filter, and tried ruby (but I'm not an expert at all).

I do not get any result; it prints long lines full of errors (e.g. a JSON parse error).
Sometimes I just do not get the right information into the index.

Thanks for your help,
Regards,
Mark

Do you want to consume the entire file as a single event?

I would like to get this information labeled with the corresponding timestamp.

"2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },

@Timestamp = 2019-04-12 - Label1 = 1.0000
@Timestamp = 2019-04-12 - Label2 = 1.1111
@Timestamp = 2019-04-12 - Label3 = 1.2222

The goal of this is to chart the evolution of label1, label2 and label3 as 3 different plots :slight_smile:

Thanks,
Mark

Do you want to consume the entire file as a single event, or are there multiple JSON objects in the file?

The JSON that you show is not valid JSON (it is missing a trailing }, and you cannot have a , immediately before a }). Is your JSON actually valid, or do you need to mutate it before parsing it?

Hello,

We can consume it as a single event; there is only one type of object in the file.

My JSON file is supposed to be correct.
Here is a correct extract of it (in the previous extract I forgot to remove one "," and to close with a }):

"Time Series": {
        "2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        }
}

Sorry for the imprecise information.

Thanks for help,
Mark

To consume the entire file as a single event I would use a multiline codec with a pattern that never matches:

 multiline { pattern => "^Spalanzani" what => "previous" negate => true auto_flush_interval => 1 }

The first issue we need to fix is the trailing comma after "3. label3". We can do that with a gsub in a mutate filter. Once that is done we can use a json filter to parse the message.

    mutate { gsub => [ "message", ",\s+}", "}" ] }
    json { source => "message" target => "someField" remove_field => [ "message" ] }
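The effect of that gsub can be checked in plain Ruby. This is just a sketch; the one-line `message` string below is a hypothetical stand-in for what the multiline codec would hand to the filter, with the same trailing-comma problem:

```ruby
require 'json'

# Hypothetical single-line message with trailing commas before "}" --
# invalid JSON until they are removed.
message = '{"Time Series": {"2019-04-12": {"1. label1": "1.0000", }, ' \
          '"2019-04-11": {"1. label1": "1.0000", } } }'

# Same substitution as the mutate gsub above: a comma followed by
# whitespace and a closing brace becomes just the brace.
cleaned = message.gsub(/,\s+\}/, '}')

parsed = JSON.parse(cleaned)
puts parsed['Time Series'].keys.inspect
# => ["2019-04-12", "2019-04-11"]
```

With the commas gone, JSON.parse (and therefore the json filter) accepts the document.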

It's hard to do much with a hash of hashes, but we can use a ruby filter to convert it to an array of pairs

    ruby { code => 'event.set("timeSeries", event.get("[someField][Time Series]").to_a)' }

gets us

"timeSeries" => [
    [0] [
        [0] "2019-04-10",
        [1] {
            "2. label2" => "1.1111",
            "3. label3" => "1.2222",
            "1. label1" => "1.0000"
        }
    ],

etc. We can use a split filter on that

 split { field => "timeSeries" }

which gets us three events instead of one. These look like this:

"timeSeries" => [
    [0] "2019-04-11",
    [1] {
        "2. label2" => "1.1111",
        "3. label3" => "1.2222",
        "1. label1" => "1.0000"
    }
],

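The `.to_a` conversion and the split can be sketched in plain Ruby (the `time_series` hash below is a cut-down sample, not the full data):

```ruby
# Hash#to_a turns each date key into a [key, value] pair, which is
# exactly what the split filter needs to fan out into separate events.
time_series = {
  '2019-04-12' => { '1. label1' => '1.0000' },
  '2019-04-11' => { '1. label1' => '1.0000' },
}

pairs = time_series.to_a
# pairs[0] is ["2019-04-12", {"1. label1"=>"1.0000"}]

# split-like fan-out: one event per array element
events = pairs.map { |pair| { 'timeSeries' => pair } }
puts events.length
# => 2
```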
Using

    mutate { rename => { "[someField][Meta Data]" => "metadata" } remove_field => "someField" }
    date { match => [ "[timeSeries][0]", "YYYY-MM-dd" ] }
    ruby { code => 'event.get("[timeSeries][1]").each { |k, v| event.set(k,v) }' remove_field => [ "timeSeries" ] }

we can change this into

{
    "2. label2" => "1.1111",
    "@timestamp" => 2019-04-12T04:00:00.000Z,
    "tags" => [
        [0] "multiline"
    ],
    "metadata" => {
        "6. Time Zone" => "GMT+8",
        "5. Last Refreshed" => "2019-04-12 16:45:00",
        "3. Parameter2" => "Value2",
        "2. Parameter1" => "Value1",
        "1. Information" => "Informations (label1, label2, label3, label4)"
    },
    "3. label3" => "1.2222",
    "1. label1" => "1.0000"
}

which may not be exactly what you want, but hopefully gives you some ideas to try.
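The final flattening step can also be tried in plain Ruby before putting it in the filter. The `event` hash below is a hypothetical stand-in for one split event:

```ruby
# One split event: timeSeries is a [date, values] pair.
event = { 'timeSeries' => ['2019-04-12',
                           { '1. label1' => '1.0000',
                             '2. label2' => '1.1111' }] }

# Copy each label/value up to the top level, then drop the pair --
# the same thing the ruby filter's code + remove_field do.
event['timeSeries'][1].each { |k, v| event[k] = v }
event.delete('timeSeries')

puts event.keys.sort.inspect
# => ["1. label1", "2. label2"]
```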


Thanks a lot for your answer. I should be able to figure it out with those inputs :slight_smile:

I will keep you posted and close the topic as soon as I succeed!

To consume the whole file content as a single message field, you can also use read mode and set an impossible delimiter, e.g. øåø. The delimiter is never found, so the whole file is read and, at EOF, the event is created from the buffered content. This way you don't need a multiline codec.
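A minimal sketch of that input, assuming the same file path as above (note that in read mode `file_completed_action` defaults to deleting the file, so it is set to log here):

```
input {
  file {
    path => "/home/user/data/test.json"
    mode => "read"
    delimiter => "øåø"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/completed.log"
    sincedb_path => "/dev/null"
  }
}
```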
