Another Logstash Json Filter

Hello,

Sorry to add another topic on this subject, but even after reading many posts about it I haven't managed to solve my issue. Time to learn from the community :slight_smile:

Here is an example of JSON input that I don't manage to parse:

{
    "Meta Data": {
        "1. Information": "Informations (label1, label2, label3, label4)",
        "2. Parameter1": "Value1",
        "3. Parameter2": "Value2",
         "5. Last Refreshed": "2019-04-12 16:45:00",
        "6. Time Zone": "GMT+8"
    },
    "Time Series": {
        "2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-10": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        }

I tried many different things, such as using the json_lines codec, different multiline patterns, renaming fields one by one, ...

Thanks for your help,
Regards,
Mark

What have you tried, and what don't you like about the result?

I have tried many different things I found in other topics but there must be something I do not understand.

Here is an example :

input {
	file {
		type => "json" # I also tried the json_lines codec
		path => "/home/user/data/test.json"
		codec => multiline {
           pattern => "^\{" # I tried different things here
           negate => "true"
           what => "previous"
		}
		start_position => "beginning"
		sincedb_path => "/dev/null"
	}
}
filter {
  json { source => "Time Series" }
  date {
    match => [ "[Time Series][*]", "yyyy-MM-dd" ] #I tried many different things here but i don't really know what to put as there is no label
    target => "timestamp"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "my_index"
  }
  stdout {
     codec => rubydebug
  }
}
# I have tried using more or fewer parameters, tried a grok filter, and tried ruby (but I'm not an expert at all).

I do not get any result; it prints long lines full of errors (e.g. JSON parse error).
Sometimes I just don't get the right information in the index, ...

Thanks for your help,
Regards,
Mark

Do you want to consume the entire file as a single event?

I would like to get that information labeled with the corresponding timestamp.

"2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },

@Timestamp = 2019-04-12 - Label1 = 1.0000
@Timestamp = 2019-04-12 - Label2 = 1.1111
@Timestamp = 2019-04-12 - Label3 = 1.2222

The goal is to chart the evolution of label1, label2 and label3 with 3 different plots :slight_smile:

Thanks,
Mark

Do you want to consume the entire file as a single event, or are there multiple JSON objects in the file?

The JSON that you show is not valid JSON (it is missing a trailing }, and you cannot have a , immediately before a }). Is your JSON actually valid, or do you need to mutate it before parsing it?

Hello,

We can consume it as a single event; there is only one type of object in the file.

My JSON file should be correct.
Here is a corrected extract of my JSON file (in the last extract I forgot to remove one "," and to close with }):

"Time Series": {
        "2019-04-12": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        },
        "2019-04-11": {
            "1. label1": "1.0000",
            "2. label2": "1.1111",
            "3. label3": "1.2222",
        }
}

Sorry for the imprecise information.

Thanks for help,
Mark

To consume the entire file as a single event I would use a multiline codec with a pattern that never matches:

 multiline { pattern => "^Spalanzani" what => "previous" negate => true auto_flush_interval => 1 }

The first issue we need to fix is that trailing comma on "3. label3". We can do that with a gsub filter. Once that is done we can use a json filter to parse.

    mutate { gsub => [ "message", ",\s+}", "}" ] }
    json { source => "message" target => "someField" remove_field => [ "message" ] }
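If you want to sanity-check that gsub pattern outside Logstash, the same substitution can be tried in plain Ruby (a sketch; the sample string here is made up, but the regex is the one the mutate filter applies):

```ruby
require 'json'

# Hypothetical fragment with the trailing-comma problem.
message = %({ "Time Series": { "2019-04-12": { "1. label1": "1.0000",\n } } })

# Same substitution as gsub => [ "message", ",\s+}", "}" ]
cleaned = message.gsub(/,\s+}/, "}")
parsed  = JSON.parse(cleaned)   # now parses without error
```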

It's hard to do much with a hash of hashes, but we can use ruby to convert it

    ruby { code => 'event.set("timeSeries", event.get("[someField][Time Series]").to_a)' }

gets us

"timeSeries" => [
    [0] [
        [0] "2019-04-10",
        [1] {
            "2. label2" => "1.1111",
            "3. label3" => "1.2222",
            "1. label1" => "1.0000"
        }
    ],

etc. We can use a split filter on that

 split { field => "timeSeries" }

which gets us three events instead of one. These look like this:

"timeSeries" => [
    [0] "2019-04-11",
    [1] {
        "2. label2" => "1.1111",
        "3. label3" => "1.2222",
        "1. label1" => "1.0000"
    }
],
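The to_a/split combination above can also be sketched in plain Ruby (a stand-alone illustration, not a Logstash filter): Hash#to_a turns the hash of hashes into an array of [date, values] pairs, which is exactly what the split filter then fans out into separate events.

```ruby
time_series = {
  "2019-04-12" => { "1. label1" => "1.0000", "2. label2" => "1.1111" },
  "2019-04-11" => { "1. label1" => "1.0000", "2. label2" => "1.1111" }
}

# Each key/value pair becomes a two-element array...
pairs = time_series.to_a

# ...and split emits one event per array element.
events = pairs.map { |date, values| { "timeSeries" => [date, values] } }
```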

Using

    mutate { rename => { "[someField][Meta Data]" => "metadata" } remove_field => "someField" }
    date { match => [ "[timeSeries][0]", "yyyy-MM-dd" ] }
    ruby { code => 'event.get("[timeSeries][1]").each { |k, v| event.set(k,v) }' remove_field => [ "timeSeries" ] }

we can change this into

{
"2. label2" => "1.1111",
"@timestamp" => 2019-04-12T04:00:00.000Z,
"tags" => [
[0] "multiline"
],
"metadata" => {
"6. Time Zone" => "GMT+8",
"5. Last Refreshed" => "2019-04-12 16:45:00",
"3. Parameter2" => "Value2",
"2. Parameter1" => "Value1",
"1. Information" => "Informations (label1, label2, label3, label4)"
},
"3. label3" => "1.2222",
"1. label1" => "1.0000"
}

which may not be exactly what you want, but hopefully gives you some ideas to try.
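The each/set step in that last ruby filter can be checked in isolation too (a plain-Ruby sketch, using a hash as a stand-in for the Logstash event API):

```ruby
# One element of timeSeries after the split filter:
entry = ["2019-04-11", { "1. label1" => "1.0000", "2. label2" => "1.1111" }]

# Stand-in for event.set: copy each label/value pair to the top level.
event = {}
entry[1].each { |k, v| event[k] = v }
```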


Thanks a lot for your answer. I should be able to figure it out with those inputs :slight_smile:

I will keep you posted and close the topic as soon as I succeed!

To consume the whole file content as a single message field, you can also use read mode and set an impossible delimiter, e.g. øåø. The delimiter is never found, so the whole file is read and, at EOF, the event is created from the buffered content. This way you don't need a multiline codec.
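A sketch of that read-mode approach (assuming a Logstash version where the file input supports mode => "read"; the path is the one from earlier in the thread):

```
input {
  file {
    path => "/home/user/data/test.json"
    mode => "read"
    delimiter => "øåø"   # never occurs in the data, so the whole file becomes one event
    sincedb_path => "/dev/null"
  }
}
```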

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.