JSON input and splitting

Hello, I'm new to the Elastic ecosystem.
I'm trying to use Logstash to take a JSON file as input and then use a filter to split it into different events, but I'm drowning in all of the information that's available online.

This is a short version of my data:

    {
        "proximity": [
            {
                "network": "all_networks",
                "date": "2020-05-12",
                "connected": 335,
                "visitors": 584,
                "passerby": 7201,
                "hour": 0
            },
            {
                "network": "all_networks",
                "date": "2020-05-12",
                "connected": 330,
                "visitors": 388,
                "passerby": 5829,
                "hour": 1
            }
        ],
        "visitLength": [
            {
                "network": "all_networks",
                "date": "2020-05-12",
                "5-20m": 234,
                "20-60m": 168,
                "1-6h": 99,
                "6+h": 83,
                "hour": 0
            },
            {
                "network": "all_networks",
                "date": "2020-05-12",
                "5-20m": 134,
                "20-60m": 139,
                "1-6h": 115,
                "6+h": 0,
                "hour": 1
            }
        ],
        "loyaltyRecords": [
            {
                "network": "all_networks",
                "date": "2020-05-12",
                "first": 0,
                "daily": 515,
                "weekly": 69,
                "occasional": 0,
                "hour": 0
            },
            {
                "network": "all_networks",
                "date": "2020-05-12",
                "first": 0,
                "daily": 315,
                "weekly": 73,
                "occasional": 0,
                "hour": 1
            }
        ]
    }

It's actually 16k lines, but I'm working with this sample to make testing faster.

I know I have to use an input, which I have set up like this:

    input {
        file {
            mode => "read"
            path => "/home/user/Desktop/data.json"
            start_position => "beginning"
            codec => json
        }
    }

My filter is:

    filter {
        json { source => "message" }
        split { field => "proximity" }
    }

and my output:

    output {
        stdout { }
        file { path => "/home/user/Desktop/myfile.txt" codec => json }
    }

To make things easier, I need each of these to be one entry/document, so that I can then output to Elasticsearch (for now I'm using the file output to check):

    {
        "network": "all_networks",
        "date": "2020-05-12",
        "first": 0,
        "daily": 315,
        "weekly": 73,
        "occasional": 0,
        "hour": 1
    }

As you may have noticed, I have three arrays, and each contains the "documents" I want to index.
They could all be in the same index or in different indices.

Could anyone help me understand the order in which to do things? I understand there is also a "multiline" codec, which I don't know whether I need or not.

Also, I'm getting a JSON parse error, so I'm not sure whether my data is encoded correctly or whether I need to use the multiline codec for my input.

Thanks in advance

If you have a working json codec then you typically do not need a json filter as well. However, unless you have a complete JSON object on each line, a json codec will not work; you need a multiline codec.

If you want to consume the entire file as a single event, configure the multiline codec as described here, then use a json filter. After that you can split the arrays with filters like

split { field => "proximity" }
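
A sketch of that input (the pattern string and option values here are assumptions to adapt; the idea is a pattern that never matches any line, so the whole file is folded into one event, and note that max_lines defaults to 500, so it has to be raised for a 16k-line file):

    input {
        file {
            path => "/home/user/Desktop/data.json"
            mode => "read"
            start_position => "beginning"
            sincedb_path => "/dev/null"
            codec => multiline {
                # with negate => true, a pattern that never occurs in the
                # data appends every line to the previous one, producing
                # a single event containing the whole file
                pattern => "^NEVER_MATCHES"
                negate => true
                what => "previous"
                # flush the buffered event once the input goes quiet
                auto_flush_interval => 2
                # default is 500 lines; raise it to cover the whole file
                max_lines => 20000
            }
        }
    }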

However, that will end up with events that have a proximity field, a visitLength field, and a loyaltyRecords field. If that is not what you want then you will need to use a ruby filter.

Thanks for replying so fast.

So I have already used the codec and have the whole JSON inside the "message" field, which I parsed using the json filter.

Unfortunately the split filter is not doing what I need (as you mentioned previously).

Is there a good guide/link that explains how to get "events" out of each array?

Thanks

What problem do you have with the results of the split filter?

I have applied the following filters:

    json { source => "message" remove_field => [ "message" ] }
    mutate { remove_field => [ "host", "@version", "@timestamp", "path" ] }

And my output looks like this:

{
  "visitLength": [ (here I have the 2 elements) ],
  "loyaltyRecords": [ (another 2)  ],
  "proximity": [ (The last 2)  ]
}

Now what I need is to get each element out of all of the arrays, so that later I can identify each of them as an event.

If I use the split filter I get the elements, but still nested inside their original field. For example,

split { field => "proximity" }

gives me this:

[Screenshot: each split event still has its element nested inside the proximity field]

In your previous comment, you already established that behaviour.

UPDATE

I have found that in Ruby I can create a new field from other fields' data:

ruby { code => 'event.set("my_field", [ event.get("proximity"), event.get("loyaltyRecords"), event.get("visitLength") ]) '}

The problem is that I keep getting them nested. Is there a way to get all of the elements out, or do I need some kind of "for" loop?

If you want to move nested fields to the top level you can do something like this.
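
Presumably something along these lines (a sketch; "doc" is just a placeholder for whatever field holds the nested object):

    ruby {
        code => '
            # copy each key of the nested object up to the top level,
            # then remove the now-redundant wrapper field
            event.get("doc").each { |k, v|
                event.set(k, v)
            }
            event.remove("doc")
        '
    }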

I replaced "doc" with one of my fields to get the objects out of that field and into the top level.

That seems to be the solution from what I can read of the code, but I'm getting an exception:

Ruby exception occurred: no implicit conversion of Hash into String

That suggests that you are applying it to an array of hashes: when you iterate over an array with |k, v|, each hash is passed as k with v left nil, and event.set then fails because a field name must be a string. You would need to split the arrays first.

What does an event look like if you send it to this?...

output { stdout { codec => rubydebug } }

Without the split:

    [2020-05-15T13:38:34,375][ERROR][logstash.filters.ruby    ][main] Ruby exception occurred: no implicit conversion of Hash into String
    .7.0/lib/awesome_print/formatters/base_formatter.rb:31: warning: constant ::Fixnum is deprecated
    {
           "visitLength" => [
            [0] {
                   "hour" => 0,
                   "1-6h" => 99,
                "network" => "all_networks",
                 "20-60m" => 168,
                   "date" => "2020-05-12",
                  "5-20m" => 234,
                    "6+h" => 83
            },
            [1] {
                   "hour" => 1,
                   "1-6h" => 115,
                "network" => "all_networks",
                 "20-60m" => 139,
                   "date" => "2020-05-12",
                  "5-20m" => 134,
                    "6+h" => 0
            }
        ],
        "loyaltyRecords" => [
            [0] {
                      "hour" => 0,
                "occasional" => 0,
                   "network" => "all_networks",
                     "daily" => 515,
                      "date" => "2020-05-12",
                     "first" => 0,
                    "weekly" => 69
            },
            [1] {
                      "hour" => 1,
                "occasional" => 0,
                   "network" => "all_networks",
                     "daily" => 315,
                      "date" => "2020-05-12",
                     "first" => 0,
                    "weekly" => 73
            }
        ],
                  "tags" => [
            [0] "_rubyexception"
        ],
             "proximity" => [
            [0] {
                 "passerby" => 7201,
                     "hour" => 0,
                  "network" => "all_networks",
                     "date" => "2020-05-12",
                "connected" => 335,
                 "visitors" => 584
            },
            [1] {
                 "passerby" => 5829,
                     "hour" => 1,
                  "network" => "all_networks",
                     "date" => "2020-05-12",
                "connected" => 330,
                 "visitors" => 388
            }
        ]
}

My filters are:

    json { source => "message" remove_field => [ "message" ] }
    mutate { remove_field => [ "host", "@version", "@timestamp", "path" ] }
    ruby {
        code => '
            # attempt to move the proximity element keys to the top level
            event.get("proximity").each { |k, v|
                event.set(k, v)
            }
            event.remove("proximity")
        '
    }

For the event you posted, if you do

    # split each array so that every event carries a single element per field
    split { field => "loyaltyRecords" }
    split { field => "proximity" }
    split { field => "visitLength" }
    ruby {
        code => '
            # each field now holds a single hash, so move its keys
            # to the top level and drop the wrapper field
            [ "loyaltyRecords", "proximity", "visitLength" ].each { |field|
                event.get(field).each { |k, v|
                    event.set(k, v)
                }
                event.remove(field)
            }
        '
    }

then you will get events like this (one event per combination of elements from the three arrays, since each split multiplies the events):

{
       "6+h" => 0,
   "network" => "all_networks",
    "weekly" => 73,
  "visitors" => 388,
    "20-60m" => 139,
"@timestamp" => 2020-05-15T21:51:47.643Z,
      "date" => "2020-05-12",
     "daily" => 315,
      "1-6h" => 115,
 "connected" => 330,
  "passerby" => 5829,
      "hour" => 1,
"occasional" => 0,
     "5-20m" => 134,
  "@version" => "1",
     "first" => 0
}
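
Once the events look right you can swap the file output for an elasticsearch output. A minimal sketch (the hosts value and index name are placeholders to adapt):

    output {
        elasticsearch {
            hosts => ["http://localhost:9200"]
            index => "visits-%{+YYYY.MM.dd}"
        }
    }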

Does that work for you?
