Logstash Split Plugin


(Shrawan Bhagwat) #1

Hi All,

We have JSON data in the following form:
{
"data": [

	{
		"appName": "DemoApp",
		"appVersion": "1.1",
		"deviceId": "1234567",
		"deviceName": "moto e",
		"deviceOSVersion": "5.1",
		"packageName": "com.abc.DemoApp",
		"message": "testing null pointer exception",
		"errorLog": "null pointer exception"
	},

	{
		"appName": "DemoApp",
		"appVersion": "1.1",
		"deviceId": "1234567",
		"deviceName": "moto e",
		"deviceOSVersion": "5.1",
		"packageName": "com.abc.DemoApp",
		"message": "testing illegal state exception",
		"errorLog": "illegal state exception"
	}
]

}

We want to split the data into separate messages, with separate fields such as appName, appVersion, etc.
My questions:
i) We use the split filter to split it into different fields, and we are getting "data_appName" as the field name instead of "appName". How can we change this field to "appName" etc. without the data_ prefix?
ii) The message field contains both records, not a single one. How can we split the message field?
iii) We are getting %host in the source field. I have tried to rename this using the mutate plugin, but it doesn't work. How can I rename the field value here?

Filter in our config file:

filter {
  json {
    source => "message"
  }

  mutate { gsub => [ "message", "},", "shr" ] }
  split {
    terminator => "shr"
    field => "data"
  }
}

Please advise on all three queries.

Regards,
Shrawan


(Magnus Bäck) #2

We use the split filter to split it into different fields, and we are getting "data_appName" as the field name instead of "appName". How can we change this field to "appName" etc. without the data_ prefix?

I can't reproduce this.

$ cat test.config 
input { stdin { codec => json } }
output { stdout { codec => rubydebug } }
filter {
  split {
    field => "data"
  }
}
$ cat data 
{ "data": [ { "appName": "DemoApp", "appVersion": "1.1", "deviceId": "1234567", "deviceName": "moto e", "deviceOSVersion": "5.1", "packageName": "com.abc.DemoApp", "message": "testing null pointer exception", "errorLog": "null pointer exception" }, { "appName": "DemoApp", "appVersion": "1.1", "deviceId": "1234567", "deviceName": "moto e", "deviceOSVersion": "5.1", "packageName": "com.abc.DemoApp", "message": "testing illegal state exception", "errorLog": "illegal state exception" } ] }
$ cat data | /opt/logstash/bin/logstash -f test.config
Settings: Default pipeline workers: 8
Pipeline main started
{
          "data" => {
                "appName" => "DemoApp",
             "appVersion" => "1.1",
               "deviceId" => "1234567",
             "deviceName" => "moto e",
        "deviceOSVersion" => "5.1",
            "packageName" => "com.abc.DemoApp",
                "message" => "testing null pointer exception",
               "errorLog" => "null pointer exception"
    },
      "@version" => "1",
    "@timestamp" => "2017-02-08T06:42:43.941Z",
          "host" => "lnxolofon"
}
{
          "data" => {
                "appName" => "DemoApp",
             "appVersion" => "1.1",
               "deviceId" => "1234567",
             "deviceName" => "moto e",
        "deviceOSVersion" => "5.1",
            "packageName" => "com.abc.DemoApp",
                "message" => "testing illegal state exception",
               "errorLog" => "illegal state exception"
    },
      "@version" => "1",
    "@timestamp" => "2017-02-08T06:42:43.941Z",
          "host" => "lnxolofon"
}
Pipeline main has been shutdown
stopping pipeline {:id=>"main"}
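The run above can be modelled in plain Ruby to show what split does with an array field. This is a simplified sketch, not the plugin's source: events are plain hashes, and split_event and raw_body are hypothetical names. In this model each clone keeps every other top-level field, including a top-level message that still holds the raw request body, which is one way every split event can end up showing the whole body.

```ruby
# Simplified model of the split filter on an array field (not the
# plugin's source): the event is cloned once per array element, and in
# each clone the array is replaced by that single element.
def split_event(event, field)
  value = event[field]
  return [event] unless value.is_a?(Array)

  value.map do |element|
    copy = event.dup       # shallow copy: every other field is carried over
    copy[field] = element  # "data" now holds one record instead of an array
    copy
  end
end

raw_body = '{ "data": [ ... ] }' # abbreviated raw request body
event = {
  "message" => raw_body,
  "data" => [
    { "appName" => "DemoApp", "message" => "testing null pointer exception" },
    { "appName" => "DemoApp", "message" => "testing illegal state exception" }
  ],
  "host" => "lnxolofon"
}

split_event(event, "data").each { |e| p e }
```

Note that the record fields stay nested under "data", exactly as in the rubydebug output above.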

The message field contains both records, not a single one. How can we split the message field?

I don't understand.

We are getting %host in the source field. I have tried to rename this using the mutate plugin, but it doesn't work. How can I rename the field value here?

Who or what populates the source field? What inputs do you have?

Instead of describing what you get, show it. Use a stdout { codec => rubydebug } output. Even better if you provide a reproducibility recipe like I have above where the input, the configuration, and the results are clearly visible.


(Shrawan Bhagwat) #3

Hi Magnus, :slight_smile:

I have attached a screenshot of the output we are getting in the Graylog UI.
In this output, you can see the following points:

  1. When we split on the basis of the "data" field, the field names get prefixed with "data_". What can we do to avoid this prefix, or to replace it with the original field name?
  2. When we split on the basis of that field, the "full_message" or "message" field contains the complete message instead of individual messages such as:
    Output 1:
    {
      "appName": "DemoApp",
      "appVersion": "1.1",
      "deviceId": "1234567",
      "deviceName": "moto e",
      "deviceOSVersion": "5.1",
      "packageName": "com.abc.DemoApp",
      "message": "testing null pointer exception",
      "errorLog": "null pointer exception"
    }

    Output 2:
    {
      "appName": "DemoApp",
      "appVersion": "1.1",
      "deviceId": "1234567",
      "deviceName": "moto e",
      "deviceOSVersion": "5.1",
      "packageName": "com.abc.DemoApp",
      "message": "testing null pointer exception",
      "errorLog": "null pointer exception"
    }

  3. We know that the source field contains the value of the source from which we are accessing this, but in our case these are service calls, so the calling party doesn't have a name. Is there any way to change the value of the source field and the facility field?

Regards,
Shrawan


(Magnus Bäck) #4

When we split on the basis of the "data" field, the field names get prefixed with "data_". What can we do to avoid this prefix, or to replace it with the original field name?

Okay. The data_ prefix is probably a Graylog thing. If you move the subfields of the data field into the root of the message, that'll probably fix itself.

I don't think you can change the current behavior of split, where it places the fields as subfields of data, but you can use a mutate filter's rename option to move the subfields to the root of the message. If you don't know in advance exactly which fields you'll have, you need to use a ruby filter to move them.
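The ruby filter route could look roughly like the sketch below. It assumes the newer event API (event.get, event.set, event.remove); FakeEvent is a hypothetical stand-in so the snippet runs outside Logstash, and it only handles top-level field names, not [nested][references]. The body of hoist_data is the part that would go into the filter's code option. With known field names, the mutate rename option does the same job without Ruby, e.g. "[data][appName]" => "appName".

```ruby
# Hypothetical stand-in for a Logstash event so this runs on its own.
# It only supports top-level field names.
class FakeEvent
  def initialize(fields)
    @fields = fields
  end

  def get(name)
    @fields[name]
  end

  def set(name, value)
    @fields[name] = value
  end

  def remove(name)
    @fields.delete(name)
  end

  def to_hash
    @fields.dup
  end
end

# Sketch of a ruby filter body: move every subfield of "data" to the
# root of the event, then drop "data". A subfield whose name already
# exists at the root (here "message") overwrites the root value.
def hoist_data(event)
  data = event.get("data")
  return unless data.is_a?(Hash)

  data.each { |key, value| event.set(key, value) }
  event.remove("data")
end

event = FakeEvent.new(
  "message" => '{ "data": [ ... ] }', # abbreviated raw request body
  "data" => { "appName" => "DemoApp", "message" => "testing null pointer exception" }
)
hoist_data(event)
p event.to_hash
```

In this sketch each event ends up with its own record's message at the root, which is consistent with the second point fixing itself once the fields are moved.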

When we split on the basis of that field, the "full_message" or "message" field contains the complete message instead of individual messages such as

This'll also fix itself when you address the previous point.

We know that the source field contains the value of the source from which we are accessing this, but in our case these are service calls, so the calling party doesn't have a name. Is there any way to change the value of the source field and the facility field?

So I ask again: What inputs do you have?


(Shrawan Bhagwat) #5

Hi Magnus,

For point #1: we tried using the following ruby filter to rename all dynamic fields that start with data_, but it didn't work.
ruby {
  code => "
    event.to_hash.keys.each { |k|
      if k.start_with?('data_')
        k.tr('data_',' ')
      end
    }
  "
}

Can you please guide us with this?

For point #3, our input looks like this:
input {

    http {
    codec => "plain"
    }

}

Regards,
Shrawan


(Magnus Bäck) #6

Your Ruby snippet has at least two problems:

  • It tries to modify the key name in place, which won't work.
  • You're ignoring what I said about the data_ prefix being specific to Graylog. As the example I showed previously demonstrates, the split filter doesn't produce any field prefixed with data_.

(You're potentially also misunderstanding what tr does, but in this case it happens to do what you want anyway.)
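For reference, this is what String#tr and an anchored String#sub return for a data_-prefixed key in plain Ruby, followed by one way to re-store a value under a new key, since a hash key itself can't be renamed in place. The sketch works on a plain Hash with made-up sample values; inside Logstash the same idea has to go through the event API.

```ruby
key = "data_appName"

# String#tr translates single characters, not substrings: every "d", "a",
# "t" and "_" anywhere in the key maps to " " (a replacement list shorter
# than the source list is padded with its last character).
p key.tr("data_", " ")    # => "      ppN me"

# Removing only a leading prefix needs an anchored regex with String#sub.
p key.sub(/\Adata_/, "")  # => "appName"

# tr and sub return new strings and leave the hash untouched, so the value
# has to be deleted under the old key and stored under the new one.
event = { "data_appName" => "DemoApp", "host" => "lnxolofon" }
event.keys.each do |k| # keys returns a separate array, so mutating is safe
  next unless k.start_with?("data_")

  event[k.sub(/\Adata_/, "")] = event.delete(k)
end
p event
```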

Have a look at http://stackoverflow.com/a/28368575/414355. If you run Logstash 2.4 or later, you need to switch to the new event API, so the answer won't be usable out of the box.

Regarding the source field, I don't know what's going on. AFAICT the http input doesn't add such a field.


(Shrawan Bhagwat) #7

Thanks Magnus for your help :slight_smile: I will let you know once I have tried that.


(Shrawan Bhagwat) #8

Hi Magnus,

Can you please help me with the add_field option of the mutate plugin?

I am trying to add a new field named AppName, and I want it to contain the dynamic value of appName. I have tried the code below:
mutate {
add_field => {"AppName"=> "%{appName}"}
}

but the field value ends up as the literal text %{appName} instead of the dynamic value.

Please guide.


(Magnus Bäck) #9

That indicates that the event didn't contain an appName field.
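A simplified model of %{field} expansion shows why: a reference to a field that doesn't exist on the event is left as literal text. expand_refs is a hypothetical helper that only understands top-level names; Logstash itself also accepts nested references such as %{[data][appName]}, which would match the split output shown earlier.

```ruby
# Simplified model of %{field} references: each reference is replaced by
# the field's value, or left untouched when the field is absent.
def expand_refs(template, event)
  template.gsub(/%\{([^}]+)\}/) do |ref|
    name = Regexp.last_match(1)
    event.key?(name) ? event[name].to_s : ref
  end
end

nested = { "data" => { "appName" => "DemoApp" } } # after split only
flat   = { "appName" => "DemoApp" }               # after moving fields to the root

p expand_refs("%{appName}", nested) # => "%{appName}"
p expand_refs("%{appName}", flat)   # => "DemoApp"
```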


(Shrawan Bhagwat) #10

How can I add a new field based on the appName I'm getting inside data?
Can you please help? :slight_smile:


(Magnus Bäck) #11

What do your events currently look like? What do you want them to look like instead? What does your Logstash configuration look like? Please provide as much information as possible. I don't have time to play 20 questions every time people here need help.


(Shrawan Bhagwat) #12

Here is my config file:
input {

    http {
    codec => "plain"
    }

}

filter {
  mutate { gsub => [ "message", "},", "shr" ] }
  split {
    terminator => "shr"
  }

  mutate {
    add_field => { "AppName" => "%{appName}" }
  }
}

Here is the data we are receiving through the http input:
{
"data": [

	{
		"appName": "DemoApp",
		"appVersion": "1.1",
		"deviceId": "1234567",
		"deviceName": "moto e",
		"deviceOSVersion": "5.1",
		"packageName": "com.abc.DemoApp",
		"message": "testing null pointer exception",
		"errorLog": "null pointer exception"
	},

	{
		"appName": "DemoApp",
		"appVersion": "1.1",
		"deviceId": "1234567",
		"deviceName": "moto e",
		"deviceOSVersion": "5.1",
		"packageName": "com.abc.DemoApp",
		"message": "testing illegal state exception",
		"errorLog": "illegal state exception"
	}
]

}

We want to separate all the data, such as appName, deviceId, etc., into separate fields.
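Putting the thread's pieces together, the processing described here can be sketched end to end on plain hashes: parse the body (roughly what a json filter with source => "message" does), emit one event per element of data (split), move each record's fields to the root, and only then copy appName into AppName. process is a hypothetical helper and body is an abbreviated version of the sample above; the gsub/terminator step is not part of this sketch.

```ruby
require 'json'

# Abbreviated sample body; with codec => "plain" the http input leaves
# this raw text in the top-level "message" field.
body = <<~JSON
  { "data": [
    { "appName": "DemoApp", "deviceId": "1234567", "message": "testing null pointer exception" },
    { "appName": "DemoApp", "deviceId": "1234567", "message": "testing illegal state exception" }
  ] }
JSON

# Hypothetical end-to-end sketch: parse, split, move fields, add a field.
def process(raw)
  parsed = JSON.parse(raw)                         # parse the JSON text
  parsed.fetch("data", []).map do |record|         # one event per record
    event = { "message" => raw }
    record.each { |key, value| event[key] = value } # record fields win
    event["AppName"] = event["appName"]            # the add_field goal
    event
  end
end

process(body).each do |e|
  puts "#{e['AppName']} | #{e['deviceId']} | #{e['message']}"
end
```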


(Magnus Bäck) #13

I've already answered your question with a reference to a ruby filter that you can use, but now you're back to asking the same question again. Sorry, I can't help here anymore.


(Shrawan Bhagwat) #14

Thanks Magnus :slight_smile:


(system) #15

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.