Parsing a collectd log file with Logstash and pushing metrics to Graphite

Hi All,

Can you please help me with this query?

My requirement is:

  1. I have to collect metrics (CPU, memory, and processes) using collectd.
  2. collectd writes these metrics to collectd.json.log (collectd generated this log file).
  3. I read this file as input in my Logstash config file and applied grok filters.

Logstash conf file:

input {
  file {
    path => "/var/log/collectd.json.log"
    codec => "json"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{WORD:message}%{NOTSPACE:plugin_load }%{DATA:method}%{WORD:plugin}%{DATA:method}%{WORD:stats}%{GREEDYDATA:info}%{LOGLEVEL:info}%{NOTSPACE:}%{TIMESTAMP_ISO8601:time}" }
    add_tag => [ "Collectd_Stats", "Collectd_logs" ]
  }
}

output {
  graphite {
    host => "localhost"
    port => "2003"
    metrics => { "Metrics.XYZ.TEST.%{cpu}" => "%{stats}" }
    metrics => { "Metrics.QLX.TEST.memory" => "%{stats}" }
  }

  stdout {
    codec => "rubydebug"
  }
}
  4. In Graphite, I can see the directory structure created, but I don't see any values or metrics.
  5. The grok filter is applied to the lines below:
{"message":"plugin_load: plugin "apache" successfully loaded.","level":"info","@timestamp":"2020-03-06T10:51:37Z"}
{"message":"plugin_load: plugin "cpu" successfully loaded.","level":"info","@timestamp":"2020-03-06T10:51:37Z"}
{"message":"plugin_load: plugin "df" successfully loaded.","level":"info","@timestamp":"2020-03-06T10:51:37Z"}
{"message":"plugin_load: plugin "disk" successfully loaded.","level":"info","@timestamp":"2020-03-06T10:51:37Z"}
{"message":"plugin_load: plugin "memory" successfully loaded.","level":"info","@timestamp":"2020-03-06T10:51:37Z"}
Kindly help me to solve this issue.

Hi there,

first of all, please tidy up your post. That means: before posting something, paste it into any editor (VSCode, Sublime, Atom, or whatever), space and indent it, then paste it here, highlight it, and use the Preformatted text tool to format it properly. Otherwise it will be unreadable.

Secondly, remember you're in the Logstash section, so the first question here is: "What do you have in input? What would you like to have in output?"

That is: post here what you see in the standard output of a pipeline like the following:

input {
  file {
    path => "/var/log/collectd.json.log"
    codec => "json"
    sincedb_path => "/dev/null"
    start_position => "beginning"
  }
}

filter {}

output {
  stdout{}
}

and then tell us what you would like the output to look like.

Finally, I don't quite understand your problem, partly because I don't know Graphite, and partly because you didn't really specify what you see in the output of your Logstash pipeline when your filter section is applied (if you see anything at all).

To make it simple:
Below is my log file. I want a grok filter to get the CPU, DF, and DISK metrics as output (stdout).
{"message":"plugin_load: plugin "cpu" successfully loaded.","level":"info","@timestamp":"2020-03-16T11:15:17Z"}
{"message":"plugin_load: plugin "df" successfully loaded.","level":"info","@timestamp":"2020-03-16T11:15:17Z"}
{"message":"plugin_load: plugin "disk" successfully loaded.","level":"info","@timestamp":"2020-03-16T11:15:17Z"}
{"message":"plugin_load: plugin "load" successfully loaded.","level":"info","@timestamp":"2020-03-16T11:15:17Z"}

There is a reason why I asked you for the output of that pipeline. It's because what logstash sees is not always what you see.

Furthermore, according to the lines you posted, there are no metrics at all. Which is the cpu metric in this line?

{"message":"plugin_load: plugin "cpu" successfully loaded.","level":"info","@timestamp":"2020-03-16T11:15:17Z"}

Successfully loaded is not a metric. It's a status.
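To illustrate the point (a quick Python sketch, not part of any pipeline): pulling the captures out of one of these plugin_load lines only ever yields a plugin name and a status string, never a number. Note the unescaped inner quotes make the line invalid JSON, so a plain regex is used here.

```python
import re

# One of the plugin_load log lines posted above.
line = '{"message":"plugin_load: plugin "cpu" successfully loaded.","level":"info","@timestamp":"2020-03-16T11:15:17Z"}'

# Capture the plugin name and whatever follows it.
match = re.search(r'plugin_load: plugin "(\w+)" (successfully loaded)', line)
plugin, status = match.groups()

print(plugin)  # cpu
print(status)  # successfully loaded  <- a status, not a numeric metric
```

There is no numeric value anywhere in these lines for Graphite to store.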

And please, I've already asked you. Format your code.

You can see the preview on the right.

Hi Fabio,

I have tried formatting now; hopefully this is the format you are looking for.
Below is the data to which I need to apply the grok filter:

{"plugin":"df","host":"172.11.1.7","value":5.3622013952E10,"plugin_instance":"mnt-cordaent","collectd_type":"df_complex","@timestamp":"2020-03-18T13:01:27.419Z","type_instance":"free","@version":"1"}

Hello Fabio,

Can you please help with this? Thanks.

Hi Fabio,

Can you please help with my query?

This is the Logstash configuration file I have tried:

input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd { }
  }
}

filter {
  json {
    source => "message"
  }

  prune {
    whitelist_names => [ "^plugin$", "^value$", "^host$" ]
  }

  grok {
    match => { "message" => "%{WORD:plugin}%{DATA:method}%{WORD:cpu}%{NOTSPACE}%{IPV4:host}%{NOTSPACE}%{WORD:stats}" }
  }
}

output {
  graphite {
    host => "00.00.000"
    port => "2003"
    metrics => { "Test.Project.%{plugin}" => "%{stats}" }
  }

  file {
    path => "/var/log/logstash_f.log"
  }
}

I have applied a grok filter to get the CPU and memory metrics, but I am not receiving them in Graphite.

Graphite screenshot: I am getting the metric paths, but the values are not visible.

It is funny because you're using the code formatter on simple text messages like this, yet you post a JSON on a single line like here, which doesn't make it any more readable at all. What about something like this?

{
   "plugin":"df",
   "host":"172.11.1.7",
   "value":5.3622013952E10,
   "plugin_instance":"mnt-cordaent",
   "collectd_type":"df_complex",
   "@timestamp":"2020-03-18T13:01:27.419Z",
   "type_instance":"free",
   "@version":"1"
}
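For what it's worth, you don't even have to indent it by hand; any JSON tool can do the reformatting. A one-liner Python sketch:

```python
import json

# The single-line JSON posted above.
raw = '{"plugin":"df","host":"172.11.1.7","value":5.3622013952E10,"plugin_instance":"mnt-cordaent","collectd_type":"df_complex","@timestamp":"2020-03-18T13:01:27.419Z","type_instance":"free","@version":"1"}'

# Round-trip through the json module to get an indented version.
pretty = json.dumps(json.loads(raw), indent=3)
print(pretty)
```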

Better, isn't it? Just like what you did in your last post with the pipeline configuration (even though the indentation there is completely messed up).

Anyway

Below is the data which i need to apply the grok filter

{"plugin":"df","host":"172.11.1.7","value":5.3622013952E10,"plugin_instance":"mnt-cordaent","collectd_type":"df_complex","@timestamp":"2020-03-18T13:01:27.419Z","type_instance":"free","@version":"1"}

So I suppose the JSON I pasted above is the result of your pipeline with no filter applied, is that right?

If the JSON above is the input, the pipeline in your last post makes no sense. I mean, you parse the input with the json filter (which is right if the input is that famous JSON), then you keep only the plugin, value, and host fields; then why would you apply an additional grok filter on a field (message) that you didn't even put in your whitelist?

Let's keep it simple. Post the output of this pipeline:

input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd {}
  }
}

filter {}

output {
  stdout{}
}

and tell me which fields you would like to keep.

I got the output below from the pipeline you gave:

{
           "@version" => "1",
      "collectd_type" => "cpu",
             "plugin" => "cpu",
         "@timestamp" => 2020-03-20T08:19:17.415Z,
               "host" => "1.000.001",
      "type_instance" => "steal",
              "value" => 0,
    "plugin_instance" => "3"
}
{
           "@version" => "1",
      "collectd_type" => "df_complex",
             "plugin" => "df",
         "@timestamp" => 2020-03-20T08:19:08.918Z,
               "host" => "1.000.001",
      "type_instance" => "free",
              "value" => 31900123136.0,
    "plugin_instance" => "mnt"
}

I need to get only the cpu, memory, and df metrics and their values as output, along with the timestamp, and these metrics should appear in the Graphite dashboard.

Ok so, would you like to only output those documents that have the field plugin (or another field?) equal to one of the following ["cpu", "df", "memory"]? And for each of those, is value the only field you need (since it is the only "metric" field I see in what you posted), or do you need all the other fields like ["collectd_type", "host", "type_instance", "plugin_instance"]?

The fields required are plugin and its value, with the given timestamp.

Ok, then, IF the output of this no-filter pipeline I already posted:

input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd {}
  }
}

filter {}

output {
 stdout{}
}

is what you said it is (meaning this):

{
           "@version" => "1",
      "collectd_type" => "cpu",
             "plugin" => "cpu",
         "@timestamp" => 2020-03-20T08:19:17.415Z,
               "host" => "1.000.001",
      "type_instance" => "steal",
              "value" => 0,
    "plugin_instance" => "3"
}
{
           "@version" => "1",
      "collectd_type" => "df_complex",
             "plugin" => "df",
         "@timestamp" => 2020-03-20T08:19:08.918Z,
               "host" => "1.000.001",
      "type_instance" => "free",
              "value" => 31900123136.0,
    "plugin_instance" => "mnt"
}

Then you don't even need to apply the json filter (as you can see, without any filter it is already recognized as a json), so you should be good to go with something like:

input {
  udp {
    port => 25826
    buffer_size => 1452
    codec => collectd {}
  }
}

filter {
  if "cpu" in [plugin] or "df" in [plugin] or "memory" in [plugin] {
    prune {
      whitelist_names => [ "^plugin$","^value$","^@timestamp"]
    }
  } else {
    drop{}
  }
}

output {
  graphite {
    host => "00.00.000"
    port => "2003"
    metrics => {"Test.Project.%{plugin}" => "%{stats}"}
  }
}

The only problem here is that you whitelisted the fields ["plugin", "value", "@timestamp"], which is what you asked for in your last post. However, you're trying to use the stats field in your Graphite output, which you obviously do not have (apart from the fact that you didn't whitelist it, it is not even present in the JSONs you provided above and I pasted in this post). I don't know what you expect the stats field to be like :man_shrugging:
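The conditional + prune logic, rendered as a Python sketch (the event is just a dict here, with field and plugin names taken from the events posted above):

```python
# Keep only cpu/df/memory events, and within those only the
# plugin, value, and @timestamp fields -- mirroring the
# if/prune/drop combination in the pipeline above.
KEEP_PLUGINS = {"cpu", "df", "memory"}
KEEP_FIELDS = {"plugin", "value", "@timestamp"}

def filter_event(event):
    if event.get("plugin") not in KEEP_PLUGINS:
        return None  # the equivalent of drop{}
    # the equivalent of prune { whitelist_names => [...] }
    return {k: v for k, v in event.items() if k in KEEP_FIELDS}

cpu_event = {"@version": "1", "collectd_type": "cpu", "plugin": "cpu",
             "@timestamp": "2020-03-20T08:19:17.415Z", "host": "1.000.001",
             "type_instance": "steal", "value": 0, "plugin_instance": "3"}

print(filter_event(cpu_event))
# {'plugin': 'cpu', '@timestamp': '2020-03-20T08:19:17.415Z', 'value': 0}
```

An event from any other plugin (apache, load, ...) returns None, i.e. gets dropped.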

Thanks Fabio.
The stats field is nothing but the value field.
I have now modified it as below, and the values are flowing to the Graphite dashboard:

metrics => { "Test.Project.%{plugin}" => "%{value}" }
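For reference, what the graphite output ultimately sends to port 2003 is Graphite's plaintext protocol: one `path value timestamp` line per metric. A Python sketch of building such a line from one of the events above (the socket send is omitted; the `Test.Project` prefix is the one from the pipeline):

```python
import time

def graphite_line(plugin, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol:
    '<metric.path> <value> <unix_timestamp>\n'."""
    ts = int(timestamp if timestamp is not None else time.time())
    return f"Test.Project.{plugin} {value} {ts}\n"

# With an unresolved field (like the missing %{stats} earlier), Graphite
# gets a path but no usable datapoint -- hence directories with no values.
print(graphite_line("df", 31900123136.0, 1584692348), end="")
```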

I'm glad the problem is solved :slight_smile:

If you found a solution in my previous posts, please mark one as the solution so future readers will know this thread has been solved.