Proper way to extract df -h output

Hello everyone,

I got this data through the SNMP input plugin. After some mutate filters, I need to extract it, and I chose to use a grok filter. The data looks like this (note the blank line at the start):


Filesystem              1M-blocks  Used Available Use% Mounted on
devtmpfs                     8002     0      8002   0% /dev
tmpfs                        8014     3      8012   1% /dev/shm
tmpfs                        8014   793      7222  10% /run
tmpfs                        8014     0      8014   0% /sys/fs/cgroup
/dev/mapper/vg.00-root       3904  1338      2346  37% /
/dev/sda1                     973    44       863   5% /boot
/dev/mapper/vg.00-conf       1952     6      1828   1% /conf
/dev/mapper/vg.00-tmp       20031    50     18942   1% /tmp
/dev/mapper/vg.00-large    202614  7643    184657   4% /large
tmpfs                        1603     0      1603   0% /run/user/1002
tmpfs                        1603     0      1603   0% /run/user/1000
tmpfs                           4     3         2  58% /mnt/clink1

What I have done is create this pattern. My goal is to extract only the /large mountpoint, but I do not think this pattern is suitable for production; you can see there are many %{GREEDYDATA} patterns in it. Is there a proper way to extract this data? Thanks.

%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{PATH:filesystem}\s+%{INT:1M_blocks:int}\s+%{INT:used:int}\s+%{INT:available:int}\s+%{INT:used_pct:int}\%\s+%{PATH:mountpoint}

This is the result in the Kibana Grok Debugger:

{
  "used_pct": 4,
  "available": 184657,
  "used": 7643,
  "1M_blocks": 202614,
  "filesystem": "/dev/mapper/vg.00-large",
  "mountpoint": "/large"
}

Logstash version: 8.11.2
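
For context, this is roughly what the input looks like; the host, community string, and OID below are placeholders, not my real values:

input {
    snmp {
        # placeholder host, community string, and OID; the OID points at
        # whatever returns the df output on the target device
        hosts => [{ host => "udp:192.168.1.10/161" community => "public" }]
        get => ["1.3.6.1.4.1.8072.1.3.2.3.1.2"]
        interval => 60
    }
}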

Is every line a separate event? And which mutate filters are you using before this?

Also, the grok pattern seems wrong; do you really have multiple line breaks in a single message?

I would not use grok for this; you can combine a mutate filter and a dissect filter to parse that message.

This works for the output of df -h.

filter {
    mutate {
        # collapse every run of whitespace into a single space so the
        # columns can be dissected on single spaces
        gsub => ["message","\s+"," "]
    }
    dissect {
        mapping => {
            "message" => "%{filesystem} %{1M_blocks} %{used} %{available} %{used_pct} %{mountpoint}"
        }
    }
    # drop the header line of the df output
    if [filesystem] == "Filesystem" {
        drop {}
    }
}

The output would be something like this:

{
       "message" => "devtmpfs 8002 0 8002 0% /dev",
    "@timestamp" => 2025-05-14T14:38:43.615699871Z,
     "available" => "8002",
    "mountpoint" => "/dev",
          "used" => "0",
    "filesystem" => "devtmpfs",
      "used_pct" => "0%",
     "1M_blocks" => "8002"
}
{
       "message" => "/dev/mapper/vg.00-large 202614 7643 184657 4% /large",
    "@timestamp" => 2025-05-14T14:38:43.616385831Z,
     "available" => "184657",
    "mountpoint" => "/large",
          "used" => "7643",
    "filesystem" => "/dev/mapper/vg.00-large",
      "used_pct" => "4%",
     "1M_blocks" => "202614"
}

Do you want all the lines or just the one for the /large mountpoint?

If you just want the /large line, you can add this in the filter block after the dissect filter.

    if [mountpoint] != "/large" {
        drop {}
    } 
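
If the whole df output arrives as one event instead of one event per line, you can add a split filter at the top of the filter block to break it into per-line events first. A minimal sketch, assuming the text is in the message field:

    split {
        # the default field is "message" and the default terminator is "\n",
        # so this turns the multi-line string into one event per line
        field => "message"
    }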

If it is all one long string, you do not need your grok pattern to match the parts you are not interested in. You can anchor the pattern with the mountpoint you care about.

grok { match => { "message" => "%{PATH:filesystem}\s+%{INT:1M_blocks:int}\s+%{INT:used:int}\s+%{INT:available:int}\s+%{INT:used_pct:int}\%\s+/large" } }

It's one event.

I used them for renaming fields, because the field names coming from the SNMP input plugin were unreadable.
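
Something like this; the source field name here is just an illustration of the kind of OID-style name I mean, not the actual one:

filter {
    mutate {
        # the source name is a placeholder for the OID-style field name
        # that the SNMP input produces
        rename => { "iso.3.6.1.4.1.8072.1.3.2.3.1.2" => "message" }
    }
}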

Maybe you forgot to include the blank line at the start. I tested it in my Kibana and it works.

Your pattern does not seem to work when I test it in the Kibana Grok Debugger. Should I test it in Logstash directly?

Oops, my bad. I think there was a double quote that I forgot to delete. Your pattern works well. Thank you.

I modified your pattern a little bit because I still need /large as a value, so the pattern looks like this:

%{PATH:filesystem}\s+%{INT:1M_blocks:int}\s+%{INT:used:int}\s+%{INT:available:int}\s+%{INT:used_pct:int}\%\s+(?<mountpoint>/large)

Just want to ask something off topic: is Ruby code expensive in Logstash?

Much of Logstash is written in Ruby (and executed in JRuby). A few years ago some of the core code was rewritten in Java, presumably for performance reasons, but most plugins are still Ruby. So Ruby is not, in and of itself, expensive compared to the rest of Logstash.
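
For reference, an inline ruby filter looks like this. A minimal sketch that strips the % sign from the used_pct field produced by the dissect example above:

filter {
    ruby {
        # convert a value like "4%" into the integer 4; assumes the
        # used_pct field from the dissect mapping earlier in the thread
        code => '
            pct = event.get("used_pct")
            event.set("used_pct", pct.delete("%").to_i) unless pct.nil?
        '
    }
}

The code runs once per event, so the cost depends mostly on what the code does and how many events flow through the pipeline.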
