Proper way to extract df -h output

Hello everyone,

I got this data through the SNMP input plugin. After some mutate filters, I need to extract it, and I chose to use a grok filter. The data looks like this (note the blank line at the start):


Filesystem              1M-blocks  Used Available Use% Mounted on
devtmpfs                     8002     0      8002   0% /dev
tmpfs                        8014     3      8012   1% /dev/shm
tmpfs                        8014   793      7222  10% /run
tmpfs                        8014     0      8014   0% /sys/fs/cgroup
/dev/mapper/vg.00-root       3904  1338      2346  37% /
/dev/sda1                     973    44       863   5% /boot
/dev/mapper/vg.00-conf       1952     6      1828   1% /conf
/dev/mapper/vg.00-tmp       20031    50     18942   1% /tmp
/dev/mapper/vg.00-large    202614  7643    184657   4% /large
tmpfs                        1603     0      1603   0% /run/user/1002
tmpfs                        1603     0      1603   0% /run/user/1000
tmpfs                           4     3         2  58% /mnt/clink1

What I have done is create this pattern. My goal is to extract only the /large mountpoint, but I do not think this pattern is suitable for production; you can see there are many %{GREEDYDATA} patterns in it. Is there a proper way to extract this data? Thanks.

%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{GREEDYDATA}\n%{PATH:filesystem}\s+%{INT:1M_blocks:int}\s+%{INT:used:int}\s+%{INT:available:int}\s+%{INT:used_pct:int}\%\s+%{PATH:mountpoint}

This is the result in the Kibana Grok Debugger:

{
  "used_pct": 4,
  "available": 184657,
  "used": 7643,
  "1M_blocks": 202614,
  "filesystem": "/dev/mapper/vg.00-large",
  "mountpoint": "/large"
}

Logstash version: 8.11.2
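
For context, this is roughly what the input looks like; the host, community string, and OID below are placeholders, not my real values:

input {
    snmp {
        # placeholder host, community string, and OID; the OID points at
        # whatever returns the df output on the target device
        hosts => [{ host => "udp:192.168.1.10/161" community => "public" }]
        get => ["1.3.6.1.4.1.8072.1.3.2.3.1.2"]
        interval => 60
    }
}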

Is every line a separate event? And which mutate filters are you using before this?

Also, the grok pattern seems wrong; do you really have multiple line breaks in a single message?

I would not use grok for this; you can combine a mutate filter and a dissect filter to parse that message.

This works for the output of df -h.

filter {
    mutate {
        # collapse every run of whitespace into a single space so the
        # columns can be dissected on single spaces
        gsub => ["message","\s+"," "]
    }
    dissect {
        mapping => {
            "message" => "%{filesystem} %{1M_blocks} %{used} %{available} %{used_pct} %{mountpoint}"
        }
    }
    # drop the header line of the df output
    if [filesystem] == "Filesystem" {
        drop {}
    }
}

The output would be something like this:

{
       "message" => "devtmpfs 8002 0 8002 0% /dev",
    "@timestamp" => 2025-05-14T14:38:43.615699871Z,
     "available" => "8002",
    "mountpoint" => "/dev",
          "used" => "0",
    "filesystem" => "devtmpfs",
      "used_pct" => "0%",
     "1M_blocks" => "8002"
}
{
       "message" => "/dev/mapper/vg.00-large 202614 7643 184657 4% /large",
    "@timestamp" => 2025-05-14T14:38:43.616385831Z,
     "available" => "184657",
    "mountpoint" => "/large",
          "used" => "7643",
    "filesystem" => "/dev/mapper/vg.00-large",
      "used_pct" => "4%",
     "1M_blocks" => "202614"
}

Do you want all the lines or just the one for the /large mountpoint?

If you just want the /large line, you can add this in the filter block after the dissect filter.

    if [mountpoint] != "/large" {
        drop {}
    } 
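
If the whole df output arrives as one event instead of one event per line, you can add a split filter at the top of the filter block to break it into per-line events first. A minimal sketch, assuming the text is in the message field:

    split {
        # the default field is "message" and the default terminator is "\n",
        # so this turns the multi-line string into one event per line
        field => "message"
    }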

If it is all one long string, you do not need your grok pattern to match the parts you are not interested in. You can anchor the pattern with the mountpoint you care about.

grok { match => { "message" => "%{PATH:filesystem}\s+%{INT:1M_blocks:int}\s+%{INT:used:int}\s+%{INT:available:int}\s+%{INT:used_pct:int}\%\s+/large" } }

It's one event.

I used them for renaming fields, because the field names coming from the SNMP input plugin were unreadable.
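
Something like this; the source field name here is just an illustration of the kind of OID-style name I mean, not the actual one:

filter {
    mutate {
        # the source name is a placeholder for the OID-style field name
        # that the SNMP input produces
        rename => { "iso.3.6.1.4.1.8072.1.3.2.3.1.2" => "message" }
    }
}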

Maybe you forgot to include the blank line at the start. I tested it in my Kibana and it works.

Your pattern does not seem to work when I test it in the Kibana Grok Debugger. Should I test it in Logstash directly?

Oops, my bad. I think there was a double quote that I forgot to delete. Your pattern works well. Thank you.

I modified your pattern a little bit because I still need /large as a value, so the pattern looks like this:

%{PATH:filesystem}\s+%{INT:1M_blocks:int}\s+%{INT:used:int}\s+%{INT:available:int}\s+%{INT:used_pct:int}\%\s+(?<mountpoint>/large)

Just want to ask something off topic: is Ruby code expensive in Logstash?

Much of Logstash is written in Ruby (and executed in JRuby). A few years ago some of the core code was rewritten in Java, presumably for performance reasons, but most plugins are still Ruby. So Ruby is not, in and of itself, expensive compared to the rest of Logstash.
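
For reference, an inline ruby filter looks like this. A minimal sketch that strips the % sign from the used_pct field produced by the dissect example above:

filter {
    ruby {
        # convert a value like "4%" into the integer 4; assumes the
        # used_pct field from the dissect mapping earlier in the thread
        code => '
            pct = event.get("used_pct")
            event.set("used_pct", pct.delete("%").to_i) unless pct.nil?
        '
    }
}

The code runs once per event, so the cost depends mostly on what the code does and how many events flow through the pipeline.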
