How do I create a global date based on a file header

I am parsing some sysstat (Linux sar command) files. The sysstat files start with a header containing the Linux kernel version, hostname, date, and architecture. I was able to parse out what I wanted with this grok filter:

patterns_dir => ["./logstash_patterns"]
match => {
  "message" => "%{STAT_KERNEL:stat_kernel}\(%{HOSTNAME:stat_hostname}\) \t%{DATE_US:stat_date}%{GREEDYDATA:remaining_stat_message} "
}
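(STAT_KERNEL is a custom pattern from my patterns_dir, which I haven't reproduced here; a hypothetical one-line definition along these lines would yield the stat_kernel capture shown in the rubydebug output further down:)

STAT_KERNEL Linux %{NOTSPACE}\s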

After that, I can collect the time of each entry and the reported stats with this grok filter:

match => {
  "message" => "%{TIME:stat_time}\s+%{NUMBER:stat_b_tps}\s+%{NUMBER:stat_b_rtps}\s+%{NUMBER:stat_b_wtps}\s+%{NUMBER:stat_b_bread}\s+%{NUMBER:stat_b_bwrtn}%{GREEDYDATA:remaining_stat3_message}"
}

What I can't figure out is how to build a timestamp for each entry, since the date and time are on separate lines and the date is only listed once. Can I store the date in some sort of variable to reference later?

Here is a snippet of the file I am parsing:

####### sa05-b.out ########
Linux 2.6.32.54-0.79.TDC.1.R.0-default (WAITROSE-1-9) 	09/04/16 	_x86_64_

16:00:02          tps      rtps      wtps   bread/s   bwrtn/s
16:05:01        48.31      9.20     39.11    258.27   1229.09
16:10:01        48.21      9.35     38.86     97.71   1012.97
08:40:01        40.93      9.66     31.27    278.61    988.54
08:45:01        45.21      9.54     35.67    185.97   1530.56
08:50:01        41.37      9.36     32.01    124.09    983.74
08:55:01        47.40      9.27     38.13    123.23   1058.12
09:00:02        40.87      9.47     31.40    216.35    897.70
09:05:01        48.37      9.85     38.52    275.62   1205.39
09:10:01        47.12      9.33     37.79    114.50    967.01
09:15:01        47.33      9.88     37.45    334.40   1277.19
09:20:01        42.01     10.09     31.92    278.22   1158.59
Average:        57.66     18.81     38.85   2348.31   2472.59

Thanks!

This is a snippet of how my current rubydebug output looks for the data I am interested in:

{
         "@version" => "1",
       "@timestamp" => "2016-09-19T17:38:00.941Z",
             "tags" => [
        [0] "sysstat",
        [1] "b"
    ],
             "host" => "WAITROSE-1-9",
      "stat_kernel" => "Linux 2.6.32.54-0.79.TDC.1.R.0-default ",
    "stat_hostname" => "WAITROSE-1-9",
        "stat_date" => "09/06/16"
}
{
        "@version" => "1",
      "@timestamp" => "2016-09-19T17:38:01.718Z",
            "tags" => [
        [0] "sysstat",
        [1] "b"
    ],
            "host" => "%{stat_hostname}",
       "stat_time" => "00:30:01",
      "stat_b_tps" => "75.74",
     "stat_b_rtps" => "51.89",
     "stat_b_wtps" => "23.84",
    "stat_b_bread" => "2592.35",
    "stat_b_bwrtn" => "3440.23"
}

For now I just used Python to set an environment variable after reading the file's header, and then Logstash reads that environment variable.

If you don't mind, could you provide an example? I have a similar issue that could be solved using this method.
Thanks.

So you don't really need Python; I just already had a Python script opening a tar file from a customer's system.

Python method:

import os

with open("path_to_file_with_header", 'r') as sysstat_file:
    first_line = sysstat_file.readline()
    # split()[3] grabs the 4th column, since indexing starts at 0
    sysstat_date = first_line.split()[3]
    os.environ["LOGSTASH_SYSSTAT_DATE"] = sysstat_date
    # print to check the date is what we thought
    print(os.environ["LOGSTASH_SYSSTAT_DATE"])
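One caveat with the Python method: os.environ only affects child processes of the script that sets it, so Logstash has to be launched from that same script for the variable to be visible. A hypothetical way to do that from the end of the script above:

import subprocess

# launch logstash from this process so it inherits LOGSTASH_SYSSTAT_DATE;
# the file is fed on stdin to match the stdin input in logstash.conf
with open("path_to_file_with_header") as stdin_file:
    subprocess.call(["logstash", "--allow-env", "-f", "logstash.conf"], stdin=stdin_file)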

Bash method:

# grabs the 4th column; in awk, $0 is the entire line and $1 is the first column
export LOGSTASH_SYSSTAT_DATE="$(head -n 1 path_to_file_with_header | awk '{print $4}')"
# echo to check that it outputs as expected
echo "$LOGSTASH_SYSSTAT_DATE"

Within the logstash.conf file, do something like this:

input {
  stdin {
    tags => [ "${LOGSTASH_TAG:null}" ]
  }
}
filter {
  grok {
    match => { "message" => "%{TIME:stat_time}%{GREEDYDATA:remaining_stat_message}" }
  }
  mutate {
    add_field => {
      "sys_log_timestamp" => "${LOGSTASH_SYSSTAT_DATE:null} %{stat_time}"
    }
  }
  date {
    match => [ "sys_log_timestamp", "MM/dd/yy HH:mm:ss" ]
    target => "@timestamp"
  }
  # If the date can't be parsed, I don't want it in my Elasticsearch, so I just drop it.
  if "_dateparsefailure" in [tags] {
    drop {}
  }
}
output {
  stdout {
    codec => rubydebug
  }
  # elasticsearch {
  #   hosts => ["127.0.0.1:9200"]
  #   index => "${LOGSTASH_NAME:customer_investigation}"
  # }
}

Oh, and lastly: to read the file through once and to let Logstash see the environment variables, run it like so:

logstash --allow-env -f logstash.conf < path_to_file_with_header

This still feels like the 'wrong' way to solve this problem, or at least not very true to the ELK toolset. I would prefer being able to set the variable from within Logstash itself. I saw some earlier solutions that included Ruby code within the Logstash config to set a Ruby variable, but I haven't dealt with much Ruby and wasn't following how to repurpose their snippets for my scenario.
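For reference, the snippets I saw were roughly this shape (an untested sketch: it caches the header date in a class variable shared across events, so it assumes a single pipeline worker, -w 1, to keep events in file order; the event['field'] syntax is the Logstash 2.x event API):

filter {
  ruby {
    # if this event carries the header date, remember it;
    # otherwise stamp the remembered date onto the event
    code => "
      if event['stat_date']
        @@sysstat_date = event['stat_date']
      elsif defined?(@@sysstat_date)
        event['sys_log_date'] = @@sysstat_date
      end
    "
  }
}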

Ah, I see what you mean. Likewise, I wanted to avoid doing any pre-processing, because I feel that Logstash should be able to process arbitrary input somehow. If it can't, then there's something wrong with its design.

My own issue is here:

It seems to me that Logstash struggles with anything that isn't a simple, predictable log format of

timestamp followed by some data

but even my situation does have that format, just not quite in the way Logstash wants it.

Thanks for the info, I'll look into using a cronned sed script or something to preprocess the data.
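Probably something like this awk one-liner as the preprocessor (an untested sketch, assuming the date sits in the fourth field of the file's first line, like in your header):

# prefix each HH:MM:SS data line with the date taken from the header line
awk 'NR == 1 { d = $4 } /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ { print d, $0; next } { print }' sa05-b.out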

I am not certain it's showing up correctly for me, since on the forum it's all one line, but if I quote it, it does have a header. Anyway, assuming each record has a header that starts with ZZZZ and ends with 2016, you would likely want to use the multiline codec so that it moves all lines without a ZZZZ onto the previous line that started with ZZZZ and had a date. Then you can just write the regex to pull the date out of the single line.

input {
  stdin {
    codec => multiline {
      # collapses all lines that don't start with ZZZZ onto the previous
      # ZZZZ line; the \n characters are still present in the new 'single' line
      pattern => "^ZZZZ"
      negate => true
      what => "previous"
    }
  }
}
filter {
  mutate {
    # replace each \n in the joined message with the literal token NEWLINE
    gsub => ["message", "\n", "NEWLINE"]
  }
  grok { ... }  # your pattern for the flattened line; sketched below
}
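Then, assuming the ZZZZ header lines look something like ZZZZ,T0001,16:05:01,04-SEP-2016 (a guess on my part based on typical nmon output), the grok stage could pull the date back out of the flattened line:

grok {
  # hypothetical pattern; adjust to the real ZZZZ layout
  match => { "message" => "^ZZZZ,%{WORD:snapshot},%{TIME:stat_time},%{DATA:stat_date}NEWLINE%{GREEDYDATA:stat_body}" }
}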

The reason this doesn't work for my scenario is that the timestamp changes with each line; only the date part is in the header.

For reference, here is how my data looks where I needed to set an environment variable:

sar -u -f sa04

Linux 2.6.32.54-0.79.TDC.1.R.0-default (WAITROSE-1-9) 	09/04/16 	_x86_64_

16:00:02        CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest     %idle
16:05:01        all      0.02      0.00      0.05      0.02      0.00      0.03      0.06      0.00     99.83
16:10:01        all      0.01      0.00      0.04      0.01      0.00      0.03      0.06      0.00     99.85
08:40:01        all      0.01      0.00      0.02      0.03      0.00      0.03      0.05      0.00     99.86
08:45:01        all      0.02      0.00      0.05      0.01      0.00      0.03      0.06      0.00     99.82
08:50:01        all      0.00      0.00      0.02      0.01      0.00      0.03      0.05      0.00     99.88
08:55:01        all      0.16      0.00      0.12      0.04      0.00      0.03      0.06      0.00     99.59
09:00:02        all      0.01      0.00      0.02      0.01      0.00      0.03      0.05      0.00     99.87
09:05:01        all      0.02      0.00      0.04      0.02      0.00      0.03      0.06      0.00     99.84
09:10:01        all      0.01      0.00      0.03      0.02      0.00      0.03      0.05      0.00     99.86
09:15:01        all      0.03      0.00      0.07      0.02      0.00      0.03      0.06      0.00     99.79
09:20:01        all      0.01      0.00      0.02      0.01      0.00      0.03      0.05      0.00     99.87
Average:        all      0.03      0.00      0.06      0.05      0.00      0.03      0.06      0.00     99.78