Converting a very old Logstash setup to a modern version, syntax issue pulling info from Filebeat

Hello All

I'm fairly new to Logstash and I was tasked with bringing our old beast of a Logstash server into the new age.

We use our Filebeat/Logstash setup for a central logging server and nothing more.

I've got 90% of the way there now with Filebeat 7.1.1 and Logstash 7.1.1 (up from Filebeat 1.3.1 and Logstash 2.3.x), however I'm stumbling over the last hurdle and could use a little bit of help.

We had a set of filters on the Logstash end that would sort the logs into folders.
The folders arranged the logs by type, then by year-month/day/server/filename.

EG: /logs/squid/2019-06/20/proxy1/access.log

Where I'm struggling is that the formatting has changed quite a bit. I got the year-month and day working, but I can't seem to get the server name and filename working any more; what I'm left with is an output path like the one below.

EG: /logs/squid/2019-06/20/%{host}/%filename

Which is no good at all.

Here is what we used to have:
OLD CODE!

ruby {
    code => "event['filename'] = event['source'].split('/').last"
}
ruby {
    code => "event['index_day'] = event.timestamp.time.localtime.strftime('%d-%m-%y')"
}
ruby {
    code => "event['index_month'] = event.timestamp.time.localtime.strftime('%Y-%m')"
}
ruby {
    code => "event['index_day_only'] = event.timestamp.time.localtime.strftime('%d')"
}
output {
    if [type] == "syslog" {
        file {
            path => "/logs/syslog/%{index_month}/%{index_day_only}/%{host}/%{filename}"
            codec => line { format => "%{[message]}" }
        }
    }
}

And here is the new code:
NEW CODE

ruby {
    code => "event.set('filename', event.get('source').split('/').last)"
    # Below is the old code; I tried both, neither worked
    # code => "event['filename'] = event['source'].split('/').last"
}
output {
    if [fields][document_type] == "syslogtest" {
        file {
            path => "/logs/syslog/%{+YYYY-MM}/%{+dd}/%{host}/%{filename}"
            codec => line { format => "%{[message]}" }
        }
    }
}
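For reference, the other old ruby date filters would need the same event.get/event.set treatment. Here is my rough, untested sketch of that translation (the date logic itself is unchanged from the old code):

ruby {
    # Old: event['index_day'] = event.timestamp.time.localtime.strftime('%d-%m-%y')
    code => "event.set('index_day', event.get('@timestamp').time.localtime.strftime('%d-%m-%y'))"
}
ruby {
    code => "event.set('index_month', event.get('@timestamp').time.localtime.strftime('%Y-%m'))"
}
ruby {
    code => "event.set('index_day_only', event.get('@timestamp').time.localtime.strftime('%d'))"
}

(One thing to be aware of: the %{+YYYY-MM} style date references I'm now using in the output path format @timestamp in UTC, whereas the old ruby filters used localtime, so the ruby versions are still worth keeping if local-time folder names matter.)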

I tried to do some googling but I just couldn't get there.

I know I'm close, I'm just not really understanding what I'm missing. Any help would be amazing, thank you in advance.

I've got an update.
I worked out how to at least grab the server name from the variables that filebeat sends
EG: path => "/logs/apache2/%{+YYYY-MM}/%{+dd}/%{[host][hostname]}/
This now gives me my hostname; that was fairly easy once I sat down and actually looked through the output.

However, I am still stuck on getting the filename from Filebeat (just the filename, not the path+filename).

I've tried to adapt the grok and ruby filters I found around the place, but I'm doing something wrong. Also, all the examples used the old source field; I can't find a single example that uses the %[log][file][path] field.
Here are the filters I have tried to adapt so far from older posts, which I am failing to make work:

grok {
    match => { "%[log][file][path]" => "/var/log/apache2/.*?/(?<logfolder>.*?)/" }
}

grok {
    match => [ "path", "/var/log/apache2/%{DATA:filename}log",
               "source", "/var/logs/%{DATA:myIndex}.json" ]
}

ruby {
    code => "event.set('filename', event.get('file.path').split('/').last)"
}

None of these filters are working and I feel like I'm losing my mind. I know it's something I am doing, but as a new person to Logstash I don't know enough to know what I'm doing wrong.

I think my issue (outside of generally not being smart) is I'm not even sure if I'm meant to be pulling the path from:
%[log][file][path]
or
%[log][path]
or
%[file][path]

So I can't even begin to diagnose whether it's my filter or my syntax that is the root of the issue.

Remove the % in the field name. It should just be [log][file][path]

In the Discover tab of Kibana, in the JSON tab, what is the text in the log.file.path field? That is, what is the complete file path?
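If you do not have Kibana, adding a temporary stdout output will print the complete event so you can see the exact field names (just a debugging sketch, remove it afterwards):

output {
    stdout { codec => rubydebug }
}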


Hi Badger

Thank you for the help. The filter is no longer creating errors, so that is a plus; however, I'm still not sure how to translate that grok filter into my output so I get the filename in the path I'm generating (I might be misleading you here with the filter, sorry; as I mentioned, I'm brand new to this).

I don't have Kibana set up on this system; it's purely a Filebeat and Logstash setup to allow centralised logging for auditors.

Basically, my issue right now is that I get all the logs from my server, but it's putting all the logs from /var/log/apache2/*log into one file called %{path}, which isn't ideal.

How our setup works:
A file comes in from a server (proxy1) via Filebeat, which sends it to the logging server (Centrallogging1).
(Centrallogging1) filters the file via Logstash and sends it out via the file output to a central storage area.

Here is a full dump of the Logstash config I'm working with.

input {
    beats {
        port => 5044
        ssl => true
        ssl_certificate => "/etc/logstash/certs/NAMEREMOVED.logstash.crt"
        ssl_key => "/etc/logstash/certs/NAMEREMOVED.logstash.key"
    }
}
filter {
    grok {
        match => { "[log][file][path]" => "/var/log/apache2/.*?/(?<logfolder>.*?)/" }
    }
}
output {
    if [fields][document_type] == "apache_logs" {
        file {
            path => "/logs/apache2/%{+YYYY-MM}/%{+dd}/%{[host][hostname]}/%{path}"
            codec => line { format => "%{[message]}" }
        }
    }
}
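As an aside, I realise that the grok isn't wrapped in a conditional, so my syslog and auth.log events run through the apache pattern too (as far as I can tell they just pick up a _grokparsefailure tag). Something like this is probably tidier, though I haven't tested it:

filter {
    if [fields][document_type] == "apache_logs" {
        grok {
            match => { "[log][file][path]" => "/var/log/apache2/.*?/(?<logfolder>.*?)/" }
        }
    }
}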

That is the whole filter. I have a few set up for syslogs and audit logs, but they follow the exact same structure.
If needed, here is the Filebeat setup as well. Again, it's pretty simple; the only difference is that the apache_logs input grabs multiple logs rather than one:

filebeat.inputs:
- type: log
  paths:
    - /var/log/syslog
  fields:
    document_type: syslog
- type: log
  paths:
    - /var/log/auth.log
  fields:
    document_type: auth.log
- type: log
  paths:
    - /var/log/apache2/*log
  fields:
    document_type: apache_logs

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 1

output.logstash:
  hosts: ["NAMEREMOVED:5044"]
  ssl.certificate_authorities: ["/etc/filebeat/NAMEREMOVED.logstash.crt"]

processors:
- add_host_metadata: ~
- add_cloud_metadata: ~
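(Side note: because I haven't set fields_under_root: true, those custom fields arrive nested under [fields] on the Logstash side, which is why the conditionals test [fields][document_type] rather than just [document_type].)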

I'm fairly sure I still don't have the output path correct, but I'm not sure what variable the grok filter is setting for my filename, or how to put that in my output path.

Sorry if I'm not explaining it that well, I'm just a little unsure of the system as a whole.


If you just want the filename from a UNIX path you could try

grok { match => { "message" => "/(?<filename>[^/]+)$" } }

That got me much closer, so thanks, but it's not quite there.
I'm not exactly sure what is going on here, but when I applied that grok filter, then used %{filename} as a path variable, it produced the following output files:

-rw-r--r-- 1 logstash logstash 19140 Jun 26 10:51 access_log,1.1"
-rw-r--r-- 1 logstash logstash 37884 Jun 26 10:51 access_log,1.8.2"
-rw-r--r-- 1 logstash logstash 875 Jun 26 10:51 access_log,2019:10:48:52 +1000] "-" 408 - "-" - - - 0 "-"
-rw-r--r-- 1 logstash logstash 875 Jun 26 10:51 access_log,2019:10:49:52 +1000] "-" 408 - "-" - - - 0 "-"
-rw-r--r-- 1 logstash logstash 1794 Jun 26 10:51 access_log,html - 0 "-"
-rw-r--r-- 1 logstash logstash 45619 Jun 26 10:48 access_log,var

It's now grabbing the filename, but it's also adding a comma and some junk to the file name, resulting in multiple files with weird names that don't exist on the Filebeat side.

The output path is still the same as before, EG:

path => "/logs/apache2/%{+YYYY-MM}/%{+dd}/%{[host][hostname]}/%{filename}"

Here is a dump of the output, with a couple of names altered for security; this is from the log called apache.log,1.1:

{"log":{"file":{"path":"/var/log/apache2/access_log"},"offset":106939253},"@timestamp":"2019-06-26T00:53:58.168Z","fields":{"document_type":"apache_logs"},"@version":"1","input":{"type":"log"},"tags":["beats_input_codec_plain_applied"],"message":"IP_REMOVED - - [26/Jun/2019:10:53:53 +1000] "GET /server-status/?auto HTTP/1.1" 200 483 "-" "Go-http-client/1.1"","host":{"containerized":false,"name":"SERVER_NAME_REMOVED","id":"fdcd53e56cd745eba9b21d1028ded583","architecture":"x86_64","hostname":"SERVER_NAME_REMOVED","os":{"family":"debian","name":"Debian GNU/Linux","kernel":"3.16.0-4-amd64","codename":"jessie","platform":"debian","version":"8 (jessie)"}},"agent":{"id":"1509c8fc-1ca3-4ba1-a6be-a3134870fba2","version":"7.1.1","type":"filebeat","hostname":"SERVER_NAME_REMOVED","ephemeral_id":"a94ccd91-e7a7-4b70-923d-9e3a3aadfaac"},"filename":["access_log","1.1""],"ecs":{"version":"1.0.0"}}

Thanks again for all the help, Badger. I'm sorry to keep asking questions, but being brand new at this I'm struggling.

EDIT:
I got a lot closer myself, but something is still slightly off.

I changed the grok filter to the below:

grok { match => { "[log][file][path]" => "/(?<filename>[^/]+)$" } }

with the output path still the same as before:

path => "/logs/apache2/%{+YYYY-MM}/%{+dd}/%{[host][hostname]}/%{filename}"

Now each log is only going to one file, however the name is still a little off, as it's replicating the filename twice per file, EG:

-rw-r--r-- 1 logstash logstash 1097 Jun 26 11:06 access_log,access_log
-rw-r--r-- 1 logstash logstash 5635 Jun 26 11:06 portal.DOMAIN_REMOVED.access_log,portal.DOMAIN_REMOVED.access_log
data.DOMAIN_REMOVED.access_log,data.DOMAIN_REMOVED.access_log

I have no idea why it's spitting out the filename twice, divided by a comma, but if I can resolve that it's done.

Got it working by playing around, so thanks heaps Badger, you pushed me in the right direction.
Leaving this here in a post in case anyone else is ever as silly as me and attempts this.

I started hacking off bits of the filter until I got what I needed. (The duplicated, comma-separated name was presumably down to having two grok filters both capturing into filename, so the field became an array of two values and the path sprintf joined them with a comma; cutting back to a single grok fixed it.)

The end result.

Grok filter:

grok { match => { "[log][file][path]" => "/(?<filename>)" } }

Output Path used:

path => "/logs/apache2/%{+YYYY-MM}/%{+dd}/%{[host][hostname]}/%{filename}"
