I need to detect any message that contains a word starting with “sap”, for example: sapxxx, sapxxx01, etc.
The word must start with “sap”.
So patterns like xxsapxx or xxxsap should not match.
Only sap* (letters or digits after “sap”) is valid.
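Expressed as a regex, the rule I'm after is a word boundary followed by sap, then optional letters or digits; in a Logstash conditional that would look something like this (a sketch of the intent):

if [message] =~ /\bsap[a-zA-Z0-9]*/ {
  # message contains a word starting with "sap"
}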
I have tried many different regex options in Logstash but none of them worked correctly.
So, as I noted already, maybe the problem is not your regex but something else. It always helps to share as much information as possible.
This approach doesn’t work for me because I only have one input and one output, but many filters. In the filters, I intercept logs using conditions like if [host][ip] == "10.0.0.0" or [host][ip] == "192.168.0.0" or [host][ip] == "172.0.0.0" or …. The list is very large. Each device or application has its own filter. But this is inconvenient because every time a new host is added, I have to update the corresponding filter.
The filter with the largest number of if [host][ip] conditions is the one where I want to use matching based on the beginning of the word “sap”, because all hosts start with “sap…”. This will simplify adding new hosts without modifying the pipeline.
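In other words, instead of a growing list of IPs, I want the condition in that filter to look roughly like this (a sketch; the exact reference for the host field depends on how the event is structured):

filter {
  if [host] =~ /^sap/ or [host][ip] == "10.0.0.0" or [host][ip] == "192.168.0.0" {
    # existing per-device filters here
    mutate { add_tag => ["gisap"] }
  }
}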
For reference, here is what the input pipeline looks like:
input {
  tcp {
    type => "syslog"
    port => 1514
  }
}

input {
  udp {
    type => "syslog"
    port => 2514
  }
}
And here is the output pipeline:
output {
  if [type] == "syslog" {
    if "gisap" in [tags] {
      elasticsearch {
        hosts => ["https://elastic01.avgust.com:9200", "https://elastic02.avgust.com:9200", "https://elastic03.avgust.com:9200", "https://elastic04.avgust.com:9200"]
        index => "gisap-syslog-%{+dd.MM.YYYY}"
        ssl => true
        ssl_certificate_verification => true
        cacert => "/etc/logstash/root.pem"
        user => "xxx"
        password => "xxx"
      }
    }
    else if "gtsr" in [tags] {
      elasticsearch {
        hosts => ["https://elastic01.avgust.com:9200", "https://elastic02.avgust.com:9200", "https://elastic03.avgust.com:9200", "https://elastic04.avgust.com:9200"]
        index => "gtsr-syslog-%{+dd.MM.YYYY}"
        ssl => true
        ssl_certificate_verification => true
        cacert => "/etc/logstash/root.pem"
        user => "xxx"
        password => "xxx"
      }
    }
    else {
      elasticsearch {
        hosts => ["https://elastic01.avgust.com:9200", "https://elastic02.avgust.com:9200", "https://elastic03.avgust.com:9200", "https://elastic04.avgust.com:9200"]
        index => "logs-%{+dd.MM.YYYY}"
        ssl => true
        ssl_certificate_verification => true
        cacert => "/etc/logstash/root.pem"
        user => "xxx"
        password => "xxx"
      }
    }
  }
}
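Side note: since the three elasticsearch blocks above differ only in the index name, the routing could also be expressed once by computing the index prefix in a filter (an untested sketch, reusing your hosts and tags):

filter {
  if "gisap" in [tags] {
    mutate { add_field => { "[@metadata][target]" => "gisap-syslog" } }
  }
  else if "gtsr" in [tags] {
    mutate { add_field => { "[@metadata][target]" => "gtsr-syslog" } }
  }
  else {
    mutate { add_field => { "[@metadata][target]" => "logs" } }
  }
}

output {
  if [type] == "syslog" {
    elasticsearch {
      hosts => ["https://elastic01.avgust.com:9200", "https://elastic02.avgust.com:9200", "https://elastic03.avgust.com:9200", "https://elastic04.avgust.com:9200"]
      index => "%{[@metadata][target]}-%{+dd.MM.YYYY}"
      ssl => true
      ssl_certificate_verification => true
      cacert => "/etc/logstash/root.pem"
      user => "xxx"
      password => "xxx"
    }
  }
}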
RainTown's suggestion is working fine; it will search for the sap string at any position in the message. Check it.
Here is a sample which applies grok to the syslog message; if you don't want a field, remove it or use @metadata for temporary fields. The only difference is that with grok, the sap check happens at the right position (the hostname).
input {
  generator {
    #message => '<30>Dec 5 15:01:05 192.168.0.0 systemd[1]: snapperd.service: Deactivated successfully.' # hostname is an ip
    #message => '<30>Dec 5 15:01:05 aphcmdevdb systemd[1]: snapperd.service: Deactivated successfully.' # hostname doesn't have sap*
    #message => '<30>Dec 5 15:01:05 aaphcmdevdb systemd[1]: sapperd.service: Deactivated successfully.' # line has sap* in another place
    message => '<30>Dec 5 15:01:05 saphcmdevdb systemd[1]: snapperd.service: Deactivated successfully.' # your sample
    count => 1
  }
}

filter {
  # parse the syslog line into @metadata or real fields
  grok {
    match => { "message" => "<%{INT}>(?:%{SYSLOGTIMESTAMP:[@metadata][timestamp]}|%{TIMESTAMP_ISO8601:[@metadata][timestamp]})(?: %{SYSLOGFACILITY})?(?: %{HOSTNAME:[@metadata][hostname]})?(?: %{SYSLOGPROG}:)? %{GREEDYDATA:[@metadata][msg]}" }
    #overwrite => [ "message" ]
  }

  if [@metadata][hostname] =~ /^sap/ { # add (?i) in front of sap for a case-insensitive search
    mutate { add_field => { "[@metadata][saphost]" => true } }
  }
  else {
    mutate { add_field => { "[@metadata][saphost]" => false } }
  }

  # what RainTown suggested
  if [message] =~ /\bsap\w*/ {
    mutate { add_field => { "[@metadata][sap]" => true } }
  }
  else {
    mutate { add_field => { "[@metadata][sap]" => false } }
  }

  mutate {
    remove_field => [ "event", "@version", "@timestamp", "process", "timestamp", "host" ] # "message",
  }
}

output {
  stdout { codec => rubydebug { metadata => true } }
}
Of course, you can simplify the grok pattern to extract only the hostname, but the version above parses the whole line and gives you validation as well.
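For instance, a hostname-only variant might look like this (a minimal sketch, assuming the same syslog layout as in the generator sample above):

filter {
  grok {
    # only extract the hostname; everything after it is ignored
    match => { "message" => "^<%{INT}>%{SYSLOGTIMESTAMP} %{HOSTNAME:[@metadata][hostname]} " }
  }
}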
I cannot use Grok parsing first because it would apply to other filters. My structure is that each filter is a separate file. Each filter file contains configuration for a specific device group. For example, hosts with IPs 192.168.0.1-192.168.0.10 are Linux servers, 192.168.0.11-192.168.0.20 are Windows servers, and so on. In all filters, I use a condition like if [host] =~ "sap*" or [host][ip] == "10.0.0.0" or [host][ip] == "10.0.0.1" or ... and so on. At the end of the filter, I add a tag field so that in Elasticsearch I can see which host group the logs came from.
Currently, my "host" field is an array that contains both the hostname and the IP, which is inconvenient. I'm trying to get the filter into the desired state. Could you please advise: if the "host" field in the index is already structured as an array, how will a filter like the following behave?
filter {
  if [message] =~ /\bsap\w*/ {
    # your stuff here
  }
}
So if you want to filter based on the host name, then it is really simple.
First you need to isolate the host name in a specific field; you can do that with a dissect filter, then check that field to see if it starts with sap.
Something like this:
filter {
  dissect {
    match => {
      "message" => "<%{}>%{} %{} %{}:%{}:%{} %{[@metadata][hostname]} %{}: %{}"
    }
  }
  if [@metadata][hostname] =~ /^sap/ {
    # your filters here
  }
}
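To tie this back to the tag-based routing in the output shown earlier, the same check can add the tag directly (a sketch reusing the gisap tag):

filter {
  # assumes [@metadata][hostname] was populated by the dissect above
  if [@metadata][hostname] =~ /^sap/ {
    mutate { add_tag => ["gisap"] }
  }
}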