Logstash Filter incoming data

Hello everyone, I hope you are all doing well.
I am sorry to come here with this question, but I could really use some expert help to solve an issue I am facing with Logstash.

I will explain my situation in as much detail as I can; if I am missing anything, please let me know and I will be happy to provide the answers.

I have a Windows virtual machine where I installed Logstash, Elasticsearch, and Grafana.

On this virtual machine I am receiving tons of logs on port 3014. Out of almost 200 columns, I need only 4 or 5. These are the fields I am receiving:

@timestamp
message
@version
_id
_index
_source
_type
ad.EventRecordID
ad.Opcode
ad.ProcessID
ad.ThreadID
ad.Version
ad.agentZoneName
ad.analyzedBy
ad.CustomerName
ad.destinationHosts

To keep this post short, I will stop at those fields rather than list them all.

What I want to achieve is to use Logstash to discard all the fields that I don't want to see at all.
For example, out of the fields I posted above, I would like to visualise only the following:

@timestamp
ad.customerName
ad.destinationHosts

while I want Logstash to discard all the others.
This approach will help me a lot, as I won't have to store and pay for data that is just junk to me (I am hosting Data Explorer and this VM on Azure, so I will have to pay for data ingestion, etc.).

So far I have this Logstash config file, which listens on port 3014 and processes every incoming event:

input {
  syslog {
    port => 3014
    codec => cef
    syslog_field => "syslog"
    grok_pattern => "<%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp}"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
}

Later on I will change the output, as I will be using the Kusto plugin.
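(If it helps, the output I am planning looks roughly like this; the cluster URL, credentials, and database/table names are placeholders, and the option names are from the logstash-output-kusto plugin as far as I understand it, so please double-check them against the plugin's documentation:)

```
output {
  kusto {
    # local staging path for files before they are ingested
    path => "/tmp/kusto/%{+YYYY-MM-dd-HH-mm}.txt"
    ingest_url => "https://ingest-<your-cluster>.kusto.windows.net/"
    app_id => "<aad-application-id>"
    app_key => "<aad-application-key>"
    app_tenant => "<aad-tenant-id>"
    database => "<database-name>"
    table => "<table-name>"
    json_mapping => "<json-mapping-name>"
  }
}
```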

For now I am trying to sort out the input and the filtering.

I am really sorry to bother you with this basic stuff. I love the ELK stack, but I am totally new to it and I don't know where to start.

Thank you very much for your time, patience, and help.

There is a filter in Logstash that is perfect for this, called prune. With a whitelist, this filter will delete any field that is not on the list.

filter {
  prune {
    whitelist_names => [ "@timestamp", "ad.customerName", "ad.destinationHosts", "tags", "@metadata" ]
  }
}

I added tags and @metadata as well, since those can be useful and you don't necessarily want them to be excluded (but they can be).
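One detail worth knowing: the entries in whitelist_names are treated as regular expressions, not literal field names, so an unescaped dot matches any character and an unanchored pattern matches anywhere in the name. If you ever run into surprises with similarly named fields, you can make the patterns strict by anchoring and escaping them (a sketch, using the field names from this thread):

```
filter {
  prune {
    # anchored regexes: each pattern must match the whole field name
    whitelist_names => [ "^@timestamp$", "^ad\.customerName$", "^ad\.destinationHosts$", "^tags$", "^@metadata$" ]
  }
}
```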

Thank you so much for your reply. Just to make sure I fully understand, the Logstash config file should look like this now:

input {
  syslog {
    port => 3014
    codec => cef
    syslog_field => "syslog"
    grok_pattern => "<%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp}"
  }
}
filter {
  prune {
    whitelist_names => [ "@timestamp", "ad.customerName", "ad.destinationHosts", "tags", "@metadata" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
}

The prune filter will completely remove all the fields that are not whitelisted, with no way to recover them later, right?

That is correct! Note that the filter block must be at the top level of the config, next to input and output, not inside the input block. Give it a try with some test data and look at the output in a

stdout {
     codec => rubydebug
}

and verify that it's correct.
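If you don't want to point live traffic at a test pipeline, a throwaway config with the generator input can push a fake event through the prune filter and straight to stdout (the sample message below is a made-up placeholder, not real data from your feed):

```
input {
  generator {
    # emit one synthetic event and stop
    count => 1
    message => '{"ad.customerName":"test","ad.destinationHosts":"host1","ad.Opcode":"0"}'
    codec => json
  }
}
filter {
  prune {
    whitelist_names => [ "@timestamp", "ad.customerName", "ad.destinationHosts", "tags", "@metadata" ]
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
```

You should see only the whitelisted fields in the rubydebug output, with ad.Opcode gone.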

input {
  syslog {
    port => 3014
    codec => cef
    syslog_field => "syslog"
    grok_pattern => "<%{POSINT:priority}>%{SYSLOGTIMESTAMP:timestamp}"
  }
}
filter {
  prune {
    whitelist_names => [ "@timestamp", "ad.customerName", "ad.destinationHosts", "tags", "@metadata" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash_index"
  }
  stdout {
    codec => rubydebug
  }
}

Thank you so much. I will try it soon and get back to you :slight_smile:

Great! If it works, please don't forget to mark this discussion as solved, to help other people as well.

I am here again, sorry to bother you. I was ready to launch the configuration when a question popped up in my mind.

Over the past months I have been using the default configuration and collecting syslog data (without storing it anywhere). If I use prune, is there a chance that it might delete the old logs that I have, or is it safe because it will only target the new logs coming in?

Prune will only affect new data being ingested into the pipeline. It will not touch any data that you have already stored.

As always, be sure to test your configurations first.