How to filter data with Logstash before storing parsed data in Elasticsearch

Han_Xu · February 19, 2019, 4:50pm

I understand that Logstash is for aggregating and processing logs. I have NGIX logs and had Logstash config setup as:

filter {
 grok {
   match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
   overwrite => [ "message" ]
 }
 mutate {
   convert => ["response", "integer"]
   convert => ["bytes", "integer"]
   convert => ["responsetime", "float"]
 }
 geoip {
   source => "clientip"
   target => "geoip"
   add_tag => [ "nginx-geoip" ]
 }
 date {
   match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
   remove_field => [ "timestamp" ]
 }
 useragent {
   source => "agent"
 }
}

output {
 elasticsearch {
   hosts => ["localhost:9200"]
   index => "weblogs-%{+YYYY.MM}"
   document_type => "nginx_logs"
 }
 stdout { codec => rubydebug }
}

This would parse the unstructured logs into a structured form of data, and store the data into monthly indexes.

What I discovered is that the majority of logs were contributed by robots/web-crawlers. In python I would filter them out by:

browser_names = browser_names[~browser_names.str.\
                              match('^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*, na=False)]

However, I would like to filter them out with Logstash so I can save a lot of disk space in Elasticsearch server. Is there a way to do that? Thanks in advance!

Badger · February 19, 2019, 5:40pm

If there is a browser_names field on the event then something like

if [browser_names] =~ /^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]/ {
    drop {}
}

Han_Xu · February 19, 2019, 8:02pm

Thx so much! It works like a charm

system · March 19, 2019, 8:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Can someone please help me with nginx logstash filter Logstash	11	3349	September 30, 2017
Prevent some logs from going to the elastic search Elasticsearch	2	358	July 6, 2017
How to parse particular text string from logs and send to index? Logstash	12	3833	May 26, 2017
Logstash filter section Logstash	2	365	May 13, 2019
Ingest Logstash logs Logstash Logstash	1	695	July 6, 2017

How to filter data with Logstash before storing parsed data in Elasticsearch

Related topics