Filtering fields from Elasticsearch output

jrandombob · February 14, 2019, 3:21am

I'm working on a project which involves processing a large number of files (several billion records) that all follow almost the same format.

I have a Grok filter working to extract the details I'm interested in. The tricky part is that I want to write the lines that the Grok filter failed to parse out to a file in a separate location, named failed-[original file name].

I have all of that working fine, but the problem I've run into is that the Elasticsearch output plugin is emitting the "filename" field I use to name the target file, and I don't want or need that data (or the overhead it adds) in my index.

Is there a way to stop a field I'm generating with a Grok pattern from being emitted by the Elasticsearch output, whilst still leaving it available to the file output plugin?

I did try setting my ES mapping to "dynamic": "strict", but rather than indexing only the fields I have in the mapping, Elasticsearch throws an exception because the filename field is not part of the mapping.

The pipeline config is as follows:

input {
    file {
<snip>
    }
}

filter {
    # ignore empty lines
    if [message] =~ /^\s*$/ {
        drop {}
    }
    # match against custom pattern
    grok {
        patterns_dir => ["/etc/logstash/patterns"]
        patterns_files_glob => "*"
        match => { "message" => "%{CUSTOMPATTERN}" }
    }
    # get original filename
    grok {
        match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:filename}" }
    }
    # get batch name
    grok {
        match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:batch_name}/%{GREEDYDATA}\.%{GREEDYDATA}" }
    }
    # generate fingerprint for id
    fingerprint {
        key => "XXXXXXX"
        method => "SHA256"
        source => ["fielda", "fieldb"]
        target => "[@metadata][generated_id]"
    }
    # strip extraneous fields
    mutate {
        remove_field => ["@version", "@timestamp", "path", "message", "host"]
    }
    # if we're in the watch directory we don't have a batch name
    if [batch_name] == "watch" {
        mutate {
            remove_field => ["batch_name"]
        }
    }
}

output {
    if [fielda] and [fieldb] {
        elasticsearch {
            index => "indexa"
            hosts => ["XXXX:9200"]
            document_id => "%{[@metadata][generated_id]}"
            # deprecated but LS complains if we don't have it
            document_type => "customtype"
        }
    } else {
        if [batch_name] {
            file {
                path => "/opt/ingestion/failed/%{batch_name}/failed-%{filename}"
                dir_mode => 0775
                file_mode => 0664
                codec => line { format => "%{message}" }
            }
        } else {
            file {
                path => "/opt/ingestion/failed/failed-%{filename}"
                dir_mode => 0775
                file_mode => 0664
                codec => line { format => "%{message}" }
            }
        }
    }
}

jrandombob · February 14, 2019, 3:47am

And of course, after having posted the question I immediately came upon the solution.

The @metadata field

In Logstash 1.5 and later, there is a special field called @metadata. The contents of @metadata will not be part of any of your events at output time, which makes it great to use for conditionals, or extending and building event fields with field reference and sprintf formatting.

So if we use [@metadata][filename] rather than filename we get the desired result.
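
For anyone landing here later, this is roughly what that change looks like. It is a minimal sketch rather than my exact pipeline: it assumes the grok filter accepts a field reference as the capture name, shows only the no-batch failure branch, and leaves out the file mode settings and the other filters.

```
filter {
    # capture the original filename under @metadata instead of a top-level field
    grok {
        match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:[@metadata][filename]}" }
    }
}

output {
    if [fielda] and [fieldb] {
        # @metadata is not sent to Elasticsearch, so filename stays out of the index
        elasticsearch {
            index => "indexa"
            hosts => ["XXXX:9200"]
            document_id => "%{[@metadata][generated_id]}"
        }
    } else {
        # sprintf references can still read @metadata when building the path
        file {
            path => "/opt/ingestion/failed/failed-%{[@metadata][filename]}"
            codec => line { format => "%{message}" }
        }
    }
}
```

Because @metadata is hidden from normal output, a temporary stdout output with codec => rubydebug { metadata => true } is a handy way to check that the field is actually being populated.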
