I'm working on a project that involves processing a large number of files (several billion records) that follow almost the same format.
I have a Grok filter working to extract the details I'm interested in. The tricky part is that I want to write the lines the Grok filter failed to parse out to a file in a separate location, named failed-[original file name].
I have all of that working fine, but the problem I've run into is that the Elasticsearch output plugin is emitting the "filename" field I use to name the target file, and I don't want or need that data (or the overhead it adds) in my index.
Is there a way to stop a field I'm generating with a Grok pattern from being emitted by the Elasticsearch output, whilst still leaving it available to the file output plugin?
I did try setting "dynamic": "strict" in my ES mapping, but rather than indexing only the fields defined in the mapping, ES throws an exception because the filename field is not part of it.
The pipeline config is as follows:
input {
  file {
    <snip>
  }
}
filter {
  # ignore empty lines
  if [message] =~ /^\s*$/ {
    drop{}
  }
  # match against custom pattern
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    patterns_files_glob => "*"
    match => { "message" => "%{CUSTOMPATTERN}" }
  }
  # get original filename
  grok {
    match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:filename}" }
  }
  # get batch name
  grok {
    match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:batch_name}/%{GREEDYDATA}\.%{GREEDYDATA}" }
  }
  # generate fingerprint for id
  fingerprint {
    key => "XXXXXXX"
    method => "SHA256"
    source => ["fielda", "fieldb"]
    target => "[@metadata][generated_id]"
  }
  # strip extraneous fields
  mutate {
    remove_field => ["@version", "@timestamp", "path", "message", "host"]
  }
  # if we're in the watch directory we don't have a batch name
  if [batch_name] == "watch" {
    mutate {
      remove_field => ["batch_name"]
    }
  }
}
output {
  if [fielda] and [fieldb] {
    elasticsearch {
      index => "indexa"
      hosts => ["XXXX:9200"]
      document_id => "%{[@metadata][generated_id]}"
      # deprecated but LS complains if we don't have it
      document_type => "customtype"
    }
  } else {
    if [batch_name] {
      file {
        path => "/opt/ingestion/failed/%{batch_name}/failed-%{filename}"
        dir_mode => 0775
        file_mode => 0664
        codec => line { format => "%{message}" }
      }
    } else {
      file {
        path => "/opt/ingestion/failed/failed-%{filename}"
        dir_mode => 0775
        file_mode => 0664
        codec => line { format => "%{message}" }
      }
    }
  }
}
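For context, the fingerprint filter in the config already writes its result to [@metadata][generated_id], and as far as I understand it, fields under @metadata are not sent to outputs but can still be referenced in output settings. An untested sketch of treating the file name the same way might look like the following. The field name [@metadata][filename] is my own assumption, and only the blocks that would change are shown:

```
filter {
  # capture the original file name under @metadata instead of a top-level field
  # ([@metadata][filename] is a hypothetical name chosen for this sketch)
  grok {
    match => { "path" => "%{GREEDYDATA}/%{GREEDYDATA:[@metadata][filename]}" }
  }
}
output {
  # the file output refers to the metadata field when building the target path
  file {
    path => "/opt/ingestion/failed/failed-%{[@metadata][filename]}"
  }
}
```

I have not confirmed that this keeps the field out of the index while leaving it usable in the file output path, so treat it as a possible direction rather than a working config.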