Missing core fields in index


(Ejether) #1

I'm having an odd issue where some of my shipped log events are missing the normal fields like beats.hostname, source, etc. I'm not sure where to start troubleshooting.

I have a mix of CentOS 6 and CentOS 7 hosts, all running Filebeat 1.3.1, shipping to (currently) a single server running Logstash, Elasticsearch, and Kibana in Docker containers.

Any ideas on where to start would be much appreciated.

my filebeat conf:

filebeat:
  prospectors:
    - paths:
        - /var/log//_perf.log
      encoding: plain
      fields_under_root: false
      input_type: log
      document_type: perf
      scan_frequency: 10s
      harvester_buffer_size: 16384
      tail_files: false
      force_close_files: false
      backoff: 1s
      max_backoff: 10s
      backoff_factor: 2
      partial_line_waiting: 5s
      max_bytes: 10485760

and logstash config (erb template):


input {
  beats {
    port => <%= @beats_port %>
    codec => multiline {
      pattern => '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
      negate => true
      what => "previous"
    }
  }
}

filter {  
  ruby {
    code => "
        # Pull key=value tokens out of the raw message and promote them
        # to top-level event fields.
        fields = event['message'].scan(/\S*=\S*/)
        for field in fields
          if field.include? '='
            field = field.split('=')
            if !field[0].nil? && !field[1].nil?
              # Dots are not allowed in field names, so replace them.
              field[0] = field[0].gsub('.','_')
              # Numeric values with an 'ms' suffix are stored without the suffix.
              if field[1].delete('ms').to_i.to_s == field[1].delete('ms') && field[0] != 'Event'
                event[field[0]] = field[1].delete('ms').to_s
              else
                event[field[0]] = field[1].to_s.delete(',')
              end
            end
          end
        end
      "
  }
  #grok { match => {"message" => "%{TIMESTAMP_ISO8601:timestamp}"}}
  #date { match => {"[@metadata][timestamp]" => "yyyy-MM-dd HH:mm:ss.SSS"}}
  mutate { convert => {"FasaID" => "string" "ItemsProcessed" => "integer" "Total" => "integer" "count" => "integer"}}
}

output {
  elasticsearch {
    hosts => <%= @elasticsearch_hosts %>
    #manage_template => false
    #index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    #document_type => "%{[@metadata][type]}"
  }
  #stdout { codec => rubydebug { metadata => "true" }}
}
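For reference, the key=value extraction done by the ruby filter above can be exercised outside Logstash. Here is a minimal standalone sketch of the same logic; the `extract_fields` helper name and the sample log line are made up for illustration:

```ruby
# Standalone sketch of the key=value extraction performed by the ruby filter.
def extract_fields(message)
  result = {}
  message.scan(/\S*=\S*/).each do |pair|
    key, value = pair.split('=')
    next if key.nil? || value.nil?
    key = key.gsub('.', '_')           # dots are replaced in field names
    numeric = value.delete('ms')       # strip an 'ms' suffix for the numeric check
    if numeric.to_i.to_s == numeric && key != 'Event'
      result[key] = numeric            # purely numeric value: keep digits only
    else
      result[key] = value.delete(',')  # otherwise keep the string, minus commas
    end
  end
  result
end

p extract_fields('Event=Checkout Total=125ms ItemsProcessed=3')
# => {"Event"=>"Checkout", "Total"=>"125", "ItemsProcessed"=>"3"}
```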

(Andrew Kroh) #2

I recommend moving the multiline processing to Filebeat so that you don't need the multiline codec in Logstash.
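In Filebeat 1.x the join can be expressed directly in the prospector config. A minimal sketch, with option values mirroring the Logstash codec above (the path is a placeholder; check the 1.x multiline docs for your exact version):

```yaml
filebeat:
  prospectors:
    - paths:
        - /var/log/example.log   # hypothetical path for illustration
      multiline:
        # Lines that do not start with a date get appended to the previous
        # matching line -- equivalent to negate => true / what => previous
        # in the Logstash multiline codec.
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after
```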


(Ejether) #3

Do you think the multiline codec is causing this issue?


(ruflin) #4

That is kind of the assumption, as Filebeat itself sends all events with this metadata. And since some transformation happens on the Logstash side, I would expect that it is being removed there.


(Ejether) #5

It seems to have helped. I'm not getting any more documents missing the source field. What was interesting is that it wasn't all documents, or all documents from a particular prospector; it was seemingly random.

Do best practices dictate doing most of the log processing in filebeat instead of logstash?


(ruflin) #6

TBH I'm still confused about the random part and can't really explain it. Do you have multiple LS instances?

About the multiline and best practices: As usual, it depends.

  • If you want to use as few resources as possible on the edge nodes, processing should be done in LS or ES
  • If you load balance across multiple LS instances, the lines of one event can arrive at different nodes and can't be properly combined
  • LS can do much more than just multiline, so if you already do additional processing there, it can be simpler to keep everything in one place.

It really depends on your use case.


(Ejether) #7

We currently have a single LS instance.
As you say, I wanted to keep the config centralized, so I was putting all the processing in the LS config.

Anyway, Thanks for the tip.
It's been going strong without dropping metadata for a couple of days.


(system) #8

This topic was automatically closed after 21 days. New replies are no longer allowed.