Windows filebeat to logstash to elasticsearch


(Richard Poole) #1

I'm fairly new to ELK, but not completely hopeless, I hope. I am running Filebeat on a Windows server to collect tomcat8 access logs (eClinicalWorks on Windows). I got the grok pattern correct as far as any pattern tester goes, but I cannot ship from Filebeat to ES because of escape character issues (I have to add an extra \ to each escape for Filebeat to accept the config, but those doubled escapes then fail when they hit ES). I have reconfigured to pipe through Logstash, and everything is great except that the Filebeat data now gets indexed under logstash-*, which is not what I want. Config files follow:

filebeat.yml
filebeat.inputs:
- type: log
  enabled: false
  paths:

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
  reload.period: 15s

setup.template.settings:
  index.number_of_shards: 1
  index.number_of_replicas: 1
setup.template.enabled: true
setup.template.name: "filebeat"
setup.template.pattern: "filebeat-*"
setup.template.fields: "${path.config}/fields.yml"

tags: ["eCW", "tomcat", "NLB"]

setup.kibana:
  host: "kibana.domain.int:5601"

output.logstash:
  hosts: ["pmr-ls1.domain.int:5045", "pmr-ls2.domain.int:5045"]
  loadbalance: true
  index: "filebeat-%{YYYY.MM.dd}"

processors:
- add_host_metadata: ~

apache2.yml
- module: apache2
  access:
    enabled: true
    var.paths: ["D:/eClinicalWorks/tomcat8/logs/jasper*.log"]
  error:
    enabled: false

logstash:
input {
  beats {
    port => 5045
    type => "file"
    tags => ["neweCW"]
  }
}

filter {
  grok {
    match => {
      "message" => "%{IPORHOST:apache2.access.remote_ip} - %{NUMBER:apache2.access.time}     %{DATA:apache2.access.user_name} \[%{HTTPDATE:timestamp}\] %{WORD:apache2.access.request} %{DATA:apache2.access.url} HTTP/%{NUMBER:apache2.access.http_version} %{NUMBER:apache2.access.response_code} (?:%{NUMBER:apache2.access.body_sent.bytes}|-) %{DATA:apache2.access.referrer}"
    }
    overwrite => [ "message" ]
    remove_field => [ "ident", "auth" ]
  }
  geoip { source => "apache2.access.remote_ip" }
  mutate {
    gsub => [
        "request", "\?.+", "",
        "proxiedip", "(^\"|\"$)", "",
        "loginame", "(^\"|\"$)" , "",
        "referrer",  "(^\"|\"$)" , ""
    ]
  }
  mutate {
    convert => {
      "bytes" => "integer"
      "elapsed_millis" => "integer"
      "serverport" => "integer"
    }
  }
  mutate {
    remove_field => "host"
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  if "_grokparsefailure" not in [tags] {
    stdout {
      codec => rubydebug
    }
  }
  elasticsearch {
    manage_template => true
    hosts => ["pmr-es1.domain.int:9200","phy-es1.domain.int:9200"]
    index => ["filebeat-%{+YYYY.MM.dd}"]
  }
#   stdout {codec => rubydebug}
}

Some of the tags are for my tracking of changes to the filters (neweCW) , so yes I will have a bunch of junk tags.

Anything else I can provide? I could use a bit of help.

Thanks,

Rich


(Richard Poole) #2

And, a lot of that formatting didn't come through correctly. Sorry, my first post.


(Steffen Siering) #3

Your elasticsearch output configuration looks off; e.g. index should not be an array. Only enable manage_template => true if you have an actual template file from Filebeat, or a template that represents your events correctly. Otherwise run filebeat setup --template to install the Elasticsearch template.
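A minimal sketch of what the corrected output block could look like (keeping your hostnames and index name, and assuming the template has been installed separately via filebeat setup --template):

```
output {
  elasticsearch {
    hosts => ["pmr-es1.domain.int:9200", "phy-es1.domain.int:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
    manage_template => false
  }
}
```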

For recommended Logstash configuration integration Beats->Logstash->Elasticsearch see: https://www.elastic.co/guide/en/elastic-stack-get-started/6.5/get-started-elastic-stack.html#logstash-setup


(Richard Poole) #4

Thank you! I thought I had that before the array, and just started adding crap to see if I could get it to do something different. It is working as expected now.

Rich


(Richard Poole) #5

Actually, now I have the logs going to both the filebeat and logstash indices. I have verified that my Logstash output specifies index => "filebeat-%{+YYYY.MM.dd}". I also verified the metadata in the output with stdout { codec => rubydebug { metadata => true } }, and the filebeat index is listed. Any ideas?

Thanks,
Rich


(Steffen Siering) #6

Any ideas?

About what? I was under the impression you had solved it. What exactly do you observe, and what would you expect?


(Richard Poole) #7

I thought that by specifying the index as filebeat-*, the logs would go to the filebeat index, not the logstash index. I certainly didn't think they would go to both. This is duplicating data at a rate of at least 2GB a day. How do I get data from Filebeat, through Logstash, to Elasticsearch indexed into filebeat only?


(Steffen Siering) #8

Logstash itself is not magically duplicating events into 2 different indices. Do you use a configuration directory? Is there another elasticsearch output configured somewhere? One can use pipelines in Logstash to separate processing, but just having multiple configuration files in one directory intermixes the configurations.
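For example, separate processing can be isolated via pipelines.yml (the pipeline ids and file paths here are hypothetical):

```yaml
# pipelines.yml -- each pipeline loads only its own config file
- pipeline.id: beats
  path.config: "/etc/logstash/conf.d/beats.conf"
- pipeline.id: other
  path.config: "/etc/logstash/conf.d/other.conf"
```

Without this, every .conf file in the configuration directory is concatenated into a single pipeline, and every event flows through every filter and every output.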


(Richard Poole) #9

So any given log entry in question is only stored once but referenced by two indices?


(Steffen Siering) #10

If you can see the same document in 2 indices, it is stored twice. But Logstash by default does not store events in 2 indices; the Elasticsearch bulk API doesn't even support this. In order to store an event in multiple indices, you need to have multiple Elasticsearch outputs configured.
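For illustration, if a second config file in the same directory contained another output, the concatenated configuration would effectively become (hypothetical example):

```
output {
  elasticsearch {
    hosts => ["pmr-es1.domain.int:9200"]
    index => "filebeat-%{+YYYY.MM.dd}"
  }
}
output {
  elasticsearch {
    hosts => ["pmr-es1.domain.int:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```

Every event would then be indexed once per output, i.e. twice.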

Which indices do you see the event stored in?

Is your Logstash configuration complete? Do you have other configs in your path?