I am new to ELK and have been reading and watching docs and videos for the last week. I have everything (Elasticsearch, Logstash, Kibana, Filebeat) installed and have a dev Apache server sending access and error logs --> Logstash --> Elasticsearch. I have watched a couple of Kibana visualization videos, but in my environment I don't have nearly as many Field -> strings available as in the videos (sorry if I'm mis-naming these 'objects' as I'm still learning). I am assuming I need to define/configure additional strings but do not know how to do so.
Let's start by figuring out what indices are being created in Elasticsearch. For this, can you call the GET http://localhost:9200/_cat/indices API and paste the response here?
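For example, from the Elasticsearch host you could run something like this (assuming Elasticsearch is listening on the default port 9200; the ?v parameter just adds column headers):

curl 'http://localhost:9200/_cat/indices?v'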
Interesting. In your filebeat.yml you have the index name set to "dev-apache". So I would've expected to see indices with names like dev-apache-YYYY.MM.DD in Elasticsearch.
No, your Logstash output block looks good to me. Essentially %{[@metadata][beat]} will get replaced by the value of index from your filebeat.yml, which is dev-apache. So I'm not sure why you aren't seeing indices named dev-apache-YYYY.MM.DD in Elasticsearch.
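For reference, I'm assuming your output block follows the standard Filebeat-to-Logstash example and looks roughly like this (the hosts value is a guess):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
}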
Can you double check your filebeat.yml and make sure the value of index there is dev-apache and not dev-webgate-apache?
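In other words, I'd expect the logstash output section of your filebeat.yml to look something like this (just a sketch; the hosts value is a guess, and the exact YAML layout depends on your Filebeat version):

output:
  logstash:
    hosts: ["localhost:5044"]
    index: dev-apache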
Actually, now that I think about it, I have been testing quite a bit. I deleted the indices a couple of times and I believe I changed the index name in filebeat.yml after deleting them. Could it be that the metadata from the previous index name in filebeat.yml is causing the difference?
Alright, so now that the index names all line up, we can be reasonably confident that the data in the dev-webgate-apache-* indices in Elasticsearch is coming from filebeat.
Next, let's see what fields are available to us in Kibana. In Kibana, go to Settings > Indices and create an index pattern named dev-webgate-apache-*. Make sure to check the "Index contains time-based events" checkbox and choose a time field from the select list. If you already have this index pattern set up, would you mind deleting it and setting it up again, as I described? Once you've done that, what fields do you see in Kibana on the index pattern page?
Okay, based on the fields you are seeing, it appears that the grok filter is not actually parsing the message field into the various combined Apache log fields. If it were, we would've seen fields such as response, verb, bytes, etc.
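For context, I'm assuming your filter section contains something roughly like this standard combined-log grok filter:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}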
Let's try to confirm this. In the output section of your Logstash configuration file, could you add the following output plugin:
stdout {
  codec => rubydebug
}
And then restart Logstash. This will cause the parsed events to be indexed into Elasticsearch, as before, but also printed to your console so we can debug them. Check that output for response, verb, etc. fields.
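In other words, the output section would end up looking roughly like this (the elasticsearch settings here are just my assumption from earlier):

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
  }
  stdout {
    codec => rubydebug
  }
}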
Hmmm... at this point I think it'd be better if someone more familiar with Logstash than me took a look at this, so I'm going to move this post to the Logstash category for now.
That's not a combined HTTP log file, so your use of COMBINEDAPACHELOG in your grok filter is incorrect. You should have better luck with COMMONAPACHELOG.
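For example, the grok filter would simply reference the other pattern (a sketch; the rest of your filter stays the same):

grok {
  match => { "message" => "%{COMMONAPACHELOG}" }
}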
@magnusbaeck, thank you for the reply, it was very helpful.
I have been able to adjust the logging on the apache server as well as test between COMBINEDAPACHELOG and COMMONAPACHELOG with good success.
I am very interested in learning how I can define fields regardless of the type of logging information that is sent to Logstash. Is there a concept of whitespace-separated fields, much like awk '{print $1, $2, $3, $4}', that can be used for identifying incoming log info? I certainly don't want to reinvent the wheel, but I may want to customize different kinds of incoming information.
One example is what I experienced with the above COMBINEDAPACHELOG vs. COMMONAPACHELOG. I was trying to use the duration flag "%D" in the Apache log format (which I would like to use):
But by using the "%D" in the log, it appears to throw off the COMBINEDAPACHELOG in the grok filter. Without the "%D" the COMBINEDAPACHELOG appears to work fine.
Any guidance or direction to documentation would be greatly appreciated.
I am very interested in learning how I can define fields regardless of the type of logging information that is sent to Logstash. Is there a concept of whitespace-separated fields, much like awk '{print $1, $2, $3, $4}', that can be used for identifying incoming log info?
There's a csv filter that you can use for most kinds of data that's separated by a fixed token.
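For whitespace-separated data like your awk example, a sketch might look like this (the separator is a single space and the column names are hypothetical placeholders):

csv {
  separator => " "
  columns => ["client_ip", "timestamp", "request", "status"]
}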
But by using the "%D" in the log, it appears to throw off the COMBINEDAPACHELOG in the grok filter. Without the "%D" the COMBINEDAPACHELOG appears to work fine.
Yes, of course. Adding something in the middle will cause the regular expression to no longer match. What you can do is put the %D at the end and adjust the grok expression to something like this:
%{COMBINEDAPACHELOG} %{INT:duration:int}
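Put together, that might look roughly like the following sketch (the LogFormat line is an assumption on my part, i.e. the standard combined format with %D appended, and the name combined_duration is made up):

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D" combined_duration

and in Logstash:

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG} %{INT:duration:int}" }
  }
}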
What I like even better is supporting a key/value pair list after the standard combined pattern; instead of just %D at the end of the log format, say duration=%D and use a kv filter to parse this. Extract everything after COMBINEDAPACHELOG to a separate field,
%{COMBINEDAPACHELOG} %{GREEDYDATA:kv}
and then use the kv filter to parse it:
kv {
  source => "kv"
  remove_field => ["kv"]
}
With this you can add new fields in the Apache configuration without having to change your Logstash configuration all the time. (In reality you'll want to use a Logstash filter to convert extracted numerical values from strings to integers or floats.)
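Putting the pieces together, the whole filter section might look roughly like this sketch (assuming duration=%D at the end of your Apache LogFormat; the mutate at the end does the string-to-integer conversion mentioned above):

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG} %{GREEDYDATA:kv}" }
  }
  kv {
    source => "kv"
    remove_field => ["kv"]
  }
  mutate {
    convert => { "duration" => "integer" }
  }
}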