Adding/configuring available visualization fields

All,

I am new to ELK and have been reading docs and watching videos for the last week. I have everything (Elasticsearch, Logstash, Kibana, Filebeat) installed and have a dev Apache server sending access and error logs --> Logstash --> Elasticsearch. I have watched a couple of Kibana visualization videos, but in my environment I don't have nearly as many fields/strings available as in the videos (sorry if I'm mis-naming these 'objects'; I'm still learning). I assume I need to define/configure additional fields but do not know how to do so.

Here is my filebeat.yml:

filebeat:
  prospectors:
    -
      paths:
        - /var/www/logs/access_log
      input_type: log
      document_type: apache
    -
      paths:
        - /var/www/logs/error_log
      input_type: log
      document_type: apache
      include_lines: ["error"]
  registry_file: /var/lib/filebeat/registry

output:
  logstash:
    hosts: ["logstash:5044"]
    index: dev-apache

logging:
  to_files: true
  files:
    path: /var/log
    name: filebeat
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7
  level: info

Here is my logstash beats.conf:

input {
  beats {
    port => 5044
  }
}

filter {

  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }

  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }

  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }

  useragent {
    source => "agent"
  }

}

output {

  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }

}

Ideally, what I am interested in are HTTP responses (400s, 500s) and errors, but I am still trying to get up to speed on the whole process.

Any guidance or direction on how to allow for more choices for visualizations is greatly appreciated.

TIA,

Herb

Let's start by figuring out which indices are being created in Elasticsearch. Can you call the GET http://localhost:9200/_cat/indices API and paste the response here?

@shaunak, thank you for the response. Here is what I receive:

GET http://localhost:9200/_cat/indices

yellow open dev-webgate-apache-2016.08.27 5 1 2592 0 1.1mb 1.1mb
yellow open dev-webgate-apache-2016.08.26 5 1 67728 0 12.1mb 12.1mb
yellow open dev-webgate-apache-2016.08.29 5 1 34704 0 7.8mb 7.8mb
yellow open dev-webgate-apache-2016.08.28 5 1 2593 0 1mb 1mb
yellow open .kibana 1 1 2 0 8.1kb 8.1kb
yellow open dev-webgate-apache-2016.08.25 5 1 762 0 332.9kb 332.9kb
yellow open dev-webgate-apache-2016.09.01 5 1 33392 0 8.9mb 8.9mb
yellow open dev-webgate-apache-2016.08.31 5 1 60359 0 12mb 12mb
yellow open dev-webgate-apache-2016.08.30 5 1 77008 0 14.2mb 14.2mb

Interesting. In your filebeat.yml you have the index name set to "dev-apache". So I would've expected to see indices with names like dev-apache-YYYY.MM.DD in Elasticsearch.

Could the Logstash output block below be causing this inconsistency?

output {

  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }

}

I am sending Filebeat --> Logstash --> Elasticsearch.

No, your Logstash output block looks good to me. Essentially, %{[@metadata][beat]} will get replaced by the value of index from your filebeat.yml, which is dev-apache. So I'm not sure why you aren't seeing indices named dev-apache-YYYY.MM.DD in Elasticsearch.

Can you double check your filebeat.yml and make sure that the value of index there is dev-apache and not dev-webgate-apache?

Actually, now that I think about it, I have been testing quite a bit. I deleted the indices a couple of times, and I believe I changed the index name in filebeat.yml after deleting them. Could it be that metadata from the previous index name in filebeat.yml is causing the difference?

Sorry @shaunak. I must have made the change after the original post and before the GET post. Yes, the name is dev-webgate-apache in the current filebeat.yml:

filebeat:
  prospectors:
    -
      paths:
        - /var/www/logs/access_log
      input_type: log
      document_type: apache
    -
      paths:
        - /var/www/logs/error_log
      input_type: log
      document_type: apache
      include_lines: ["error"]
  registry_file: /var/lib/filebeat/registry

output:
  logstash:
    hosts: ["logstash:5044"]
    index: dev-webgate-apache

logging:
  to_files: true
  files:
    path: /var/log
    name: filebeat
    rotateeverybytes: 10485760 # = 10MB
    keepfiles: 7
  level: info

Alright, so now that the index names all line up, we can be reasonably confident that the data in the dev-webgate-apache-* indices in Elasticsearch is coming from filebeat.

Next, let's see what fields are available to us in Kibana. In Kibana, go to Settings > Indices and create an index pattern named dev-webgate-apache-*. Make sure to check the "Index contains time-based events" checkbox and choose a time field from the select list. If you already have this index pattern set up, would you mind deleting it and setting it up again, as I described? Once you've done that, what fields do you see in Kibana on the index pattern page?

I see 17 fields:

tags
host
count
_source
input_type
_index
type
@version
message
@timestamp
source
beat.hostname
offset
_id
_type
_score

Sorry, missed:

beat.name

Okay, based on the fields you are seeing, it appears that the grok filter is not actually parsing the message field into the various combined Apache log fields. If it were, we would have seen fields such as response, verb, bytes, etc.

Let's try to confirm this. In the output section of your Logstash configuration file, could you add the following output plugin:

stdout {
    codec => rubydebug
}

Then restart Logstash. The parsed events will still be indexed into Elasticsearch as before, but they will also be written to your console so we can debug them. Check this output for the response, verb, etc. fields.
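For reference, combined with the elasticsearch output you already have, the whole output section would then look roughly like this (just the two plugins side by side; nothing else needs to change):

output {

  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }

  stdout {
    codec => rubydebug
  }

}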

Thanks @shaunak.

I added the stdout block; incidentally, it does not output to the console but rather to a log file, logstash.stdout (not sure if that is unexpected).

I do see a "_grokparsefailure":

"message" => "10.10.1.254 - - [01/Sep/2016:14:52:19 -0700] "GET /Handler?q=update&processedUsers=&failedUsers=&qid=147&qpd=fwwPQQybvOGW2PEr2TRqWw%3D%3D HTTP/1.1" 200 111",
"@version" => "1",
"@timestamp" => "2016-09-01T21:52:21.586Z",
"beat" => {
"hostname" => "hostname.biz",
"name" => "hostname.biz"
},
"source" => "/path/to/access_log",
"offset" => 32467684,
"type" => "apache",
"input_type" => "log",
"count" => 1,
"fields" => nil,
"host" => "hostname.biz",
"tags" => [
[0] "beats_input_codec_plain_applied",
[1] "_grokparsefailure"

Hmmm... at this point I think it'd be better if someone more familiar with Logstash than me took a look at this :slight_smile: So I'm going to move this post to the Logstash category for now.

OK. @shaunak, thanks for your help.

"message" => "10.10.1.254 - - [01/Sep/2016:14:52:19 -0700] "GET /Handler?q=update&processedUsers=&failedUsers=&qid=147&qpd=fwwPQQybvOGW2PEr2TRqWw%3D%3D HTTP/1.1" 200 111",

That's not a combined-format log entry, so your use of COMBINEDAPACHELOG in your grok filter is incorrect. You should have better luck with COMMONAPACHELOG.
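For example, the grok filter in your beats.conf would then become something like:

grok {
  match => { "message" => "%{COMMONAPACHELOG}" }
}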

@magnusbaeck, thank you for the reply, it was very helpful.

I have been able to adjust the logging on the apache server as well as test between COMBINEDAPACHELOG and COMMONAPACHELOG with good success.

I am very interested in learning how I can define fields regardless of the type of logging information that is sent to Logstash. Is there a concept of whitespace-separated fields, much like awk '{print $1, $2, $3, $4}', that can be used for identifying incoming log info? I surely don't want to reinvent the wheel, but I may want to customize different incoming information.

One example is what I experienced with COMBINEDAPACHELOG vs. COMMONAPACHELOG above. I was trying to use the duration flag "%D" in the Apache log format (which I would like to use):

LogFormat "%h %l %u %t \"%r\" %>s %b %D \"%{Referer}i\" \"%{User-Agent}i\"" combined

But using "%D" in the log appears to throw off the COMBINEDAPACHELOG pattern in the grok filter. Without the "%D", COMBINEDAPACHELOG appears to work fine.

Any guidance or direction to documentation would be greatly appreciated.

Thanks,

HB

I am very interested in learning how I can define fields regardless of the type of logging information that is sent to Logstash. Is there a concept of whitespace-separated fields, much like awk '{print $1, $2, $3, $4}', that can be used for identifying incoming log info?

There's a csv filter that you can use for most kinds of data that's separated by a fixed token.
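For instance, a minimal sketch for purely space-separated data might look like this (the column names here are made up; use whatever fits your log format):

csv {
  separator => " "
  columns => ["clientip", "ident", "auth", "timestamp"]
}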

But using "%D" in the log appears to throw off the COMBINEDAPACHELOG pattern in the grok filter. Without the "%D", COMBINEDAPACHELOG appears to work fine.

Yes, of course. Adding something in the middle will cause the regular expression to no longer match. What you can do is put the %D at the end and adjust the grok expression to something like this:

%{COMBINEDAPACHELOG} %{INT:duration:int}
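In context, the filter would then look something like this (assuming the %D is the last item in your LogFormat):

grok {
  match => { "message" => "%{COMBINEDAPACHELOG} %{INT:duration:int}" }
}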

What I like even better is supporting a key/value pair list after the standard combined pattern: instead of just %D at the end of the log format, write duration=%D and use a kv filter to parse it. Extract everything after COMBINEDAPACHELOG into a separate field,

%{COMBINEDAPACHELOG} %{GREEDYDATA:kv}

and then use the kv filter to parse it:

kv {
  source => "kv"
  remove_field => ["kv"]
}

With this you can add new fields in the Apache configuration without having to change your Logstash configuration all the time. (In reality you'll want to use a Logstash filter to convert extracted numerical values from strings to integers or floats.)
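Putting that together, a sketch of the relevant filter section could look like this (the duration field name assumes you used duration=%D in the LogFormat; the mutate block does the string-to-integer conversion mentioned above):

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG} %{GREEDYDATA:kv}" }
  }
  kv {
    source => "kv"
    remove_field => ["kv"]
  }
  mutate {
    convert => ["duration", "integer"]
  }
}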

@magnusbaeck thanks again.

This information will allow me to do what I need to do. I appreciate the guidance.

HB