I have the following useragent filter plugin configuration in Logstash 6.3.2. It works for my Apache logs, but not for my Nginx or HAProxy logs. I have double-checked that the referenced field names are correct.
My Apache events are tagged as "apache_access" and have the user agent string set in "agent". My Nginx events are tagged as "nginx_access" and have the user agent string set in "http.agent", and my HAProxy events tagged as "haproxy" are using the exact same field "http.agent".
Why is the config below only working for Apache logs, and not working for the "[http][agent]" source? Am I making a mistake in the filter plugin configuration? Is this a bug?
filter {
  if "apache_access" in [tags] {
    useragent {
      source => "agent"
      target => "useragent"
    }
  }
  if "nginx_access" in [tags] or "haproxy" in [tags] {
    useragent {
      source => "[http][agent]"
      target => "useragent"
    }
  }
}
For the HAProxy logs we use a Grok pattern like this:
%{DATA:http.agent}
This does indeed look like a field name with a dot in it. Is that even possible? So these are not nested fields, but simply fields with dots in their names?
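If that is the case, one thing to try is pointing the useragent filter at the literal dotted name rather than the bracketed nested reference. This is only a minimal sketch, and it assumes the events really carry a top-level field literally named http.agent, which the Grok pattern above suggests but which I have not verified against the actual events:

```
filter {
  if "nginx_access" in [tags] or "haproxy" in [tags] {
    useragent {
      # Assumption: "http.agent" is a top-level field whose name contains a dot,
      # not a field "agent" nested under "http".
      source => "http.agent"
      target => "useragent"
    }
  }
}
```

Inspecting one event with a stdout output using the rubydebug codec should show whether the field appears as "http.agent" or as a nested "http" object.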
Alright. So I need to refactor some of the Logstash filters to use proper nested fields, i.e. rename => { "[json][agent]" => "[http][agent]" }. Apparently Grok doesn't support nested fields, so we'll need to find alternatives for that as well.
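For the dotted fields, such a rename might look like the sketch below. It assumes the source is a top-level field literally named http.agent; the field names are taken from this thread and would need adjusting per log type:

```
filter {
  mutate {
    # Assumption: the left-hand side is the literal dotted field name,
    # the right-hand side is the nested reference it should become.
    rename => { "http.agent" => "[http][agent]" }
  }
}
```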
Will this conflict with existing Elasticsearch indexes? Or are fields with dots in their name already converted to nested fields in Elasticsearch?
Instead of doing a rename of every field, you could look at the de_dot filter using the nested option. Do note the caveats in the documentation however.
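A minimal sketch of that approach, assuming the de_dot filter plugin is available in your Logstash installation and that http.agent is one of the dotted top-level fields you want converted:

```
filter {
  de_dot {
    # Turn "http.agent" into the nested field [http][agent]
    # instead of replacing the dot with a separator character.
    nested => true
    # Listing the fields explicitly limits the work to known names;
    # the field name here is taken from this thread.
    fields => ["http.agent"]
  }
}
```

Limiting the filter to a known list of fields is a deliberate choice here, since the filter has to copy and remove each field it rewrites.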
Grok filters do support nested field notation, using [outer][inner].
Fields with dots in their names are indexed as such; they are not converted by Elasticsearch.
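For example, the capture from the pattern above could be written with a nested target instead of a dotted name. This is a sketch only: it assumes the raw line is in the message field, and in a real pattern this capture would be one of many:

```
filter {
  grok {
    # [http][agent] stores the capture as field "agent" nested under "http".
    match => { "message" => "%{DATA:[http][agent]}" }
  }
}
```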
Does it make any difference in performance when using nested fields in Elasticsearch compared to flat fields? We were simply using "nested fields" to keep fields a bit more organized in ES, i.e. all HTTP-related data in "http.subname" fields. Does it make sense to you to use nested fields for this use case?
It makes sense to me to use nested fields like that. I cannot speak to the performance question. You might want to ask in the Elasticsearch category whether nesting fields has any performance impact.