I've hunted high and low through the documentation (written; not all of us have the luxury of watching videos) for information on this. In short, I have a prototype monitoring system using the elk-docker container (sebp/elk) running Elasticsearch/Logstash/Kibana, with data coming in from rsyslog and collectd.
So far so good, but on the Elasticsearch index, I'm a little lost. The Logstash output plug-in configuration is as follows:
Now, that was found by experiment; the documentation says the following:
index
Value type is string
Default value is "logstash-%{+YYYY.MM.dd}"
The index to write events to. This can be dynamic using the %{foo} syntax.
The default value will partition your indices by day so you can more easily
delete old data or only search specific date ranges.
Indexes may not contain uppercase characters.
For weekly indexes ISO 8601 format is recommended, eg. logstash-%{+xxxx.ww}
I understand the concept that %{foo} can stand for a number of things. Through experiment I learned that %{host} and %{sysloghost} work. However, the latter only works for events that arrive via rsyslog, and the former is sometimes an IP address. I haven't figured out why it's inconsistent regarding IP vs hostname, but I'd prefer hostnames if at all possible. A compromise might be to use %{sysloghost} if available, and otherwise fall back to %{host}. In Bourne shell syntax, that would be ${sysloghost:-${host}}. I have no idea if Logstash/Elasticsearch support anything like this, as I have yet to find a document that accurately describes the syntax and variables.
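A minimal sketch of how that fallback might be expressed with a Logstash conditional rather than shell-style expansion (as far as I can tell, the %{foo} syntax itself has no default-value operator). It assumes events carry the sysloghost and host fields described above. The field name [@metadata][index_host] and the "logstash-<host>-<date>" index pattern are hypothetical choices for illustration; fields under @metadata are not sent to Elasticsearch, and %{+YYYY.MM.dd} formats the event's @timestamp.

```
filter {
  # Prefer sysloghost when the event has it; otherwise fall back to host.
  if [sysloghost] {
    mutate { add_field => { "[@metadata][index_host]" => "%{sysloghost}" } }
  } else {
    mutate { add_field => { "[@metadata][index_host]" => "%{host}" } }
  }
  # Index names may not contain uppercase characters, so lowercase the value.
  # This is a separate mutate block so it runs after the field has been added.
  mutate { lowercase => [ "[@metadata][index_host]" ] }
}

output {
  elasticsearch {
    # Connection settings omitted; only the index option is of interest here.
    index => "logstash-%{[@metadata][index_host]}-%{+YYYY.MM.dd}"
  }
}
```

Note that a per-host index multiplies the number of daily indices by the number of hosts, so this is a sketch of the syntax rather than a recommendation.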
So, some queries:
Is there a document that formally defines the variables and syntax used for that formatting string?
Is there a reason why some logging messages pick up the IP address sometimes and the hostname at other times?
Is there a document that formally defines the variables and syntax used for that formatting string?
There's no field documentation since, apart from @timestamp and tags, there aren't really any standardized fields.
Is there a reason why some logging messages pick up the IP address sometimes and the hostname at other times?
That depends on the source data and the filters used. With the dns filter you can perform DNS lookups (forward and reverse) if you want to consistently store hostnames (or both hostnames and IP addresses).
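As a rough sketch of the dns filter approach, assuming the address ends up in the host field as described in the question, a reverse lookup could look like this. Events where the lookup fails should keep their original value, but that is worth confirming in testing.

```
filter {
  dns {
    # Reverse-resolve the address in "host" and overwrite it with the hostname.
    reverse => [ "host" ]
    action  => "replace"
  }
}
```

Using action => "append" instead would keep the IP address and add the hostname alongside it.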
Fair enough… it's just difficult navigating my way through the maze at the moment. I'm coming at this system for the first time, and I can see the power in it, but it's bewildering.
The input comes from essentially two sources:
rsyslog (using the syslog input plug-in)
collectd
Both of these are set up on the host to forward to ports bound on the loopback interface, which Docker has mapped to the ELK-stack container through to logstash. As such, the only machine that is able to connect is the host running the ELK-stack container. Everything else talks via the rsyslog or collectd instance on the host.
That's why I was a little confused when the host field on rsyslog's messages alternated between a hostname and IP address. In any case, the above works around the issue.
It might be helpful to include in the input plug-in documentation what the typical output fields are, and, wherever the "%{foo} syntax" is mentioned, to point to the input and filter plug-in documentation for hints on such fields.
It might be helpful to include in the input plug-in documentation what the typical output fields are, ...
I agree; the input plugins rarely specify which fields they emit and force you to find out for yourself during testing. This is clearly an area for improvement.
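For that kind of testing, one common approach is to add a temporary stdout output with the rubydebug codec, which prints each event with all of its fields so you can see what a given input or filter actually produces. A minimal sketch:

```
output {
  # Print every event, with all of its fields, to Logstash's standard output.
  stdout { codec => rubydebug }
}
```

This can sit alongside the elasticsearch output while experimenting and be removed afterwards.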