Would it make sense for Filebeat to expose the name of the file it is reading from, so that it can be used to determine the index name (or the Kafka topic, for that matter)?
For example, if it is currently tailing the file `/var/log/docker/api.log`, it could include a field such as `{"event_source": "api"}` or even `{"event_source": "api.log"}` in the event, along with the other metadata.
I know that Filebeat already exposes the absolute path in the `source` field, but that is not enough to determine which index or Kafka topic to write to when doing this at scale. Also, adding a field to the event itself would require changing the way the events are logged, which makes the transition to Filebeat much more difficult.
I totally see the value in exposing the full path.
As more and more large-scale organizations start to consider Beats for log tailing, as seen in a few other questions on Stack Overflow and on the Elastic forum, this feature could be really helpful to add.
I feel that the additional overhead of maintaining Logstash just to do simple extractions or derivations based on either event fields or the path will hamper the adoption of Filebeat at scale (or, even worse, could lead adopters to maintain their own versions of Filebeat).
One workaround would be to make use of prospector fields.
E.g. create a prospector per file type and set `document_type` accordingly, or use `fields`:
```yaml
filebeat.prospectors:
- paths: ["/var/log/docker/api.log"]
  fields:
    source_type: "api"
```
Then you can use `%{[fields.source_type]}` in the index or topic name.
When using `indices` (Elasticsearch) or `topics` (Kafka), one can use conditionals to do some more processing.
But your request makes me think about introducing some kind of template processors/functions, as supported by more common templating systems. This could look somewhat like `%{[source:basename]}` or `%{[source]|basename}`. The former would work only on fields extracted from events, the latter potentially on other value sources too. I kind of like the pipe symbol here. Imagine `%{[source]|basename|trimRight('.log')}`. Well, it's just an idea so far; I will have to think more about this.
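To make the pipe idea a bit more concrete, here is a minimal Go sketch of how a format string such as `%{[source]|basename|trimRight('.log')}` could be expanded. Everything in it is hypothetical: the `expand` function, the filter names `basename` and `trimRight`, and the parsing rules are assumptions for illustration, not Filebeat's actual format-string API. It assumes event fields are a flat string map and that filter arguments are single-quoted strings without `|` or `}`.

```go
package main

import (
	"fmt"
	"path/filepath"
	"regexp"
	"strings"
)

// filter is a hypothetical template function: it transforms a field value,
// optionally using a single string argument.
type filter func(value, arg string) string

// filters lists the hypothetical functions that may follow a '|'.
var filters = map[string]filter{
	"basename": func(v, _ string) string { return filepath.Base(v) },
	// trimRight strips a literal suffix (strings.TrimSuffix semantics), not a
	// character set as Go's strings.TrimRight would.
	"trimRight": func(v, arg string) string { return strings.TrimSuffix(v, arg) },
}

// placeholder matches %{[field]} optionally followed by |filter steps.
// Simplification: arguments may not contain '|' or '}'.
var placeholder = regexp.MustCompile(`%\{\[([^\]]+)\]((?:\|[^|}]+)*)\}`)

// call matches one step such as basename or trimRight('.log').
var call = regexp.MustCompile(`^(\w+)(?:\('([^']*)'\))?$`)

// expand replaces every placeholder in format with the named field's value,
// piped through the listed filters from left to right.
func expand(format string, fields map[string]string) (string, error) {
	var firstErr error
	out := placeholder.ReplaceAllStringFunc(format, func(m string) string {
		// Record only the first error and leave the placeholder untouched.
		fail := func(err error) string {
			if firstErr == nil {
				firstErr = err
			}
			return m
		}
		parts := placeholder.FindStringSubmatch(m)
		value, ok := fields[parts[1]]
		if !ok {
			return fail(fmt.Errorf("unknown field %q", parts[1]))
		}
		// parts[2] looks like "|basename|trimRight('.log')"; skip the empty
		// element before the first '|'.
		for _, step := range strings.Split(parts[2], "|")[1:] {
			c := call.FindStringSubmatch(strings.TrimSpace(step))
			if c == nil {
				return fail(fmt.Errorf("malformed filter %q", step))
			}
			f, ok := filters[c[1]]
			if !ok {
				return fail(fmt.Errorf("unknown filter %q", c[1]))
			}
			value = f(value, c[2])
		}
		return value
	})
	return out, firstErr
}

func main() {
	fields := map[string]string{"source": "/var/log/docker/api.log"}
	topic, err := expand("%{[source]|basename|trimRight('.log')}", fields)
	if err != nil {
		panic(err)
	}
	fmt.Println(topic) // api
}
```

Chaining small single-purpose functions keeps each one trivial and composable; the cost is a small parser for the filter syntax, which a real implementation would need to make more robust than these two regular expressions.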
My original request was to extract the index/topic name from the file's basename.
However, given what @steffens mentioned, I can totally see the value of providing a scripting interface to the existing metadata.
I'd be happy to contribute to that feature, should you feel it needs to be added to Filebeat, or to discuss potential use cases that this request might cover.