I'm configuring filebeat 5.1.1 to output to kafka. I would like filebeat to output to the kafka topic the lines in the log file verbatim.
Or move these fields under the `beat` field?
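For context, a minimal filebeat 5.x Kafka output configuration looks roughly like this (hosts, topic, and paths are placeholders, not taken from the thread); note there is no option here to emit the raw log line instead of the JSON-encoded event:

```yaml
# filebeat.yml (illustrative sketch; hosts and topic are placeholders)
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/app/*.log

output.kafka:
  hosts: ["kafka1:9092"]
  topic: "app-logs"
  # In 5.1.1 the event is always serialized as a JSON document;
  # there is no setting here to output only the original line.
```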
This would allow us to easily migrate to filebeat, since the consumers of events from Kafka would not need to update their logic to read the message from under a JSON field.
This would let the consumer know that all fields under the `beat` field are added by filebeat, and that everything at the top level comes from the original message.
This is currently not possible. To change the format of the output you need Logstash or an ingest node. Be aware that changing the format of the events means they will stop working with the Elasticsearch index template.
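As a sketch of the Logstash route (topic and hosts are placeholders, not a tested pipeline): the `plain` codec's `format` option can be used on the Kafka output so that only the original log line is forwarded:

```
input {
  beats {
    port => 5044
  }
}

output {
  kafka {
    topic_id          => "app-logs"       # placeholder topic
    bootstrap_servers => "kafka1:9092"    # placeholder broker
    codec => plain {
      format => "%{message}"              # emit only the original log line
    }
  }
}
```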
As @ruflin said, this is currently not possible. Most metadata, like source, input_type, offset, and so on, are expected to be part of the actual event. All beats-related metadata are already in the `beat` namespace.
Is the event structure really such a big issue that it justifies a more complicated system architecture?
I can see how the information could get muddled if put directly under the `beats` namespace, but I don't see any harm in having it under a `beats.filebeat` namespace, or under a `filebeat` namespace at the top level.
We have a large number of consumers reading data from Kafka, and my worry is that if the metadata is included at the top level, it will be impossible to distinguish the actual event data from the beats metadata.
If you look at Metricbeat, you will see that we currently follow more or less the opposite approach: everything added by the beat is namespaced under module/metricset to prevent naming conflicts. So your best bet at the moment is probably either to update your consumers to read JSON or to use Logstash.
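If you do update the consumers, the change is usually small. A hedged Python sketch (field names follow the filebeat 5.x defaults; the Kafka client itself is omitted and the sample event values are invented) of pulling the original line out of the JSON-encoded event:

```python
import json

def extract_message(raw_event: bytes) -> str:
    """Parse a JSON-encoded filebeat event as read from Kafka and
    return the original log line stored in its "message" field."""
    event = json.loads(raw_event)
    return event["message"]

# Illustrative filebeat-style event (values invented for the example):
raw = json.dumps({
    "@timestamp": "2017-01-10T12:00:00.000Z",
    "beat": {"hostname": "web-1", "name": "web-1", "version": "5.1.1"},
    "source": "/var/log/app/app.log",
    "offset": 1234,
    "input_type": "log",
    "type": "log",
    "message": "GET /index.html 200",
}).encode("utf-8")

print(extract_message(raw))
```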
Thanks for the input. Just wanted to clarify my understanding about your previous suggestion. So you're saying that adding a filebeat namespace at the top level is not the right approach?
So an example message like this would differ from the output semantics of the other beats:
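(A hypothetical event in the proposed shape, all field names and values invented for illustration: the original message's fields stay at the top level, and everything filebeat adds is tucked under a `filebeat` namespace.)

```json
{
  "remote_addr": "10.0.0.1",
  "status": 200,
  "filebeat": {
    "@timestamp": "2017-01-10T12:00:00.000Z",
    "hostname": "web-1",
    "source": "/var/log/app/app.log",
    "offset": 1234
  }
}
```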
@surajs I definitely don't want to say it is not the right approach. They are just two different ways of doing it, and I see good reasons for both.
Here is an example event from Metricbeat: https://github.com/elastic/beats/blob/master/metricbeat/module/apache/status/_meta/data.json It resembles the above in that some info is under metricset, but this info is fairly static. Fields that change constantly are under the apache.status namespace. BUT this is not really comparable to filebeat, where there is only message, so message could be the namespace here. It becomes more interesting in the case of JSON logs, and the question of where the JSON data is written to.
Summary: It is a really tricky question with lots of different options. Obviously, as we have picked one model, it is harder for us to change it, since that would break backwards compatibility. But the nice thing about Logstash and Ingest is that you can build your own logic to transform the event into the format you need.