Filebeat: ship logs verbatim

I'm configuring Filebeat 5.1.1 to output to Kafka. I would like Filebeat to write the lines of the log file to the Kafka topic verbatim. Alternatively, could the fields that Filebeat adds be moved under the beat field?

This would make migrating to Filebeat easy, since the consumers of events from Kafka would not need to update their logic to read the message from under a JSON field.

See the Kafka output's topic and topics settings. The topic setting accepts a format string that can reference any field in the event.
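
For reference, a minimal filebeat.yml sketch of those settings (the broker address and topic names are placeholders, not from a real setup):

output.kafka:
  hosts: ["kafka-broker:9092"]   # placeholder broker address
  # topic accepts a format string, so any event field can select the topic
  topic: 'logs-%{[type]}'
  # topics adds conditional routing rules; the default topic above is used
  # when no rule matches
  topics:
    - topic: "error-logs"
      when.contains:
        message: "ERROR"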

It's not clear to me how your logs are structured and how exactly you want to filter, but more advanced processing and event routing requires Logstash.

Thanks @steffens for your reply. Apologies for not being clear in explaining my question.

Say, for example, a line in my log file looks like this:
{"message": "hello world"}

Rather than having Filebeat produce an event like this:

{
  "@timestamp": "2016-12-15T19:03:53.232Z",
  "beat": {
    "hostname": "surajs-host",
    "name": "surajs-host",
    "version": "5.1.1"
  },
  "input_type": "stdin",
  "message": "hello world",
  "offset": 0,
  "source": "test.log",
  "type": "json"
}

we'd want to see something like this:

{
  "beat": {
    "@timestamp": "2016-12-15T19:03:53.232Z",
    "hostname": "surajs-host",
    "name": "surajs-host",
    "version": "5.1.1",
    "input_type": "stdin",
    "offset": 0,
    "source": "test.log",
    "type": "json"
  },
  "message": "hello world"
}

This would let consumers know that all fields under the beat field were added by Filebeat, and that everything at the top level comes from the original message.

This is currently not possible. To change the format of the output you need Logstash or an Elasticsearch ingest node. Be aware that changing the format of the events means they will stop working with the Elasticsearch index template.
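
As a rough illustration, here is a minimal Logstash filter sketch of that kind of restructuring, using the field names from the example event above (untested; adjust to your pipeline):

filter {
  mutate {
    # move the Filebeat-added metadata under the beat namespace,
    # leaving message at the top level
    rename => {
      "input_type" => "[beat][input_type]"
      "offset"     => "[beat][offset]"
      "source"     => "[beat][source]"
      "type"       => "[beat][type]"
    }
  }
}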

Our current deployment looks like this:

filebeat -> Kafka -> LS -> ES
                |--> Applications

It looks like what you're suggesting would lead us to a sandwiched LS architecture:

filebeat -> LS -> Kafka -> LS -> ES

Do you think it would make sense to have a flag that lets a user specify the field under which to put all the Beats-associated metadata?

As @ruflin said, this is currently not possible. Most metadata, like source, input_type, offset, and so on, are expected to be part of the actual event. All Beats-related metadata are already in the beat namespace.

Is the event structure really such a big issue, justifying a more complicated system architecture?
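
Side note: if the goal is only to get lines that are closer to verbatim, rather than nesting the metadata, Filebeat 5.x also has a drop_fields processor that can strip most of the added fields in filebeat.yml. A sketch, assuming the documented 5.x behavior (@timestamp and type cannot be dropped):

processors:
  - drop_fields:
      # strip Filebeat-added metadata from each event;
      # @timestamp and type always remain
      fields: ["beat", "input_type", "offset", "source"]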

@steffens

I can see how the information could get muddled if it were put directly under the beat namespace, but I don't see any harm in having it under a beat.filebeat namespace, or in having a filebeat namespace at the top level.

We have a large number of consumers reading data from Kafka, and my worry is that if the metadata is included at the top level, it will be impossible to distinguish actual event data from Beats-added data.

If you look at Metricbeat, you'll see that we currently follow kind of the opposite approach: everything added by the Beat is namespaced under the module / metricset to prevent namespace conflicts. So your best bet at the moment is probably either to update your consumers to read the message from inside the JSON event, or to use Logstash.

Sorry for the delayed reply.

Thanks for the input. I just wanted to clarify my understanding of your previous suggestion: are you saying that adding a filebeat namespace at the top level is not the right approach?

So an example message like this would differ from the output semantics of the other Beats:

{
  "beat": {
    "@timestamp": "2016-12-15T19:03:53.232Z",
    "hostname": "surajs-host",
    "name": "surajs-host",
    "version": "5.1.1"
  },
  "filebeat": {
    "input_type": "stdin",
    "offset": 0,
    "source": "test.log",
    "type": "json"
  },
  "message": "hello world"
}

@surajs I definitely don't want to say it is not the right approach. It's more that there are two different ways of doing it, and I see good reasons for both.

Here is an example event from Metricbeat: https://github.com/elastic/beats/blob/master/metricbeat/module/apache/status/_meta/data.json It resembles the above in that some info is under metricset, but that info is rather static. Info that constantly changes is under the apache.status namespace. BUT this is not really comparable to Filebeat, where there is only a message, so message could be the namespace here. It becomes more interesting in the case of JSON input and the question of where the decoded JSON data is written to.

Summary: It is a really tricky question with lots of different options. Obviously, as we have picked one model, it is harder for us to change, as that breaks backwards compatibility. But the nice thing about LS and Ingest is that you can build your own logic to transform the event into the format you need.
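
For completeness, a minimal sketch of the Ingest option: an Elasticsearch 5.x pipeline with rename processors that nests the Filebeat metadata (the pipeline name and field set are illustrative). Note this only changes what gets indexed into Elasticsearch; consumers reading straight from Kafka would not see it:

PUT _ingest/pipeline/nest-filebeat-meta
{
  "description": "Move Filebeat metadata under a filebeat namespace",
  "processors": [
    { "rename": { "field": "input_type", "target_field": "filebeat.input_type" } },
    { "rename": { "field": "offset", "target_field": "filebeat.offset" } },
    { "rename": { "field": "source", "target_field": "filebeat.source" } }
  ]
}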

Thanks @ruflin for the detailed answer. We'll look at how we can adapt our existing architecture and data-flow pipeline to fit our use case.
