I'm configuring filebeat 5.1.1 to output to kafka. I would like filebeat to output to the kafka topic the lines in the log file verbatim.
Or move these fields under the `beat` field?
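For context, a minimal filebeat 5.x Kafka output configuration looks roughly like this (hosts, topic, and paths are placeholders, not taken from the thread); note there is no option here to emit the raw log line instead of the JSON-encoded event:

```yaml
# filebeat.yml (illustrative sketch; hosts and topic are placeholders)
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/app/*.log

output.kafka:
  hosts: ["kafka1:9092"]
  topic: "app-logs"
  # In 5.1.1 the event is always serialized as a JSON document;
  # there is no setting here to output only the original line.
```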
This would allow us to easily migrate to filebeat, since the consumers of events from Kafka would not need to update their logic to read the message from under a JSON field.
This would let the consumer know that all fields under the `beat` field are added by filebeat, and that everything at the top level comes from the original message.
This is currently not possible. To change the format of the output you need Logstash or an ingest node. Be aware that changing the format of the events means they will stop working with the Elasticsearch index template.
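As a sketch of the Logstash route (topic and hosts are placeholders, not a tested pipeline): the `plain` codec's `format` option can be used on the Kafka output so that only the original log line is forwarded:

```
input {
  beats {
    port => 5044
  }
}

output {
  kafka {
    topic_id          => "app-logs"       # placeholder topic
    bootstrap_servers => "kafka1:9092"    # placeholder broker
    codec => plain {
      format => "%{message}"              # emit only the original log line
    }
  }
}
```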
As @ruflin said, this is currently not possible. Most metadata, like source, input_type, offset, and so on, are expected to be part of the actual event. All beats-related metadata are already in the `beat` namespace.
Is the event structure really such a big issue that it justifies a more complicated system architecture?
I can see how the information could get muddled if put directly under the `beats` namespace, but I don't see any harm in having it under a `beats.filebeat` namespace, or under a `filebeat` namespace at the top level.
We have a large number of consumers reading data from Kafka, and my worry is that if the metadata is included at the top level, it will be impossible to distinguish the actual event data from the beats metadata.
If you look at Metricbeat, you will see that we currently follow more or less the opposite approach: everything added by the beat is namespaced under module/metricset to prevent naming conflicts. So your best bet at the moment is probably either to update your consumers to read JSON or to use Logstash.
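If you do update the consumers, the change is usually small. A hedged Python sketch (field names follow the filebeat 5.x defaults; the Kafka client itself is omitted and the sample event values are invented) of pulling the original line out of the JSON-encoded event:

```python
import json

def extract_message(raw_event: bytes) -> str:
    """Parse a JSON-encoded filebeat event as read from Kafka and
    return the original log line stored in its "message" field."""
    event = json.loads(raw_event)
    return event["message"]

# Illustrative filebeat-style event (values invented for the example):
raw = json.dumps({
    "@timestamp": "2017-01-10T12:00:00.000Z",
    "beat": {"hostname": "web-1", "name": "web-1", "version": "5.1.1"},
    "source": "/var/log/app/app.log",
    "offset": 1234,
    "input_type": "log",
    "type": "log",
    "message": "GET /index.html 200",
}).encode("utf-8")

print(extract_message(raw))
```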
Thanks for the input. Just wanted to clarify my understanding about your previous suggestion. So you're saying that adding a filebeat namespace at the top level is not the right approach?
So an example message like this would differ from the output semantics of the other beats:
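(A hypothetical event in the proposed shape, all field names and values invented for illustration: the original message's fields stay at the top level, and everything filebeat adds is tucked under a `filebeat` namespace.)

```json
{
  "remote_addr": "10.0.0.1",
  "status": 200,
  "filebeat": {
    "@timestamp": "2017-01-10T12:00:00.000Z",
    "hostname": "web-1",
    "source": "/var/log/app/app.log",
    "offset": 1234
  }
}
```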
@surajs I definitely don't want to say it is not the right approach. They are just two different ways of doing it, and I see good reasons for both.
Here is an example event from Metricbeat: https://github.com/elastic/beats/blob/master/metricbeat/module/apache/status/_meta/data.json It resembles the above in that some info is under metricset, but this info is fairly static. Fields that change constantly are under the apache.status namespace. BUT this is not really comparable to filebeat, where there is only message, so message could be the namespace here. It becomes more interesting in the case of JSON logs, and the question of where the JSON data is written to.
Summary: It is a really tricky question with lots of different options. Obviously, as we have picked one model, it is harder for us to change it, since that would break backwards compatibility. But the nice thing about Logstash and Ingest is that you can build your own logic to transform the event into the format you need.