Is it possible to specify a partition key when using filebeat to send to a Kafka topic?


#1

I found a topic from a year ago saying it was not possible at that time. Can we do it now with Filebeat 5.5.1? I want some part of the message to be the key of the Kafka message. For example, a log message looks like:
2017:07:21 ip:12.34.56.78 xxxxxxxxxxxxxxxxx
I want to use a regular expression to extract the IP as the Kafka message key. Is this possible with the latest Filebeat?

Thanks.


(Steffen Siering) #2

Yes, see the key and partition settings. Note that key is optional when using the hash partitioner: you can instead define a list of event fields that should be used to compute the hash. The difference is that when using key, the value is also used as the event key in Kafka (it must be unique).
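For instance (a sketch only; the broker address and topic are placeholders), hashing on event fields without setting a key could look like:

```yaml
output.kafka:
  hosts: ["kafka:9092"]      # placeholder broker address
  topic: mytopic             # placeholder topic name
  partition.hash:
    hash: ["beat.hostname"]  # list of event fields to compute the hash from
    reachable_only: false
```

Here no Kafka message key is produced; the listed fields only drive partition selection.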


#3

I read that document, but it only has two lines and I don't know how to set it up. It just says:
"key
Optional Kafka event key. If configured, the event key must be unique and can be extracted from the event using a format string."
Is there any more detailed documentation on how to configure it (e.g., extracting part of the message as the key, or an example)?

Thanks.


(Steffen Siering) #4

Check out the Kafka sample config in the full Filebeat reference config file: https://github.com/elastic/beats/blob/master/libbeat/_meta/config.reference.yml#L283

Format string support is explained here: https://www.elastic.co/guide/en/beats/libbeat/current/config-file-format-type.html#_format_string_sprintf

The key field only sets the event's key; it does not select the partition. You still have to enable the hash partitioner if you want to partition based on the event's key.

Is it the event key you want to set, or do you want to select the partition the event is published to? Or both?

e.g., assuming my event has a unique field named id and I want to use the key for hash-based partitioning:

output.kafka:
  key: '%{[id]}'
  partition.hash:
    hash: []    # empty list: the event's 'key' is used for hashing
    reachable_only: false

#5

I am not sure how to get the id. Here is my setup:

- input_type: log
  paths:
    - /logs/myservice.log
  include_lines: ["End request ip:(?P<ipaddress>[^,]*),"]
output.kafka:
  key: '%{[ipaddress]}'

and my log line looks like
2017-07-31 11:51:51,766 INFO - End request ip:32.40.65.43,......
I can get the log line into Kafka, but the key is still empty.
Is this possible at all?
Thanks.


(Steffen Siering) #6

You cannot extract fields in include_lines; Filebeat does not parse any log content. That is normally done by Logstash or an Elasticsearch ingest pipeline.
Filebeat only supports JSON parsing. If your application logs were structured in JSON format, you could access any field.
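A sketch of that approach (assuming the application writes one JSON object per line containing a hypothetical ip field; broker address and topic are placeholders):

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /logs/myservice.json
  json.keys_under_root: true   # lift parsed JSON fields to the event's top level
output.kafka:
  hosts: ["kafka:9092"]        # placeholder broker address
  topic: mytopic               # placeholder topic name
  key: '%{[ip]}'               # hypothetical 'ip' field from the JSON log line
  partition.hash:
    reachable_only: false
```

With the fields lifted to the top level, the format string `%{[ip]}` can resolve, so the Kafka message key would no longer be empty.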


#7

thanks.


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.