Multiple Logstash servers for failover in Filebeat

Hi,
Can I use multiple Logstash servers with Filebeat? My requirement is: if one Logstash server goes down, another server needs to take over without loss or duplication of log data.
Can this be done with Filebeat?

Thanks in advance

filebeat supports load balancing and/or failover. But it cannot guarantee no duplicates, as filebeat has send-at-least-once semantics. Guaranteeing no duplication would require some non-trivial coordination at the protocol level and between Logstash instances, or document IDs used to index-or-update documents in ES.
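
As an illustration of the document-ID idea, here is a minimal Logstash sketch (not from this thread): the fingerprint filter derives a stable ID from the event content and the elasticsearch output uses it as the document ID, so a retried event overwrites the existing document instead of creating a duplicate. The host and the hashed field are placeholder choices.

    filter {
      fingerprint {
        # hash the raw message into a stable, content-derived ID
        source => "message"
        target => "[@metadata][fingerprint]"
        method => "SHA256"
      }
    }
    output {
      elasticsearch {
        hosts => ["localhost:9200"]  # placeholder
        # index-or-update: resending the same event hits the same _id
        document_id => "%{[@metadata][fingerprint]}"
      }
    }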

Thanks Steffens for your reply,

Can you please provide a sample Filebeat configuration with multiple Logstash servers?

See docs: https://www.elastic.co/guide/en/beats/filebeat/current/load-balancing.html
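
For example, a minimal Filebeat sketch with two Logstash hosts and load balancing enabled. The host names, port, and log path below are placeholders, and the exact layout of the prospector/input section depends on your Filebeat version:

    filebeat.prospectors:
      - input_type: log
        paths:
          - /tmp/input-logs/*.log

    output.logstash:
      # placeholder Logstash endpoints; with loadbalance enabled events are
      # spread across the reachable hosts and a failed host is skipped,
      # though retries mean duplicates are still possible
      hosts: ["logstash1:5044", "logstash2:5044"]
      loadbalance: true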

Hi steffens,
I have been working on Logstash with Kafka, and I am seeing different behaviour from Logstash while pushing messages into Kafka.
Requirement:

  1. I will get different files continuously in a location, say /tmp/input-logs/, all with the extension .log. I have to push the messages from those files to a topic (staging-topic) on a Kafka broker, say 10.0.24.33:9092. Later a Kafka consumer will consume the messages from the topic and display them.

  2. Sometimes I will restart Logstash; after a restart I want to get the latest data without duplicates or loss.
    Code I have written for Logstash:
    input {
      file {
        path => "/file0/file1/logstash-input-logs/*.log"
        start_position => "end"
        # remembers how far each file has been read across restarts
        sincedb_path => "/file0/file1/logstash-conf/input.sincedb"
      }
    }
    output {
      kafka {
        # forward only the raw log line
        codec => plain {
          format => "%{message}"
        }
        bootstrap_servers => "10.0.24.23:9092"
        topic_id => "staging-topic"
      }
    }
    Problems I ran into:

  1. If I put start_position => "beginning", it works fine until we stop Logstash while it is writing data to Kafka. When we start Logstash again, it
    reads the data from all files again.

  2. If I put start_position => "end", Logstash does not write the complete data to Kafka. For example, I have 100000 records in 20 files, but when consuming from the topic I am
    able to get only 40000 or 25000, and sometimes 60000.

  3. When I increased the pipeline workers count from the default 8 to 30, I am able to get all messages, but if we restart Logstash it ignores the messages which were not yet sent
    to Kafka.

    Please suggest the best solution for my requirement.
    Thanks

Regarding the Logstash + Kafka issues, please ask in the Logstash forum.

  • Kafka is offset based. It operates more like a big distributed append-only file. With Kafka one has to keep track of the offset in the consumer; normally this is done using consumer groups (see the sketch after this list). Using beginning (always start at offset 0) or end (always start at the end of the log, like tail -f) as the start position ignores the last offset processed.
  • To not lose data, one has to properly manage offsets in the consumer.
  • Kafka and the Kafka protocol are very minimal. It's up to the application to decide whether to retry inserting data or to drop it. Filebeat, for example, will retry forever; I'm not sure about the Logstash output plugin. With always-retry one only gets send-at-least-once semantics => in case of network failures there is always a chance of duplicates. One can try to get some sort of deduplication by defining unique keys per event.

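To make the consumer-group point above concrete, here is a rough Logstash Kafka input sketch. The broker, topic, and group name are placeholders, and option names differ between Kafka input plugin versions; this uses the newer bootstrap_servers-based options:

    input {
      kafka {
        bootstrap_servers => "broker1:9092"   # placeholder broker
        topics => ["staging-topic"]
        # consumers sharing this group_id share committed offsets, so a
        # restarted consumer resumes where the group left off
        group_id => "staging-consumers"
        # only applies when the group has no committed offset yet
        auto_offset_reset => "earliest"
      }
    }
    output {
      stdout { codec => rubydebug }
    }
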
This topic was automatically closed after 21 days. New replies are no longer allowed.