Interesting logstash behavior

I am running Logstash and Elasticsearch 5.5.2 in a hot-warm architecture. To free heap space on the Elasticsearch warm nodes, the oldest indices are closed.
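For reference, closing an index is done with a call like the following (the node address here is a placeholder; the index name is just an example taken from the log below):

    curl -XPOST 'http://warm-node:9200/ox_as_fe_ops-2017.09.18/_close'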
The Logstash pipeline reads messages from Kafka, filters them, and outputs them to Elasticsearch.
Indexing normally goes to the hot nodes (SSD-equipped), which hold the indices for the last 3 days.
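For context, the pipeline looks roughly like this (a minimal sketch, not our exact configuration; the broker address, topic, hosts and index pattern are placeholders, while the consumer group and index prefix are taken from the logs below):

    input {
      kafka {
        bootstrap_servers => "kafka:9092"          # placeholder broker address
        topics            => ["ox_as_fe_ops"]      # placeholder topic name
        group_id          => "OX-AM-OPS-cg-ss"     # consumer group seen in the Kafka log below
      }
    }
    filter {
      # grok/date/etc. filters omitted
    }
    output {
      elasticsearch {
        hosts => ["http://hot-node:9200"]          # placeholder; writes normally go to the hot nodes
        index => "ox_as_fe_ops-%{+YYYY.MM.dd}"     # daily indices, as in the log below
      }
    }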
A problem on the rsyslog instance feeding Kafka caused ten-day-old messages to be fed into Kafka again.
The Logstash elasticsearch output plugin then tried to index them into an index that had already been closed.
What happens is the following:
Logstash correctly logs INFO messages like this one:
[2017-09-26T09:15:36,026][INFO ][logstash.outputs.elasticsearch][am-ops-1] retrying failed action with response code: 403 ({"type"=>"index_closed_exception", "reason"=>"closed", "index_uuid"=>"roxHCI09ToWHtx6gyXLEKA", "index"=>"ox_as_fe_ops-2017.09.18"})
afterwards we also have:
[2017-09-26T09:15:36,026][INFO ][logstash.outputs.elasticsearch][am-ops-1] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>1}
and after some time:
[2017-09-26T09:16:57,047][INFO ][logstash.outputs.elasticsearch][am-ops-1] Retrying individual bulk actions that failed or were rejected by the previous bulk request. {:count=>5}
After some time the pipeline seems to stop consuming from Kafka. In fact the Kafka server.log file contains:
INFO [GroupCoordinator 1]: Group OX-AM-OPS-cg-ss with generation 3 is now empty (kafka.coordinator.GroupCoordinator)
and
INFO [Group Metadata Manager on Broker 1]: Group OX-AM-OPS-cg-ss transitioned to Dead in generation 3 (kafka.coordinator.GroupMetadataManager).
This behavior is easily replicated on our installation.
We worked around the problem by inserting this piece of code:
ruby {
  # Drop (cancel) events whose @timestamp is older than 5 days
  code => "event.cancel if (Time.now.to_f - event.get('@timestamp').to_f) > (60 * 60 * 24 * 5)"
}
as suggested in: https://stackoverflow.com/questions/30087807/ignore-incoming-logstash-entries-that-are-older-than-a-given-date
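A variant of the same idea, sketched here only as an illustration (the file path and the Elasticsearch settings are placeholders), would be to tag the old events and route them to a separate output instead of dropping them:

    filter {
      ruby {
        # Mark events older than 5 days instead of cancelling them
        code => "event.set('[@metadata][too_old]', true) if (Time.now.to_f - event.get('@timestamp').to_f) > (60 * 60 * 24 * 5)"
      }
    }
    output {
      if [@metadata][too_old] {
        file { path => "/var/log/logstash/too_old_events.log" }   # placeholder path; keeps a copy of the stale events
      } else {
        elasticsearch {
          hosts => ["http://hot-node:9200"]                       # placeholder
          index => "ox_as_fe_ops-%{+YYYY.MM.dd}"
        }
      }
    }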
Nevertheless, I think that this kind of event should not cause the pipeline to stop consuming.
Thanks for your attention,
Marco

It may be possible to use a dead letter queue to capture these events, but I am not sure exactly which return codes are captured by default.
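If it does apply here, it would look roughly like this (a sketch, assuming a package-style install; the path is a placeholder, and I have not verified that a 403/index_closed_exception actually ends up in the DLQ):

    # logstash.yml
    dead_letter_queue.enable: true
    path.dead_letter_queue: "/var/lib/logstash/dead_letter_queue"   # placeholder path

    # separate pipeline to inspect/re-process dead-lettered events
    input {
      dead_letter_queue {
        path           => "/var/lib/logstash/dead_letter_queue"     # must match path.dead_letter_queue
        commit_offsets => true
      }
    }
    output {
      stdout { codec => rubydebug }   # just inspect what ended up in the queue
    }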
