I'm setting up a logstash pipeline using Amazon SQS as the message broker. The flow looks like this:
servers => receiver tier => SQS => filter tier => elasticsearch
The logstash-forwarder agents send logs to a receiver tier via the lumberjack input, which forwards them unprocessed to SQS. A filter tier receives the logs via the SQS input, processes them, and sends them to an elasticsearch cluster.
All logstash machines run Oracle Java 1.8.0_51 and logstash 1.5.3, installed from the official apt (Debian) repository.
The problem I'm seeing is that the receiver machines occasionally send messages to SQS that cause an error when retrieved by the filter machines, terminating the logstash process.
Problem message:
"LogStash::Inputs::Lumberjack: {\"port\"=>5000, \"type\"=>\"scale\", \"ssl_certificate\"=>\"/etc/ssl/logstash-lumberjack.crt\", \"ssl_key\"=>\"/etc/ssl/logstash-lumberjack.key\", \"debug\"=>false, \"codec\"=><LogStash::Codecs::Plain charset=>\"UTF-8\">, \"add_field\"=>{}, \"host\"=>\"0.0.0.0\", \"max_clients\"=>1000}"
Exception upon receiving this message:
{:timestamp=>"2015-08-07T01:43:29.312000+0000", :message=>"Error reading SQS queue.", :error=>#<IndexError: string not matched>, :queue=>"logstash-scale", :level=>:error}
At this point, the logstash process on the filter node terminates, and the message eventually times out and becomes visible on the queue again. I'm using IAM roles to grant the EC2 machines access to the queue.
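For what it's worth, I can reproduce a similar "string not matched" error with a small Ruby snippet. This is only my guess at what's going on: assuming the SQS input JSON-decodes the message body, a body that is a quoted JSON string (like the "problem message" above appears to be) decodes to a Ruby String rather than a Hash, and any later attempt to set an event field on it hits String#[]=, which raises exactly this IndexError:

```ruby
require "json"
require "time"

# Hypothetical repro: the SQS message body is a JSON *string*, not an object,
# so parsing yields a Ruby String instead of a Hash of event fields.
msg = JSON.parse('"LogStash::Inputs::Lumberjack: {\"port\"=>5000}"')
puts msg.class  # => String

# Treating that String like an event hash invokes String#[]=, which raises
# IndexError ("string not matched") when the key isn't a substring.
begin
  msg["@timestamp"] = Time.now.utc.iso8601
rescue IndexError => e
  puts e.message  # => string not matched
end
```

If that guess is right, the real question is why the receiver tier ever serializes its own input-plugin configuration into the queue instead of an event.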
Receiver node config:
input {
  # receive events from logstash-forwarder (lumberjack protocol)
  lumberjack {
    port            => 5000
    type            => "scale"
    ssl_certificate => "/etc/ssl/logstash-lumberjack.crt"
    ssl_key         => "/etc/ssl/logstash-lumberjack.key"
  }
}

output {
  sqs {
    queue  => "logstash-scale"
    region => "us-west-2"
  }
}
Filter node config:
input {
  sqs {
    queue  => "logstash-scale"
    region => "us-west-2"
  }
}

filter {
  # nothing
}

output {
  elasticsearch {
    protocol => "http"
    host     => [ "internal-elasticsearch-logging-517786162.us-west-2.elb.amazonaws.com" ]
    port     => 9200
  }
}