So we have a setup that works pretty well: data comes in through the beats plugin and gets shipped to S3.
I have it set up so that Logstash writes the local files out to different directories, so each stream stays within its size and time limits and is kept separate. The files are then uploaded to various S3 buckets for processing (we partition them so that over the weekend our batch jobs don't overrun all of the other data, and they can queue up as necessary). Call it logical load balancing, if you will...
The issue is that a few of the nodes just suddenly stop sending SOME of the files. We restart Logstash and they all get uploaded, no sweat, but about 10-15 minutes later they are stuck and not uploading again. We ended up cron-scheduling a service logstash restart (once an hour) to get the files pushed in a timely manner.
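For reference, the workaround is nothing fancy; assuming root's crontab, the entry looks something like this:

# Hourly Logstash restart -- workaround until the stalled S3 uploads are fixed
0 * * * * service logstash restart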
Any thoughts on this? The Logstash config for the nodes uploading the data to S3 is below.
Errors seen on the Logstash node include:
{:timestamp=>"2016-05-25T14:00:25.038000-0700", :message=>"S3: have found temporary file the upload process crashed, uploading file to S3.", :filename=>"ls.s3.phx7b02c-543d.stratus.phx.ebay.com.2016-05-25T13.18.part18.txt", :level=>:warn}
{:timestamp=>"2016-05-25T14:00:25.038000-0700", :message=>"S3: have found temporary file the upload process crashed, uploading file to S3.", :filename=>"ls.s3.phx7b02c-543d.stratus.phx.ebay.com.2016-05-25T13.41.part41.txt", :level=>:warn}
{:timestamp=>"2016-05-25T14:02:20.665000-0700", :message=>"S3: Cannot delete the temporary file since it doesn't exist on disk", :filename=>"ls.s3.phx7b02c-543d.stratus.phx.ebay.com.2016-05-25T14.01.part1.txt", :level=>:warn}
{:timestamp=>"2016-05-25T14:02:24.304000-0700", :message=>"S3: AWS error", :error=>#<AWS::S3::Errors::BadRequest: An error occurred when parsing the HTTP request.>, :level=>:error}
{:timestamp=>"2016-05-25T14:02:25.101000-0700", :message=>"S3: AWS error", :error=>#<AWS::S3::Errors::BadRequest: An error occurred when parsing the HTTP request.>, :level=>:error}
{:timestamp=>"2016-05-25T14:04:20.052000-0700", :message=>"S3: Cannot delete the temporary file since it doesn't exist on disk", :filename=>"ls.s3.phx7b02c-543d.stratus.phx.ebay.com.2016-05-25T14.03.part3.txt", :level=>:warn}
{:timestamp=>"2016-05-25T14:05:20.074000-0700", :message=>"S3: Cannot delete the temporary file since it doesn't exist on disk", :filename=>"ls.s3.phx7b02c-543d.stratus.phx.ebay.com.2016-05-25T14.04.part4.txt", :level=>:warn}
{:timestamp=>"2016-05-25T14:05:21.935000-0700", :message=>"S3: AWS error", :error=>#<AWS::S3::Errors::BadRequest: An error occurred when parsing the HTTP request.>, :level=>:error}
We have Logstash servers in AWS that read these files, process them, and send them out to ES as well as to another S3 archive bucket. No errors noted on them.
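Roughly, that reader side looks like this (a sketch only; the hosts, bucket names, and codecs are placeholders, not our exact settings):

input {
  s3 {
    access_key_id => "redact"
    secret_access_key => "redact"
    region => "us-west-2"
    bucket => "redact" # same bucket the shipper nodes upload to
    prefix => "1-phx7b02c-543d-acc/"
    codec => "json" # one JSON event per line, matching the json_lines the shippers write
  }
}
output {
  elasticsearch {
    hosts => ["redact"]
  }
  s3 {
    access_key_id => "redact"
    secret_access_key => "redact"
    region => "us-west-2"
    bucket => "redact-archive" # the separate archive bucket
    codec => "json_lines"
  }
}

The config on the shipper nodes that are stalling is below: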
input {
  beats {
    type => "beats"
    port => 9990
    codec => "json"
  }
  beats {
    type => "beats"
    port => 9991
    codec => "json"
  }
} # End input

output {
  if "ACC" in [opp] {
    s3 {
      access_key_id => "redact"
      secret_access_key => "redact"
      region => "us-west-2"
      bucket => "redact"
      canned_acl => "authenticated_read"
      size_file => 50000000
      time_file => 1
      upload_workers_count => 20
      prefix => "1-phx7b02c-543d-acc/"
      codec => "json_lines"
      temporary_directory => "/data/logstash/forwarder-acc"
      restore => true
    }
  } # End acc if
} # End output