Errors with Logstash bulk load using CSV

cburkins · September 10, 2015, 6:36pm

I'm using Logstash to bulk load into Elasticsearch using a CSV template.

I'm getting lots of warnings like:

retrying failed action with response code: 429 {:level=>:warn}

And lots of errors like:

too many attempts at sending event. dropping: 2015-06-16T01:06:22.000Z ip-172-31-30-197 93522,itsmxjunbu01,0,Backup,chile.eesus.jnj.com,"Jun 15, 2015  {:level=>:error} 16, 2015 1:06:22 AM",03:05:28,32,0,3

My command to load the data looks like this:

cat data.csv | /opt/logstash/bin/logstash -f ./csvload.conf

The csvload.conf looks like this:

# read input from stdin (e.g. pipe)
input {
    stdin {}
}

filter {
    # filter the input by csv (i.e. comma-separated-value)
    csv {
        columns => [
            "JobID",
            "ServerName",
            "StatusCode",
            "JobType",
            "ClientName",
            "StartTime",
            "EndTime",
            "Duration",
            "Volume-KB",
            "NumberofFiles",
            "Throughput-KB-sec"
        ]
    }

    date {
        # parse the "EndTime" field to create a real date
        # Examples of times in this log file
        # "May 29, 2015 10:00:01 PM"
        # "May 9, 2015 4:47:23 AM"
        # "May 23, 2015 12:23:49 PM"
        match => [ "EndTime",
                   "MMM dd, YYYY hh:mm:ss aa",
                   "MMM  d, YYYY hh:mm:ss aa" ]
    }

    mutate { replace => { "type" => "nbu_job" } }
    mutate { gsub => ["NumberofFiles", ",", ""] }
    mutate { convert => [ "NumberofFiles", "integer" ] }
    mutate { gsub => ["Volume-KB", ",", ""] }
    mutate { convert => [ "Volume-KB", "integer" ] }
    mutate { gsub => ["Throughput-KB-sec", ",", ""] }
    mutate { convert => [ "Throughput-KB-sec", "integer" ] }

    # Example of Duration = "04:28:13" which is hours, minutes, and seconds
    # Split up and create the respective integer fields
    grok {
        match => [ "Duration", "%{NUMBER:hours:int}:%{NUMBER:minutes:int}:%{NUMBER:seconds:int}" ]
    }

    # Call ruby to perform the basic arithmetic of computing total seconds
    ruby {
        code => "event['Elapsed'] = event['hours']*3600 + event['minutes']*60 + event['seconds']"
    }


    translate {
        field => "ServerName"
        destination => "Country"
        dictionary_path => "./cmdb/ServerByCountry.yaml"
        fallback => "Unknown"
    }
}

# send the output to stdout, using the rubydebug codec
# rubydebug uses the Ruby Awesome Print library
output {
#    stdout { codec => rubydebug }
    elasticsearch { host => localhost   }
}
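
For anyone reading the config above: the grok and ruby filters together just turn an "HH:MM:SS" Duration into total seconds. A minimal Python sketch of the same arithmetic, outside Logstash (the function name is only illustrative and not part of the config):

```python
def duration_to_seconds(duration: str) -> int:
    """Convert an "HH:MM:SS" string into a total number of seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Example from the config comment: 4*3600 + 28*60 + 13 = 16093
print(duration_to_seconds("04:28:13"))
```

Note that this sketch raises an error on a malformed Duration value, so it is only a check of the arithmetic, not a drop-in replacement for the filter.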

I honestly really don't care about the csv import performance. I wish I could figure out a way to slow down the load. Seems like logstash and/or elasticsearch is choking....

-Chad
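
One rough way to slow the load down, sketched here as an assumption rather than a tested fix, is to put a small rate limiter between `cat` and Logstash so that lines reach stdin at a capped rate. The `throttle` function, the `main` helper, and the rate of 500 lines per second are all hypothetical; whether this actually removes the 429 responses depends on how much the Elasticsearch node can absorb.

```python
import sys
import time


def throttle(lines, max_per_second, sleep=time.sleep, clock=time.monotonic):
    """Yield each line, pausing so that no more than max_per_second pass through."""
    interval = 1.0 / max_per_second
    next_time = clock()  # earliest moment the next line may be released
    for line in lines:
        now = clock()
        if now < next_time:
            sleep(next_time - now)
        # Schedule the following line one interval after this one.
        next_time = max(now, next_time) + interval
        yield line


def main(max_per_second=500):
    """Copy stdin to stdout at a capped rate (the rate is an arbitrary example)."""
    for line in throttle(sys.stdin, max_per_second):
        sys.stdout.write(line)
```

Saved as, say, throttle.py with a call to main() added at the bottom, it could sit in the pipeline like this: cat data.csv | python throttle.py | /opt/logstash/bin/logstash -f ./csvload.conf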

warkolm · September 11, 2015, 6:57am

How big is the file?
What sort of specs is the ES host?
Have you checked things like threadpool rejections?
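
For reference, thread pool rejections show up in the node stats API (`/_nodes/stats/thread_pool`). The following is a minimal Python sketch for reading the bulk pool's rejected counter per node. The function names are illustrative, the URL assumes a node on localhost:9200, and the pool name `bulk` is an assumption that matches Elasticsearch releases of this era (later versions name the pool differently).

```python
import json
from urllib.request import urlopen


def bulk_rejections(node_stats):
    """Map each node name to the 'rejected' counter of its bulk thread pool."""
    result = {}
    for node in node_stats.get("nodes", {}).values():
        pool = node.get("thread_pool", {}).get("bulk", {})
        result[node.get("name", "unknown")] = pool.get("rejected", 0)
    return result


def fetch_node_stats(base_url="http://localhost:9200"):
    """Fetch thread pool stats from a running node (base_url is an assumption)."""
    with urlopen(base_url + "/_nodes/stats/thread_pool") as response:
        return json.load(response)
```

Calling bulk_rejections(fetch_node_stats()) against a live node returns a dict of node names to counts. A count that keeps growing during the load would suggest the bulk queue is filling up and requests are being rejected, which is consistent with the 429 responses.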

cburkins · September 11, 2015, 9:50am

Thanks for the reply, Mark.

I've got five different input files, all with similar data, ranging in size from 600K lines to 1.7M lines. Strangely, the input file with 1.7M lines ran just fine, but the "smaller" file with 600K lines produces the warnings and errors.

The ES host is an AWS t2.medium (2 vCPUs and 4GB of RAM).

I'm still new to this (but enjoying it very much), so I'm not sure what you mean by threadpool rejections. I'm guessing "threadpool" is a resource defined within elasticsearch.yml?

-Chad

Christian_Dahlqvist · September 11, 2015, 9:58am

t2 instances are burstable performance instances and may not be ideal for long, sustained bulk loading. To avoid this limitation, you could perform the bulk load locally or on a separate, more powerful instance, and then use snapshot and restore (possibly from S3 using the AWS plugin) to move the indexed data to your instance.
