Logstash 2.2.0 hangs at "Logstash startup completed" while attempting to ingest CSV

I installed the ELK stack from Elastic's repositories, and am encountering problems while attempting to ingest a 1 GB CSV file on an Amazon Linux (AWS) instance. This is my first attempt at ELK, so there's probably something simple I'm missing.

I've exceeded the character limit, so the output of:

./logstash -f /etc/logstash/conf.d/10-aws.conf -v --verbose --debug

is posted further down.

No new messages are written to the /var/log/logstash directory. With the debug and verbose logging options, shouldn't I be seeing (at a minimum) processed lines from the CSV in the console output? I wondered whether it's an incompatibility with Amazon Linux or OpenJDK, or whether my t2.micro instance is too small, but I imagine I would see error messages for an incompatibility, or extremely slow processing rather than nothing at all, if that were the case.
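
One way to narrow this down is to bypass the file input entirely and feed a few CSV lines over stdin with a minimal pipeline; if events print to the console there, the core pipeline is working and the problem is isolated to the file input side. A minimal sketch (the test config path /tmp/stdin-test.conf is arbitrary):

# /tmp/stdin-test.conf -- read lines from stdin and print each event.
input {
	stdin { }
}
output {
	stdout {
		codec => rubydebug
	}
}

head -n 5 /tmp/elk_data/aws_invoices/aws-billing-detailed-2015-01.csv | ./logstash -f /tmp/stdin-test.conf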

Attempted resolutions

  • Ran chmod 777 on the CSV file to ensure it's readable.
  • Explicitly specified the CSV file path in the Logstash configuration.
  • Added sincedb_path => "/dev/null" to the file input to force reparsing.
  • Verified that no indexes were created in Elasticsearch (see the curl check below).
  • Deleted the .sincedb files in /root; one contained 0 0 0 0 and the other (more recent) was empty.
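
For the Elasticsearch check, a query along these lines lists any indices the pipeline would have created (the elasticsearch output's default index name is logstash-YYYY.MM.dd):

curl 'localhost:9200/_cat/indices?v'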

Setup
Elasticsearch 2.2.0
Kibana 4.4.0
Logstash 2.2.0

All were installed from Elastic's repositories. The only change I made was to enable the services at boot with chkconfig.
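
Roughly, assuming the default service names from the RPM packages:

sudo chkconfig elasticsearch on
sudo chkconfig logstash on
sudo chkconfig kibana on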

java version "1.7.0_95"
OpenJDK Runtime Environment (amzn-2.6.4.0.65.amzn1-x86_64 u95-b00)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

Amazon Linux AMI release 2015.09 on t2.micro instance

Logstash configuration

input {
	file {
		path => "/tmp/elk_data/aws_invoices/aws-billing-detailed-2015-01.csv"
		
		# If CSV modified, reload from beginning. Default setting is
		# "end", and will index new lines appended to the file.
		start_position => "beginning"
		
		# Force reparse.
		sincedb_path => "/dev/null"
	}
}

filter {
	csv {
		# Include all columns and drop later as necessary.
		columns => [
			"InvoiceID", "PayerAccountID", "LinkedAccountID",
			"RecordType", "RecordID", "ProductName",
			"RateID", "SubscriptionID", "PricingPlanID",
			"UsageType", "Operation", "AvailabilityZone",
			"ReservedInstance",	"ItemDescription", "UsageStartDate",
			"UsageEndDate",	"UsageQuantity", "BlendedRate",
			"BlendedCost", "UnblendedRate","UnblendedCost",
			"ResourceID", "AutoscalingGroupName", "cfLogicalID",
			"cfStackID", "cfStackName", "tagEnvironment",
			"tagName", "tagOwner"
		]
		
		# Smart enough to detect text qualifier?
		separator => ","
	}
		
	# Remove unneeded fields (mutate, not drop: an unconditional
	# drop filter would discard every event).
	mutate {
		remove_field => ["AutoscalingGroupName", "cfLogicalID",
		"cfStackID", "cfStackName", "tagEnvironment",
		"tagName", "tagOwner"]
	}
	
	# Drop header, and rows containing invoice totals.
	if [InvoiceID] == "InvoiceID" or "Total" in [RecordType] {
		drop { }
	}

	ruby {
		code => "
			event['UsageStartDate'] = Date.parse(event['UsageStartDate']);
			event['UsageEndDate'] = Date.parse(event['UsageEndDate']);
		"
	}

	mutate {
		convert => {
			# Convert decimal values from string to float.
			"UsageQuantity" => "float"
			"BlendedRate" => "float"
			"BlendedCost" => "float"
			"UnblendedRate" => "float"
			"UnblendedCost" => "float"
		}
	}
	
}

output {
	elasticsearch {
		hosts => ["localhost:9200"]
		action => "index"
	}
	stdout {
		codec => rubydebug
	}
}
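
On the "text qualifier" question in the csv filter comment above: the csv filter treats the double quote as its quote character by default, so quoted fields containing commas should parse correctly; it can also be set explicitly with quote_char, e.g.:

	csv {
		separator => ","
		# Explicit quote character (this is already the default).
		quote_char => '"'
	}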

When I run:

./logstash -f /etc/logstash/conf.d/10-aws.conf -v --verbose --debug

I receive:

Using mapping template from {:path=>nil, :level=>:info}

Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true, "ignore_above"=>256}}}}}, {"float_fields"=>{"match"=>"*", "match_mapping_type"=>"float", "mapping"=>{"type"=>"float", "doc_values"=>true}}}, {"double_fields"=>{"match"=>"*", "match_mapping_type"=>"double", "mapping"=>{"type"=>"double", "doc_values"=>true}}}, {"byte_fields"=>{"match"=>"*", "match_mapping_type"=>"byte", "mapping"=>{"type"=>"byte", "doc_values"=>true}}}, {"short_fields"=>{"match"=>"*", "match_mapping_type"=>"short", "mapping"=>{"type"=>"short", "doc_values"=>true}}}, {"integer_fields"=>{"match"=>"*", "match_mapping_type"=>"integer", "mapping"=>{"type"=>"integer", "doc_values"=>true}}}, {"long_fields"=>{"match"=>"*", "match_mapping_type"=>"long", "mapping"=>{"type"=>"long", "doc_values"=>true}}}, {"date_fields"=>{"match"=>"*", "match_mapping_type"=>"date", "mapping"=>{"type"=>"date", "doc_values"=>true}}}, {"geo_point_fields"=>{"match"=>"*", "match_mapping_type"=>"geo_point", "mapping"=>{"type"=>"geo_point", "doc_values"=>true}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "doc_values"=>true}, "@version"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip", "doc_values"=>true}, "location"=>{"type"=>"geo_point", "doc_values"=>true}, "latitude"=>{"type"=>"float", "doc_values"=>true}, "longitude"=>{"type"=>"float", "doc_values"=>true}}}}}}}, :level=>:info}

New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}

Settings: Default pipeline workers: 1

Registering file input {:path=>["/tmp/elk_data/aws_invoices/aws-billing-detailed-2015-01.csv"], :level=>:info}

Using mapping template from {:path=>nil, :level=>:info}

Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true, "ignore_above"=>256}}}}}, {"float_fields"=>{"match"=>"*", "match_mapping_type"=>"float", "mapping"=>{"type"=>"float", "doc_values"=>true}}}, {"double_fields"=>{"match"=>"*", "match_mapping_type"=>"double", "mapping"=>{"type"=>"double", "doc_values"=>true}}}, {"byte_fields"=>{"match"=>"*", "match_mapping_type"=>"byte", "mapping"=>{"type"=>"byte", "doc_values"=>true}}}, {"short_fields"=>{"match"=>"*", "match_mapping_type"=>"short", "mapping"=>{"type"=>"short", "doc_values"=>true}}}, {"integer_fields"=>{"match"=>"*", "match_mapping_type"=>"integer", "mapping"=>{"type"=>"integer", "doc_values"=>true}}}, {"long_fields"=>{"match"=>"*", "match_mapping_type"=>"long", "mapping"=>{"type"=>"long", "doc_values"=>true}}}, {"date_fields"=>{"match"=>"*", "match_mapping_type"=>"date", "mapping"=>{"type"=>"date", "doc_values"=>true}}}, {"geo_point_fields"=>{"match"=>"*", "match_mapping_type"=>"geo_point", "mapping"=>{"type"=>"geo_point", "doc_values"=>true}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "doc_values"=>true}, "@version"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip", "doc_values"=>true}, "location"=>{"type"=>"geo_point", "doc_values"=>true}, "latitude"=>{"type"=>"float", "doc_values"=>true}, "longitude"=>{"type"=>"float", "doc_values"=>true}}}}}}}, :level=>:info}

New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}

Starting pipeline {:id=>"base", :pipeline_workers=>1, :batch_size=>125, :batch_delay=>5, :max_inflight=>125, :level=>:info}

Pipeline started {:level=>:info}

Logstash startup completed

Although I'm uncertain of the root cause, I re-provisioned on a RHEL 7 instance, installed the latest Oracle JDK, and Logstash is now working.

I encountered a similar problem when importing a large number of CSV files (3,230 of them) with the ELK stack; Logstash missed almost 2,500 files.

Environment:
OS: CentOS 7.2
Elasticsearch: 2.2.1
Kibana: 4.4.1
Logstash: 2.2.2

In the end, I resolved it by downgrading Logstash to 2.1.1.
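
For reference, a rough way to pin the older version on CentOS, assuming the Elastic 2.x yum repository is configured (exact package version strings may differ):

sudo service logstash stop
sudo yum remove logstash
sudo yum install logstash-2.1.1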