Logstash 2.2.0 hangs at "Logstash startup completed" while attempting to ingest CSV

MattW224 · February 12, 2016, 4:28pm

I installed the ELK stack from Elastic's repositories, and am encountering problems while attempting to ingest a 1 GB CSV file on an Amazon Linux (AWS) instance. This is my first attempt at ELK, so there's probably something simple I'm missing.

I've exceeded the character count, and will post the output of:

./logstash -f /etc/logstash/conf.d/10-aws.conf -v --verbose --debug

in the following message.

No new messages are written to the /var/log/logstash directory. With the debug and verbose logging options, shouldn't I be seeing (at a minimum) processed lines from the CSV in the console output? I wondered if it's an incompatibility with Amazon Linux or OpenJDK, or my instance (t2.micro) is too small. But I imagine I would receive error message for incompatibilities, or see extremely slow processing if that were the case.

Attempted resolutions

chmod 777 the CSV file, to ensure it's readable.
Explicitly specify the CSV file in the Logstash configuration.
Adding sincedb_path => "/dev/null" to the file filter, in order to force reparsing.
Verified that no indexes were created in ElasticSearch.
Deleted .sincedb files in /root. One contained 0 0 0 0. The other (more recent) was empty.

Setup
ElasticSearch 2.2.0
Kibana 4.4.0
Logstash 2.2.0

All were installed from Elastic's repositories. The only change I made was to set chkconfig on.

java version "1.7.0_95"
OpenJDK Runtime Environment (amzn-2.6.4.0.65.amzn1-x86_64 u95-b00)
OpenJDK 64-Bit Server VM (build 24.95-b01, mixed mode)

Amazon Linux AMI release 2015.09 on t2.micro instance

Logstash script

input {
	file {
		path => "/tmp/elk_data/aws_invoices/aws-billing-detailed-2015-01.csv"
		
		# If CSV modified, reload from beginning. Default setting is
		# "end", and will index new lines appended to the file.
		start_position => "beginning"
		
		# Force reparse.
		sincedb_path => "/dev/null"
	}
}

filter {
	csv {
		# Include all columns and drop later as necessary.
		columns => [
			"InvoiceID", "PayerAccountID", "LinkedAccountID",
			"RecordType", "RecordID", "ProductName",
			"RateID", "SubscriptionID", "PricingPlanID",
			"UsageType", "Operation", "AvailabilityZone",
			"ReservedInstance",	"ItemDescription", "UsageStartDate",
			"UsageEndDate",	"UsageQuantity", "BlendedRate",
			"BlendedCost", "UnblendedRate","UnblendedCost",
			"ResourceID", "AutoscalingGroupName", "cfLogicalID",
			"cfStackID", "cfStackName", "tagEnvironment",
			"tagName", "tagOwner"
		]
		
		# Smart enough to detect text qualifier?
		separator => ","
	}
		
	drop {
		remove_field => ["AutoscalingGroupName", "cfLogicalID",
		"cfStackID", "cfStackName", "tagEnvironment",
		"tagName", "tagOwner"]
	}
	
	# Drop header, and rows containing invoice totals.
	if [InvoiceID] == "InvoiceID" or "Total" in [RecordType] {
		drop { }
	}

	ruby {
		code => "
			event['UsageStartDate'] = Date.parse(event['UsageStartDate']);
			event['UsageEndDate'] = Date.parse(event['UsageEndDate']);
		"
	}

	mutate {
		convert => {
			# Convert decimal values from string to float.
			"UsageQuantity" => "float"
			"BlendedRate" => "float"
			"BlendedCost" => "float"
			"UnblendedRate" => "float"
			"UnblendedCost" => "float"
		}
	}
	
}

output {
	elasticsearch {
		hosts => ["localhost:9200"]
		action => "index"
	}
	stdout {
		codec => rubydebug
	}
}

MattW224 · February 12, 2016, 4:29pm

When I run:

./logstash -f /etc/logstash/conf.d/10-aws.conf -v --verbose --debug

I receive:

Using mapping template from {:path=>nil, :level=>:info}

Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true, "ignore_above"=>256}}}}}, {"float_fields"=>{"match"=>"*", "match_mapping_type"=>"float", "mapping"=>{"type"=>"float", "doc_values"=>true}}}, {"double_fields"=>{"match"=>"*", "match_mapping_type"=>"double", "mapping"=>{"type"=>"double", "doc_values"=>true}}}, {"byte_fields"=>{"match"=>"*", "match_mapping_type"=>"byte", "mapping"=>{"type"=>"byte", "doc_values"=>true}}}, {"short_fields"=>{"match"=>"*", "match_mapping_type"=>"short", "mapping"=>{"type"=>"short", "doc_values"=>true}}}, {"integer_fields"=>{"match"=>"*", "match_mapping_type"=>"integer", "mapping"=>{"type"=>"integer", "doc_values"=>true}}}, {"long_fields"=>{"match"=>"*", "match_mapping_type"=>"long", "mapping"=>{"type"=>"long", "doc_values"=>true}}}, {"date_fields"=>{"match"=>"*", "match_mapping_type"=>"date", "mapping"=>{"type"=>"date", "doc_values"=>true}}}, {"geo_point_fields"=>{"match"=>"*", "match_mapping_type"=>"geo_point", "mapping"=>{"type"=>"geo_point", "doc_values"=>true}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "doc_values"=>true}, "@version"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip", "doc_values"=>true}, "location"=>{"type"=>"geo_point", "doc_values"=>true}, "latitude"=>{"type"=>"float", "doc_values"=>true}, "longitude"=>{"type"=>"float", "doc_values"=>true}}}}}}}, :level=>:info}

New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}

Settings: Default pipeline workers: 1

Registering file input {:path=>["/tmp/elk_data/aws_invoices/aws-billing-detailed-2015-01.csv"], :level=>:info}

Using mapping template from {:path=>nil, :level=>:info}

Attempting to install template {:manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true, "ignore_above"=>256}}}}}, {"float_fields"=>{"match"=>"*", "match_mapping_type"=>"float", "mapping"=>{"type"=>"float", "doc_values"=>true}}}, {"double_fields"=>{"match"=>"*", "match_mapping_type"=>"double", "mapping"=>{"type"=>"double", "doc_values"=>true}}}, {"byte_fields"=>{"match"=>"*", "match_mapping_type"=>"byte", "mapping"=>{"type"=>"byte", "doc_values"=>true}}}, {"short_fields"=>{"match"=>"*", "match_mapping_type"=>"short", "mapping"=>{"type"=>"short", "doc_values"=>true}}}, {"integer_fields"=>{"match"=>"*", "match_mapping_type"=>"integer", "mapping"=>{"type"=>"integer", "doc_values"=>true}}}, {"long_fields"=>{"match"=>"*", "match_mapping_type"=>"long", "mapping"=>{"type"=>"long", "doc_values"=>true}}}, {"date_fields"=>{"match"=>"*", "match_mapping_type"=>"date", "mapping"=>{"type"=>"date", "doc_values"=>true}}}, {"geo_point_fields"=>{"match"=>"*", "match_mapping_type"=>"geo_point", "mapping"=>{"type"=>"geo_point", "doc_values"=>true}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "doc_values"=>true}, "@version"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip", "doc_values"=>true}, "location"=>{"type"=>"geo_point", "doc_values"=>true}, "latitude"=>{"type"=>"float", "doc_values"=>true}, "longitude"=>{"type"=>"float", "doc_values"=>true}}}}}}}, :level=>:info}

New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["localhost:9200"], :level=>:info}

Starting pipeline {:id=>"base", :pipeline_workers=>1, :batch_size=>125, :batch_delay=>5, :max_inflight=>125, :level=>:info}

Pipeline started {:level=>:info}

Logstash startup completed

MattW224 · February 12, 2016, 10:53pm

Although I'm uncertain of the root cause, I re-provisioned with a RHEL 7 instance, installed the latest Oracle JDK, and Logstash is functional now.

zhashuyu · March 29, 2016, 1:31am

I encounter the similar problem when import plenty of csv files ( 3230 files ) with ELK stack, the logstash almost miss 2500 files.

Environment:
OS: centos 7.2
Elasticsearch: 2.2.1
Kibana: 4.4.1
Logstash: 2.2.2

At last, I resolve it by switching logstash back to logstash 2.1.1.

Topic		Replies	Views
Logstash hangs when uploading csv file Logstash	3	1090	July 6, 2017
Unable to log CSV Logstash	3	390	November 6, 2018
Logstash is not reading CSV file Logstash	4	885	July 6, 2017
Issues with Logstash reading csv file Logstash	3	372	November 16, 2021
Problem ingesting data with Logstash Logstash	2	264	May 17, 2018

Logstash 2.2.0 hangs at "Logstash startup completed" while attempting to ingest CSV

Related topics