Handle Blank data in CSV - Ingest By Using Logstash


(ELK Explorer) #1

Hi,
I have a CSV file like the one below:

col1,col2,col3,col4
A,B,C,D
,,E,F
,G,H
,,,I
J,K,,,

The file has more than 10 million records.
I am using the config below:

input {
  file {
    path => "/home/nandan/data/data.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["col1","col2","col3","col4"]
  }
  mutate { convert => ["col1","integer"] }
  mutate { convert => ["col3","integer"] }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "hotel"
    document_type => "rooms"
  }
  stdout {}
}

But only half of the data gets indexed into Elasticsearch.
The error is:

[ERROR] 2018-05-15 14:43:56.918 [LogStash::Runner] Logstash - org.jruby.exceptions.ThreadKill
[WARN ] 2018-05-15 14:43:56.939 [Ruby-0-Thread-9@[main]>worker0: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385] csv - Error parsing csv {:field=>"message", :source=>"",34103,,UNITED STATES - USA,26.1858,-81.799\r", :exception=>#<CSV::MalformedCSVError: Unclosed quoted field on line 1.>}
[WARN ] 2018-05-15 14:43:56.948 [Ruby-0-Thread-9@[main]>worker0: /usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:385] csv - Error parsing csv {:field=>"message", :source=>"81512,supplier2,Chinatown Hotel,YAOWARAJ SUMPUNTAWONG 526,"Bangkok", :exception=>#<CSV::MalformedCSVError: Unclosed quoted field on line 1.>}

Only half of the data is indexed. Please tell me why this is happening.

Thanks


#2

You have a line that starts with a double quote and does not have a closing double quote. That is an unclosed quoted field, so the csv filter fails to parse it. The same goes for the second line: there is no closing quote after Bangkok.
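One possible workaround, sketched here under the assumption that the double quotes in this file are stray characters rather than real CSV quoting, is to strip them from the raw line before the csv filter runs. The column names are taken from the question. Note that this discards quoting entirely, so any field that relies on quotes to protect an embedded comma would then be split into extra columns.

```
filter {
  # Assumption: quotes in this data carry no meaning and can be dropped.
  # The pattern is a single-quoted string containing one double quote.
  mutate {
    gsub => ["message", '"', ""]
  }
  csv {
    separator => ","
    columns => ["col1","col2","col3","col4"]
  }
}
```

If the quotes are genuine and the quoted field simply continues on the next line of the file, removing them will not repair the record, because the file input has already emitted each line as a separate event.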

However, that should not prevent the data from being indexed; the event should just be missing the fields you want.
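To see which lines failed, it may help to route them separately. The csv filter tags events it cannot parse with `_csvparsefailure`, so a conditional in the output section can split them off. This is only a sketch: the failure log path is a hypothetical placeholder, and the hosts and index values are copied from the question.

```
output {
  if "_csvparsefailure" in [tags] {
    # Hypothetical path; unparsed lines land here for inspection.
    file {
      path => "/tmp/csv_parse_failures.log"
    }
  } else {
    elasticsearch {
      hosts => "localhost"
      index => "hotel"
    }
  }
}
```

Comparing the number of lines in the failure file with the number of missing documents should show whether parse failures account for the gap or whether something else is dropping events.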

