OK, I've been working on this all morning; I've tried everything I can think of and done a lot of googling.
We do some processing of NetFlow data that produces CSV files. We want to ingest each CSV file into Elasticsearch, but for some reason the fields won't parse out. I tried with Logstash running on Windows Server 2016 and with it running on CentOS 6.8. Both times all of the lines went into Elasticsearch, but they were not parsed into the field names designated by the column headers.
Here is my conf file:
input {
  file {
    path => "C:/data/*.csv"
    type => "testdata"
    tags => ["TF"]
    start_position => "beginning"
  }
}
filter {
  csv {
    source => "flow"
    separator => ","
    columns => [
      "first_seen",
      "dep",
      "src",
      "dst",
      "dport",
      "cat",
      "proto",
      "day",
      "timeper",
      "times_seen",
      "last_seen",
      "alerts",
      "first_alert",
      "last_alert",
      "spayload",
      "dpayload",
      "sbytes",
      "dbytes",
      "spackets",
      "dpackets",
      "duration",
      "max_duration",
      "min_duration",
      "scountry",
      "sorg",
      "slat",
      "slong",
      "dcounty",
      "dorg",
      "dlat",
      "dlong",
      "distance"
    ]
    add_tag => ["checkparse"]
  }
  if ([col1] == "first_seen") { drop {} }
  mutate {
    add_tag => ["secondcheck"]
  }
}
output {
  elasticsearch {
    hosts => ["1.1.1.1:9200"]
    index => "test"
    user => "elastic"
    password => "itsasecret"
  }
}
And here is a representation of our data (mostly made up, so the protocols, bytes, packets, lat, and long won't make sense):
2/2/2017 00:00:12 AM,NYC,10.10.10.10,12.12.12.12,22,SSH,12,2/2/2017,20170202,5,2/2/2017 22:10,4,,,4324,5234,6543,2345,25,25,5,6,50,53,US,NYSE,40.013,-80.043,US,NASQ,39.123,-90.011,564
Each file has a header row with the fields above, and that header row is followed by several thousand rows of data formatted like the line above. When this data gets ingested into Elasticsearch, each event shows the first tag of 'TF' and the third tag of 'secondcheck', but none of the fields and not the second tag of 'checkparse'.
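As a quick sanity check outside of Logstash entirely, I split one data row on commas to see how many fields the csv filter would actually be handed (just a local one-off, not part of the pipeline):

```python
# Standalone check: count the fields in one sample row so the total can
# be compared against the number of column names declared in the csv filter.
row = ("2/2/2017 00:00:12 AM,NYC,10.10.10.10,12.12.12.12,22,SSH,12,"
       "2/2/2017,20170202,5,2/2/2017 22:10,4,,,4324,5234,6543,2345,"
       "25,25,5,6,50,53,US,NYSE,40.013,-80.043,US,NASQ,39.123,-90.011,564")

fields = row.split(",")  # no quoted fields in our data, so a plain split is fine
print(len(fields))  # -> 33
```

(The sample row above is partly made up, so its field count may not match our real rows exactly.)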
Why don't the fields parse out? What am I doing wrong? And how can I get Logstash to give me more details on why it won't parse?
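I'm guessing that a temporary stdout output with the rubydebug codec would at least show me what each event looks like, including any tags the csv filter adds on failure (sketch of what I mean; same pipeline, extra output):

```
output {
  # temporary debug output: dump every event, including its tags, to the console
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["1.1.1.1:9200"]
    index => "test"
    user => "elastic"
    password => "itsasecret"
  }
}
```

And I assume `bin/logstash -f test.conf --config.test_and_exit` (on a 5.x install; the conf path is just an example) would confirm whether the config at least parses. Is there more than that?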
What makes this frustrating is that we are sending some other logs in CSV format to the Logstash instance on the CentOS box via syslog, and those are being parsed just fine.
Thanks.
I've looked at:
https://stackoverflow.com/questions/31095020/use-logstash-csv-filter-doesnt-work
https://stackoverflow.com/questions/37583805/cannot-parse-csv-file-with-logstash
https://discuss.elastic.co/t/cant-parse-a-csv-file/26362