I recently added Filebeat to our ELK stack to read logs and ship the data to Logstash. Every so often, Logstash logs the "Error parsing csv" warnings shown below, and the result is corrupt data in the database because the CSV fields end up misaligned.
Before adding Filebeat, we were reading the logs using Logstash. It was slower and used more CPU, but we did not have these errors.
In the errors below, note the "Illegal quoting in line 1" exception. Parsing always fails in the middle of a quoted string whose opening quote is missing, because for some reason Logstash is receiving a message that begins partway through a record.
Thinking this is related to inode reuse, I have tried numerous Filebeat clean_* and close_* options, as shown in the config further below and in the variations sketched after it. Nothing has worked so far.
Here are some sample errors from Logstash:
[2018-04-11T08:15:17,885][WARN ][logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>"lity\",10.4.2.65,64490,72.21.91.29,80,04/11/2018 08:14:41.715232,04/11/2018 08:14:41.767610,52,1418,\"HTTP\",606,980,980,606,4,3,3,4,1,1,1,3,3,3,0,0,0,0,0,0,0,0,0,0,0,5.000000,5.000000,5.000000,,,,,,,5.000000,5.000000,5.000000,0,13", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
[2018-04-11T08:18:14,697][WARN ][logstash.filters.csv ] Error parsing csv {:field=>"message", :source=>" Quality\",10.4.2.65,64570,64.4.54.254,443,04/11/2018 08:16:55.704939,04/11/2018 08:18:04.403762,68698,1454,\"HTTPS\",5806,5294,5294,5806,13,12,12,13,2011,4078,3561,109,268,228,0,0,0,0,0,0,0,0,0,0,0,5.000000,5.000000,5.000000,,,,,,,5.000000,5.000000,5.000000,0,13", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
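To make the failure concrete, here is the first error's :source lined up against the shape of a complete record. The prefix is my reconstruction, and the "Net Quality" field name is a guess based on the lity" fragment:

  a complete record contains:  ...,0,"","Net Quality",10.4.2.65,64490,72.21.91.29,80,04/11/2018 08:14:41.715232,...
  what Logstash received:                        lity",10.4.2.65,64490,72.21.91.29,80,04/11/2018 08:14:41.715232,...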
Filebeat is configured to watch around 10 different files, but this one harvester is the only one we are having a problem with. A new file, with a unique timestamp in its name, is generated every 5 seconds.
Here is a sample from this file:
04/11/2018 08:23:57,1,"Savvius worst of the worst",topn,all,1,"All Savvius",0,"","App Latency",10.4.2.65,64806,72.21.81.200,443,04/11/2018 08:23:10.917145,04/11/2018 08:23:10.946133,28,1454,"HTTPS",2630,22769,22769,2630,28,24,24,28,0,0,0,4,4,4,0,0,0,0,0,0,0,0,0,0,0,5.000000,5.000000,5.000000,,,,,,,5.000000,5.000000,5.000000,0,13
04/11/2018 08:23:57,1,"Savvius worst of the worst",topn,all,1,"All Savvius",0,"","App Latency",10.4.2.89,49201,10.8.1.56,9200,04/11/2018 08:23:46.618455,04/11/2018 08:23:53.625176,7006,1400,"TCP",78,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5.000000,5.000000,5.000000,,,,,,,5.000000,5.000000,5.000000,14,13
04/11/2018 08:23:57,1,"Savvius worst of the worst",topn,all,1,"All Savvius",0,"","Net Latency",10.4.2.65,64806,72.21.81.200,443,04/11/2018 08:23:10.917145,04/11/2018 08:23:10.946133,28,1454,"HTTPS",2630,22769,22769,2630,28,24,24,28,0,0,0,4,4,4,0,0,0,0,0,0,0,0,0,0,0,5.000000,5.000000,5.000000,,,,,,,5.000000,5.000000,5.000000,0,13
The Filebeat config for this file looks like this:
- type: log
  paths:
    - "/var/lib/omni/data/streaming_analytics_conversations_*.csv"
  tags: [ "sv_all" ]
  ignore_older: 1m
  clean_removed: true
  scan_frequency: 5s
  close_eof: true
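For completeness, the other options I have been experimenting with come from the same clean_* and close_* families. The values below are a representative sketch rather than the exact combination from any single attempt:

- type: log
  paths:
    - "/var/lib/omni/data/streaming_analytics_conversations_*.csv"
  tags: [ "sv_all" ]
  ignore_older: 1m
  scan_frequency: 5s
  # close_* options stop the harvester and release the file handle
  close_eof: true        # close as soon as the harvester reaches end of file
  close_inactive: 1m     # close if the file has not changed within this window
  close_removed: true    # close when the file is deleted
  # clean_* options drop registry state, so a reused inode is not resumed mid-file
  clean_removed: true    # forget files that no longer exist on disk
  clean_inactive: 3m     # forget state after inactivity; must exceed ignore_older + scan_frequency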
I really want to make Filebeat work, because it uses much less CPU to detect and read the files.