<CSV::MalformedCSVError: Illegal quoting in line 1.>

Hello,

I am trying to load CSV files using a Logstash pipeline. The pipeline works, but the indexed results are not what I expected.

When I use the same csv file and import it using the data visualizer in machine learning, I am able to get the result I expect.

Visualize data from a log file EXPERIMENTAL

The File Data Visualizer helps you understand the fields and metrics in a log file. Upload your file, analyze its data, and then choose whether to import the data into an Elasticsearch index.

The File Data Visualizer supports these file formats:

- Delimited text files, such as CSV and TSV
- Newline-delimited JSON
- Log files with a common format for the timestamp

When using this feature, the file is uploaded and the data is already displayed nicely; I only have to remove some mappings in the Advanced tab, as this data was indexed in Elasticsearch before. So I have to remove the fields @version, _id, _index and _type.

Importing data using the pipeline:

    input {
      file {
        path => "/usr/share/logstash/bin/datasets/apachelogs/*"
        start_position => "beginning"
      }
    }

    filter {
      csv {
        separator => ","
        columns => ["@timestamp","@version","_id","_index","_score","_type","agent","auth","bytes","clientip","geoip.city_name","geoip.continent_code","geoip.country_code2","geoip.country_code3","geoip.country_name","geoip.dma_code","geoip.ip","geoip.latitude","geoip.location","geoip.longitude","geoip.postal_code","geoip.region_code","geoip.region_name","geoip.timezone","host","httpversion","ident","message","referrer","request","response","timestamp","useragent.build","useragent.device","useragent.major","useragent.minor","useragent.name","useragent.os","useragent.os_major","useragent.os_minor","useragent.os_name","useragent.patch","verb"]
      }
    }

    filter {
      mutate {
        remove_field => ["@version","_id","_index","_score","_type"]
      }
    }

    output {
      elasticsearch {
        hosts => "192.168.1.102:9200"
        manage_template => false
        index => "apachelog-%{+YYYY.MM.dd}"
        user => "elastic"
        password => "password"
        # document_type => "%{[@metadata][type]}"
      }
    }

For what looks like every line of data, I see a WARN message, for example:

[WARN ] 2019-01-03 14:41:33.752 [Ruby-0-Thread-6: :1] csv - Error parsing csv {:field=>"message", :source=>"}","37.618",101194,MOW,Moscow,"Europe/Moscow","ip-172-31-31-208","1.1","-","%{start}[01/Aug/2018:16:13:32 +0000]%{end}","""http://semicomplete.com/presentations/logstash-monitorama-2013/""","/presentations/logstash-monitorama-2013/images/sad-medic.png",200,"26/Aug/2014:21:13:42 +0000",,Other,32,0,Chrome,"Mac OS X",10,9,"Mac OS X",1700,GET", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}

Here is a sample of the original data; the first line is the header row:

"@timestamp","@version","_id","_index","_score","_type",agent,auth,bytes,clientip,"geoip.city_name","geoip.continent_code","geoip.country_code2","geoip.country_code3","geoip.country_name","geoip.dma_code","geoip.ip","geoip.latitude","geoip.location","geoip.longitude","geoip.postal_code","geoip.region_code","geoip.region_name","geoip.timezone",host,httpversion,ident,message,referrer,request,response,timestamp,"useragent.build","useragent.device","useragent.major","useragent.minor","useragent.name","useragent.os","useragent.os_major","useragent.os_minor","useragent.os_name","useragent.patch",verb
"August 1st 2018, 21:59:11.000",1,eOsuemUB5qZYzDhQ8xbC,"apachelogs-2018.08.01",,doc,"""Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)""","-",,"68.180.224.225",Sunnyvale,NA,US,US,"United States",807,"68.180.224.225","37.425","{
""lon"": -122.0074,
""lat"": 37.4249
}","-122.007",94089,CA,California,"America/Los_Angeles","ip-172-31-31-208","1.1","-","%{start}[01/Aug/2018:21:59:11 +0000]%{end}","""-""","/robots.txt",200,"27/Aug/2014:02:59:21 +0000",,Spider,,,"Yahoo! Slurp",Other,,,Other,,GET

I have read a lot of topics in the forum already, but I haven't been able to find what is wrong.

I hope the question and examples are clear enough.

Thanks,
Patrick

You have newlines embedded in the JSON object for geoip.location. So that "line" is actually four separate events in Logstash. The first and fourth will certainly generate errors, since their quoting is unbalanced. The example WARN you gave corresponds to the fourth line.
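To illustrate, here is a minimal sketch in Ruby using the stdlib `csv` library (the same one Logstash's csv filter relies on). The data is a trimmed-down, hypothetical version of the sample record above: parsing the whole record at once handles the quoted newlines, while feeding it line by line, as the line-oriented file input effectively does, raises `CSV::MalformedCSVError` on the fragments with unbalanced quotes:

```ruby
require "csv"

# One logical CSV record whose quoted third field contains embedded newlines
# (mimicking the geoip.location JSON blob in the sample data).
record = <<~DATA
  "68.180.224.225","37.425","{
  ""lon"": -122.0074,
  ""lat"": 37.4249
  }","-122.007"
DATA

# Parsing the whole document at once: quoted newlines are legal CSV,
# so this yields a single row of four fields.
rows = CSV.parse(record)
puts rows.length   # => 1
puts rows.first[2] # the multi-line JSON blob, intact

# Feeding it line by line, the way a line-oriented input splits it:
# each fragment has unbalanced or illegal quoting, and the parser raises.
record.each_line do |line|
  begin
    CSV.parse_line(line)
  rescue CSV::MalformedCSVError => e
    puts "#{e.class} on fragment: #{line.strip[0, 30]}"
  end
end
```

The exact error messages vary by Ruby version, but the fourth fragment (`}","-122.007"`) is the kind that produces the "Illegal quoting in line 1" WARN shown above.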

Thanks for your reply. This makes sense.

But how come the same data is imported fine when using the upload function in Kibana?
The only thing I can think of is that the Logstash pipeline is based on CSV parsing, while the file is actually not a CSV but a log file of some sort.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.