Hello,
I am trying to load CSV files using a Logstash pipeline. The pipeline runs, but the indexed result is not what I expect.
When I import the same CSV file using the Data Visualizer in Machine Learning, I get the result I expect.
Visualize data from a log file EXPERIMENTAL
The File Data Visualizer helps you understand the fields and metrics in a log file. Upload your file, analyze its data, and then choose whether to import the data into an Elasticsearch index.
The File Data Visualizer supports these file formats:
Delimited text files, such as CSV and TSV
Newline-delimited JSON
Log files with a common format for the timestamp
When using this feature, the file is uploaded and the data is displayed nicely right away; I only have to remove some mappings in the Advanced tab, because this data was previously indexed in Elasticsearch. So I remove the fields @version, _id, _index and _type.
Importing data using the pipeline:
input {
  file {
    path => "/usr/share/logstash/bin/datasets/apachelogs/*"
    start_position => "beginning"
  }
}
filter {
  csv {
    separator => ","
    columns => ["@timestamp","@version","_id","_index","_score","_type","agent","auth","bytes","clientip","geoip.city_name","geoip.continent_code","geoip.country_code2","geoip.country_code3","geoip.country_name","geoip.dma_code","geoip.ip","geoip.latitude","geoip.location","geoip.longitude","geoip.postal_code","geoip.region_code","geoip.region_name","geoip.timezone","host","httpversion","ident","message","referrer","request","response","timestamp","useragent.build","useragent.device","useragent.major","useragent.minor","useragent.name","useragent.os","useragent.os_major","useragent.os_minor","useragent.os_name","useragent.patch","verb"]
  }
}
filter {
  mutate {
    remove_field => ["@version","_id","_index","_score","_type"]
  }
}
output {
  elasticsearch {
    hosts => "192.168.1.102:9200"
    manage_template => false
    index => "apachelog-%{+YYYY.MM.dd}"
    user => "elastic"
    password => "password"
    # document_type => "%{[@metadata][type]}"
  }
}
For what looks like every line of data, I see a WARN message. Example:
[WARN ] 2019-01-03 14:41:33.752 [Ruby-0-Thread-6: :1] csv - Error parsing csv {:field=>"message", :source=>"}","37.618",101194,MOW,Moscow,"Europe/Moscow","ip-172-31-31-208","1.1","-","%{start}[01/Aug/2018:16:13:32 +0000]%{end}","""http://semicomplete.com/presentations/logstash-monitorama-2013/\"\"\",\"/presentations/logstash-monitorama-2013/images/sad-medic.png\",200,\"26/Aug/2014:21:13:42 +0000",,Other,32,0,Chrome,"Mac OS X",10,9,"Mac OS X",1700,GET", :exception=>#<CSV::MalformedCSVError: Illegal quoting in line 1.>}
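To see what the csv filter (which uses Ruby's CSV library under the hood) is objecting to, I fed a small hypothetical fragment, shaped like the start of the :source string above, straight to CSV.parse_line. A quote character inside an unquoted field is rejected by the default (non-liberal) parser with the same MalformedCSVError:

```ruby
require "csv"

# Hypothetical fragment: one physical line similar to the :source value
# in the WARN message. Taken on its own, the first field ( }" ) has a
# quote inside an unquoted field, which is illegal CSV.
fragment = '}","37.618",101194,MOW'

begin
  CSV.parse_line(fragment)
  puts "parsed"
rescue CSV::MalformedCSVError => e
  puts "parse error: #{e.message}"
end
```

This reproduces the "Illegal quoting in line 1" class of error outside of Logstash.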
Here is a sample of the original data:
First line is the header row:
"@timestamp","@version","_id","_index","_score","_type",agent,auth,bytes,clientip,"geoip.city_name","geoip.continent_code","geoip.country_code2","geoip.country_code3","geoip.country_name","geoip.dma_code","geoip.ip","geoip.latitude","geoip.location","geoip.longitude","geoip.postal_code","geoip.region_code","geoip.region_name","geoip.timezone",host,httpversion,ident,message,referrer,request,response,timestamp,"useragent.build","useragent.device","useragent.major","useragent.minor","useragent.name","useragent.os","useragent.os_major","useragent.os_minor","useragent.os_name","useragent.patch",verb
"August 1st 2018, 21:59:11.000",1,eOsuemUB5qZYzDhQ8xbC,"apachelogs-2018.08.01",,doc,"""Mozilla/5.0 (compatible; Yahoo! Slurp; Why is Slurp crawling my page? | Search for Desktop Help - SLN22600)""","-",,"68.180.224.225",Sunnyvale,NA,US,US,"United States",807,"68.180.224.225","37.425","{
""lon"": -122.0074,
""lat"": 37.4249
}","-122.007",94089,CA,California,"America/Los_Angeles","ip-172-31-31-208","1.1","-","%{start}[01/Aug/2018:21:59:11 +0000]%{end}","""-""","/robots.txt",200,"27/Aug/2014:02:59:21 +0000",,Spider,,,"Yahoo! Slurp",Other,,,Other,,GET
I have read a lot of topics on the forum already, but I haven't been able to figure out what is wrong.
I hope the question and examples are clear enough.
Thanks,
Patrick