How to split CSV file and filter CSV Files


(Hans) #1

Hi,
I have a challenge where I am not able to filter a CSV file in Logstash and pass the parsed fields on to Elasticsearch as separate fields.

Original files:
CLIENT_IP,ISP,TEST_DATE,SERVER_NAME,DOWNLOAD_KBPS,UPLOAD_KBPS,LATENCY,LATITUDE,LONGITUDE,CONNECTION_TYPE
41.151..,Telkom Internet,8/14/2012 01:40:43 GMT,Windhoek,3894,2401,194,-26.0947,28.2161,Cell
41.243..,Airtel DRC,8/14/2012 04:51:21 GMT,Windhoek,871,1170,146,-26.0749,28.2503,WiFi
41.243..,Airtel DRC,8/14/2012 04:52:09 GMT,Windhoek,878,1086,156,-26.0749,28.2503,WiFi

Configuration file:
input {
  file {
    type => "speedtest"
    path => [ "/data/speedtest3.txt" ]
    start_position => "beginning"
  }
}
filter {
  grok {
    patterns_dir => "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-0.3.0/patterns"
    match => ["messages", "%{URIHOST}.,%{WORD:ISP},%{QS},%{WORD:SERVER_NAME},%{NUMBER:DOWNLOAD_KBPS},%{NUMBER:UPLOAD_KBPS},%{NUMBER:LATENCY},%{JAVACLASS},%{JAVACLASS},%{WORD:CONNECTION_TYPE}"]
    tag_on_failure => []
    add_tag => "Android-Speedtest"
  }
  mutate {
    split => ["messages", ","]
  }
  kv {
    field_split => ","
    value_split => ","
    source => "kvdata"
    remove_field => "kvdata"
  }
}
output {
  elasticsearch {
    protocol => "node"
    host => "localhost"
    cluster => "elasticsearch"
  }
}

Output information:
{
  "_index": ".kibana",
  "_type": "index-pattern",
  "_id": "logstash-",
  "_score": 1,
  "_source": {
    "title": "logstash-",
    "timeFieldName": "@timestamp",
    "customFormats": "{}",
    "fields": "[{"type":"string","indexed":false,"analyzed":false,"name":"_index","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"name":"_type","count":0,"scripted":false},{"type":"geo_point","indexed":true,"analyzed":false,"doc_values":false,"name":"geoip.location","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"@version","count":0,"scripted":false},{"type":"string","indexed":false,"analyzed":false,"name":"_source","count":0,"scripted":false},{"type":"string","indexed":false,"analyzed":false,"name":"_id","count":1,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"host.raw","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"type.raw","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"message","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"type","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"path","count":0,"scripted":false},{"type":"date","indexed":true,"analyzed":false,"doc_values":false,"name":"@timestamp","count":2,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"host","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"path.raw","count":0,"scripted":false}]"
  },
  "fields": {}
}
The output does not separate the different fields. When I tried the csv filter instead, the @timestamp could not be collected. I would appreciate any assistance in getting this resolved, as there is also latitude and longitude information that I would like to use in Kibana 4.


(Christian Dahlqvist) #2

For CSV style input it might be worthwhile looking into using the CSV filter instead of grok. This filter extracts all fields as strings, so you may need to also use the mutate filter to convert field types where necessary.
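As a rough sketch of that approach against the sample data above (column names are taken from the header row; the date pattern, the convert syntax, and the "location" field name are assumptions to verify against your Logstash version), the filter section could look like this, keeping the original input and output sections unchanged:

  filter {
    csv {
      # Column names taken from the header row of the sample file
      columns => [ "CLIENT_IP", "ISP", "TEST_DATE", "SERVER_NAME",
                   "DOWNLOAD_KBPS", "UPLOAD_KBPS", "LATENCY",
                   "LATITUDE", "LONGITUDE", "CONNECTION_TYPE" ]
      separator => ","
    }
    # Drop the header line itself so it is not indexed as an event
    if [CLIENT_IP] == "CLIENT_IP" {
      drop { }
    }
    # The csv filter extracts every field as a string; convert the
    # numeric columns so Elasticsearch maps them as numbers
    mutate {
      convert => {
        "DOWNLOAD_KBPS" => "integer"
        "UPLOAD_KBPS"   => "integer"
        "LATENCY"       => "integer"
        "LATITUDE"      => "float"
        "LONGITUDE"     => "float"
      }
    }
    # Build a "lat,lon" string; for the Kibana map the index mapping
    # (or an index template) must declare this field as geo_point
    mutate {
      add_field => { "location" => "%{LATITUDE},%{LONGITUDE}" }
    }
    # Use TEST_DATE as @timestamp instead of the ingest time
    # (Joda pattern assumed from the sample "8/14/2012 01:40:43 GMT")
    date {
      match => [ "TEST_DATE", "M/dd/yyyy HH:mm:ss z" ]
    }
  }

The date filter should also address the @timestamp problem mentioned above, since it sets the event timestamp from the TEST_DATE column rather than the time of ingestion.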


(system) #3