How to split CSV file and filter CSV Files


(Hans) #1

Hi,
I have a challenge where I am not able to filter a CSV file in Logstash and pass the parsed fields on to Elasticsearch as separate fields.

Original files:
CLIENT_IP,ISP,TEST_DATE,SERVER_NAME,DOWNLOAD_KBPS,UPLOAD_KBPS,LATENCY,LATITUDE,LONGITUDE,CONNECTION_TYPE
41.151..,Telkom Internet,8/14/2012 01:40:43 GMT,Windhoek,3894,2401,194,-26.0947,28.2161,Cell
41.243..,Airtel DRC,8/14/2012 04:51:21 GMT,Windhoek,871,1170,146,-26.0749,28.2503,WiFi
41.243..,Airtel DRC,8/14/2012 04:52:09 GMT,Windhoek,878,1086,156,-26.0749,28.2503,WiFi

Configuration file:
input {
  file {
    type => "speedtest"
    path => [ "/data/speedtest3.txt" ]
    start_position => "beginning"
  }
}
filter {
  grok {
    patterns_dir => "/opt/logstash/vendor/bundle/jruby/1.9/gems/logstash-patterns-core-0.3.0/patterns"
    match => ["messages", "%{URIHOST}.,%{WORD:ISP},%{QS},%{WORD:SERVER_NAME},%{NUMBER:DOWNLOAD_KBPS},%{NUMBER:UPLOAD_KBPS},%{NUMBER:LATENCY},%{JAVACLASS},%{JAVACLASS},%{WORD:CONNECTION_TYPE}"]
    tag_on_failure => []
    add_tag => "Android-Speedtest"
  }
  mutate {
    split => ["messages", ","]
  }
  kv {
    field_split => ","
    value_split => ","
    source => "kvdata"
    remove_field => "kvdata"
  }
}
output {
  elasticsearch {
    protocol => "node"
    host => "localhost"
    cluster => "elasticsearch"
  }
}

Output information:
{
  "_index": ".kibana",
  "_type": "index-pattern",
  "_id": "logstash-",
  "_score": 1,
  "_source": {
    "title": "logstash-",
    "timeFieldName": "@timestamp",
    "customFormats": "{}",
    "fields": "[{"type":"string","indexed":false,"analyzed":false,"name":"_index","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"name":"_type","count":0,"scripted":false},{"type":"geo_point","indexed":true,"analyzed":false,"doc_values":false,"name":"geoip.location","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"@version","count":0,"scripted":false},{"type":"string","indexed":false,"analyzed":false,"name":"_source","count":0,"scripted":false},{"type":"string","indexed":false,"analyzed":false,"name":"_id","count":1,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"host.raw","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"type.raw","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"message","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"type","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"path","count":0,"scripted":false},{"type":"date","indexed":true,"analyzed":false,"doc_values":false,"name":"@timestamp","count":2,"scripted":false},{"type":"string","indexed":true,"analyzed":true,"doc_values":false,"name":"host","count":0,"scripted":false},{"type":"string","indexed":true,"analyzed":false,"doc_values":false,"name":"path.raw","count":0,"scripted":false}]"
  },
  "fields": {}
}
The output does not separate the different fields. When I tried the csv filter instead, the @timestamp could not be collected. I would appreciate any assistance in getting this resolved, as there is also latitude and longitude information that I would like to use in Kibana 4.


(Christian Dahlqvist) #2

For CSV style input it might be worthwhile looking into using the CSV filter instead of grok. This filter extracts all fields as strings, so you may need to also use the mutate filter to convert field types where necessary.
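As a rough sketch of that approach against the sample data above (column names are taken from the header row; the date pattern, the convert syntax, and the "location" field name are assumptions to verify against your Logstash version), the filter section could look like this, keeping the original input and output sections unchanged:

  filter {
    csv {
      # Column names taken from the header row of the sample file
      columns => [ "CLIENT_IP", "ISP", "TEST_DATE", "SERVER_NAME",
                   "DOWNLOAD_KBPS", "UPLOAD_KBPS", "LATENCY",
                   "LATITUDE", "LONGITUDE", "CONNECTION_TYPE" ]
      separator => ","
    }
    # Drop the header line itself so it is not indexed as an event
    if [CLIENT_IP] == "CLIENT_IP" {
      drop { }
    }
    # The csv filter extracts every field as a string; convert the
    # numeric columns so Elasticsearch maps them as numbers
    mutate {
      convert => {
        "DOWNLOAD_KBPS" => "integer"
        "UPLOAD_KBPS"   => "integer"
        "LATENCY"       => "integer"
        "LATITUDE"      => "float"
        "LONGITUDE"     => "float"
      }
    }
    # Build a "lat,lon" string; for the Kibana map the index mapping
    # (or an index template) must declare this field as geo_point
    mutate {
      add_field => { "location" => "%{LATITUDE},%{LONGITUDE}" }
    }
    # Use TEST_DATE as @timestamp instead of the ingest time
    # (Joda pattern assumed from the sample "8/14/2012 01:40:43 GMT")
    date {
      match => [ "TEST_DATE", "M/dd/yyyy HH:mm:ss z" ]
    }
  }

The date filter should also address the @timestamp problem mentioned above, since it sets the event timestamp from the TEST_DATE column rather than the time of ingestion.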


(system) #3