Can't parse a CSV file - message is coming through though


(Dor Juravski) #1
  • Update: problem solved; the explanation is in the replies below.
    Hi, I am using the conf file below to parse a CSV file. I cannot get the message to populate the fields that are mapped. I can see the message in Kibana, but the fields are all zeros.
    stdout shows the fields are all parsed. After 3 days of trying to get this working, I could use some help.

Kibana screenshot result is here showing the message is going through ok:

You should also know that I removed the mutate clause that tells Logstash which fields are integers or floats, but even then the parsed values are not mapped into the fields.

input {
  file {
    path => [ "/home/elastic/turbofiles/dxc/.csv" ]
    # for future use:
    # recursive search under path
    # path => [ "/home/elastic/turbofiles/dxc/**/.csv" ]

    start_position => beginning
    sincedb_path => "/dev/null"
  }
}

filter {
  csv {
    #source => "message"
    separator => ";"
    skip_empty_columns => true
    quote_char => "'"
    columns => ["run_date", "location_continent", "location_state_country", "location_city", "clusterUuid", "VMs", "Hosts", "VM_Headroom", "VMHost_Density", "AvgCPUUsed", "UtilCPU", "PeakCPUUsed", "AvgCPUCapEff", "AvgCPUCapHA", "AvgMemUsed", "UtilMem", "PeakMemUsed", "AvgMemCapEff", "MaxMemCapHA", "DS_AvgSA", "DS_AvgSAFree", "DS_UtilSA", "DS_AvgSACap", "DS_AvgSP", "DS_AvgSPFree", "DS_UtilSP", "DS_AvgSPCap", "VM_AvgVCPU", "VM_UtilVCPU", "VM_PeakVCPU", "VM_AvgVMem", "VM_UtilVMem", "VM_PeakVMem", "VM_AvgIO", "VM_UtilIO", "VM_PeakIO", "VM_AvgSL", "VM_UtilSL", "VM_PeakSL", "VM_AvgSA", "VM_UtilSA", "VM_PeakSA", "VM_AvgSP", "VM_UtilSP", "VM_PeakSP", "Alloc_VM_VMem", "Alloc_Mem_Ratio", "Alloc_Mem_%", "Alloc_VM_VCPU", "Alloc_CPU_Ratio", "Alloc_CPU_%"]
  }
  if [VMs] == "VMs" {
    drop { }
  }
}

filter {
  grok {
    match => ["path", "_%{YEAR:year}%{MONTHNUM:month}%{MONTHDAY:day}.csv$"]
    add_field => ["datetime", "%{year}.%{month}.%{day}"]
    add_field => [ "received_at", "%{@timestamp}" ]
    add_field => [ "received_from", "%{host}" ]
  }
  mutate {
    remove_field => ["month", "day", "year"]
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    index => "test"
    document_type => "test_report"
  }
  stdout { }
}

Example of a CSV file "as is", including the header line, which is still parsed even though I tried to filter it out.

"run_date";"location_continent";"location_state_country";"location_city";"clusterUuid";"VMs";"Hosts";"VM_Headroom";"VMHost_Density";"AvgCPUUsed";"UtilCPU";"PeakCPUUsed";"AvgCPUCapEff";"AvgCPUCapHA";"AvgMemUsed";"UtilMem";"PeakMemUsed";"AvgMemCapEff";"MaxMemCapHA";"DS_AvgSA";"DS_AvgSAFree";"DS_UtilSA";"DS_AvgSACap";"DS_AvgSP";"DS_AvgSPFree";"DS_UtilSP";"DS_AvgSPCap";"VM_AvgVCPU";"VM_UtilVCPU";"VM_PeakVCPU";"VM_AvgVMem";"VM_UtilVMem";"VM_PeakVMem";"VM_AvgIO";"VM_UtilIO";"VM_PeakIO";"VM_AvgSL";"VM_UtilSL";"VM_PeakSL";"VM_AvgSA";"VM_UtilSA";"VM_PeakSA";"VM_AvgSP";"VM_UtilSP";"VM_PeakSP";"Alloc_VM_VMem";"Alloc_Mem_Ratio";"Alloc_Mem_%";"Alloc_VM_VCPU";"Alloc_CPU_Ratio";"Alloc_CPU_%"
"2017-08-15";"NA";"NY";"NEWYORK";"16d994b166b78a80e7f461f52470";"11";"4";"20";"2.75";"4.15";"15.96";"4.71";"26.03";"0";"8.90";"44.51";"3.93";"20.00";"0";"3436.70";"2652.71";"56.44";"6089.42";"5102.45";"7076.38";"41.90";"12178.83";"2.45";"7.83";"6.31";"0.19";"2.76";"0.96";"N;"N;"N;"60.44";"60.44";"3532.00";"12.13";"0.04";"12.46";"71.14";"0.12";"71.48";"7.00";"0.35";"35.01";"31.24";"1.20";"120.00"
"2017-08-15";"NA";"NY";"NEWYORK";"afb62dd5426e7a3891747a11ed4b";"29";"4";"63";"7.25";"7.46";"14.34";"4.09";"52.06";"0";"24.41";"30.51";"8.29";"80.00";"0";"3436.02";"2653.40";"56.43";"6089.42";"5176.45";"7002.39";"42.50";"12178.83";"4.89";"4.72";"7.89";"0.81";"1.33";"2.33";"N;"N;"N;"49.58";"49.58";"2919.00";"366.19";"0.70";"366.19";"1366.45";"1.31";"1366.45";"60.60";"0.76";"75.76";"103.52";"1.99";"198.84"


(Dor Juravski) #2

OK, update: the mutate/convert is killing my values.
I am not sure why; I tried both integer and float.
A number like 17.23 is converted to zero no matter which type I choose (float or integer).

Also, I wish there were a "live debug" option for Logstash: a way to step into each field, see what Logstash wants to do, and perhaps play with the settings in real time to see what Logstash will output. Just an idea.
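
One way to approximate a "live debug" loop, offered as a sketch and not as an official debugger: run a throwaway pipeline that reads lines from stdin and prints each parsed event with the rubydebug codec. The file name debug.conf and the shortened column list are assumptions made for illustration.

```
# debug.conf (hypothetical file name): paste CSV lines and inspect each event
input {
  stdin { }
}

filter {
  csv {
    separator => ";"
    # shortened, illustrative column list; any further columns
    # receive auto-generated names
    columns => ["run_date", "location_continent", "location_state_country"]
  }
}

output {
  # rubydebug prints every field of the event together with its value
  stdout { codec => rubydebug }
}
```

Started with bin/logstash -f debug.conf, each pasted line comes back as a structured event, so a value that still carries literal quote marks, or a number that stayed a string, shows up right away.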


(Dor Juravski) #3

The problem is now solved. The main reason "mutate" was not working is that I should have left the separator char as the default. With that change, all the data surrounded by double quotes was treated as expected with the help of "mutate... convert".
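
A sketch of what the working filter might look like, assuming the setting in question is the csv filter's quote_char. The sample data wraps every value in double quotes, which is the default quote character, while the original config set quote_char to a single quote. Under that assumption the earlier zeros make sense: a value that keeps its literal quote marks does not start with a digit, so a numeric conversion has nothing to parse. The shortened column list and the three converted fields are illustrative only.

```
filter {
  csv {
    separator => ";"
    # quote_char is not set, so the default (double quote) applies and the
    # quote marks around each value are stripped during parsing
    # first nine columns only, for brevity; use the full list from the original config
    columns => ["run_date", "location_continent", "location_state_country", "location_city", "clusterUuid", "VMs", "Hosts", "VM_Headroom", "VMHost_Density"]
  }
  mutate {
    # illustrative subset; every numeric column would be listed the same way
    convert => {
      "VMs" => "integer"
      "Hosts" => "integer"
      "VMHost_Density" => "float"
    }
  }
}
```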

Also - my if/drop now works, and the date filter works for my "run_date" field.
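
The exact filters are not shown in this thread, so the following is only a sketch of how the header drop and the date handling could look. The yyyy-MM-dd pattern is an assumption based on the sample rows, where run_date appears as 2017-08-15.

```
filter {
  # placed after the csv filter: once the surrounding quote marks are
  # stripped, the header row's VMs column holds the plain text VMs,
  # so this comparison matches and the header event is dropped
  if [VMs] == "VMs" {
    drop { }
  }

  date {
    # pattern assumed from the sample data (e.g. 2017-08-15)
    match => ["run_date", "yyyy-MM-dd"]
    # no target is set, so the parsed date is written to @timestamp
  }
}
```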

Cheers,
JD

