Problem with csv filter

Hi, I have a CSV file that I want to index into Elasticsearch using Logstash. I am using the csv filter, and my conf file is:
```
input {
  file {
    sincedb_path => "/dev/null"
    path => "/home/kagamee/Downloads/time_logs.csv"
    start_position => "beginning"
    type => "logs"
  }
}
filter {
  csv {
    separator => ","
    columns => [
      "nil", "date", "time", "logid", "type", "subtype", "level", "vd",
      "eventtime", "srcip", "srcport", "srcintf", "srcintfrole", "dstip",
      "dstport", "dstintf", "dstintfrole", "action", "policyid", "policytype",
      "service", "dstcountry", "srccountry", "tradisp", "sentpkt"
    ]
    skip_empty_columns => true
    quote_char => "'"
  }
}
output {
  stdout { codec => rubydebug }
  #elasticsearch {
  #  hosts => ["localhost:9200"]
  #  index => "sample"
  #}
}
```
But after parsing the data, I am getting output like this:
```
{
"host" => "kagamee-Lenovo-Legion-Y7000P-1060",
"srccountry" => """" srccountry=""Russian Feder"",
"policytype" => ""5 policytype=""policy"",
"vd" => ""vd=""root"""",
"tradisp" => ""ation"" trandisp=""noop"" duration=0 sentbyte=0 rcvd"",
"sentpkt" => ""byte=0 sentpkt=0 appcat=""unscanned"" crscore=30 craction=131072 crlevel=""high"""",
"date" => "date=2020-05-18",
"logid" => ""logid=""0000000013"""",
"eventtime" => "eventtime=1589824586",
"@timestamp" => 2020-05-22T10:07:24.656Z,
"action" => ""6 action=""deny"",
"path" => "/home/kagamee/Downloads/time_logs.csv",
"type" => ""type=""traffic"""",
"srcintf" => ""srcintf=""port1"",
"dstcountry" => """" dstcountry=""India"",
"level" => ""level=""notice"""",
"subtype" => ""subtype=""forward"""",
"srcintfrole" => """" srcintfrole=""undefined"",
"@version" => "1",
"message" => ",date=2020-05-18,time=23:26:26,"logid=""0000000013""","type=""traffic""","subtype=""forward""","level=""notice""","vd=""root""",eventtime=1589824586,srcip=178.154.200.94,srcport=41798,"srcintf=""port1",""" srcintfrole=""undefined",""" dstip=45.249.108.19",7 dstport=44,"3 dstintf=""port2",""" dstintfrole=""undefined"" poluuid=""9bd3b56e-98c4-51ea-b15c-b04fcdf1a572"" sessionid=1629401503 proto=","6 action=""deny",""" policyid=3","5 policytype=""policy",""" service=""HTTPS",""" dstcountry=""India",""" srccountry=""Russian Feder","ation"" trandisp=""noop"" duration=0 sentbyte=0 rcvd","byte=0 sentpkt=0 appcat=""unscanned"" crscore=30 craction=131072 crlevel=""high"""",
"dstintfrole" => """" dstintfrole=""undefined"" poluuid=""9bd3b56e-98c4-51ea-b15c-b04fcdf1a572"" sessionid=1629401503 proto="",
"service" => """" service=""HTTPS"",
"srcip" => "srcip=178.154.200.94",
"srcport" => "srcport=41798",
"policyid" => """" policyid=3"",
"dstintf" => ""3 dstintf=""port2"",
"dstport" => "7 dstport=44",
"dstip" => """" dstip=45.249.108.19"",
"time" => "time=23:26:26"
}
```
I don't understand: even though I have defined the columns and they appear to be mapped, why is it producing this garbled output?

You might want to share a sample log line that produces this output. It also helps a lot if you wrap code in </> blocks for easier reading.

```
date=2020-05-18,time=23:59:59,"logid=""0000000013""","type=""traffic""","subtype=""forward""","level=""notice""","vd=""root""",eventtime=1589826599,srcip=66.249.84.92,rcport=43417 s,"rcintf=""port1""","srcintfrole=""undefined""",dstip=45.249.108.197,dstport=443,"dstintf=""port2""","dstintfrole=""undefined"" poluuid=""9bd3b56e-98c4-51ea-b15c-b04fcdf1a572"" sessionid=1629563798 proto=6","action=""deny""",policyid=35,"policytype=""policy""","service=""HTTPS""","dstcountry=""India""","srccountry=""United States""","trandisp=""noop"" duration=0 sentbyte=0 rcvdbyte=0","sentpkt=0 appcat=""unscanned"" crscore=30 craction=131072 crlevel=""high"""
```

While it is CSV, it might be better to use the kv filter, since your log consists of key=value pairs.
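For example, a minimal kv sketch (the split characters and the `trim_value` are assumptions based on your sample; the doubled double quotes in the export may still need extra cleanup):

```
filter {
  kv {
    source      => "message"   # parse the raw line instead of the csv columns
    field_split => ","         # pairs are separated by commas
    value_split => "="         # keys and values are separated by =
    trim_value  => '"'         # strip surrounding quotes from values
  }
}
```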

I tried the kv filter, but out of 33k documents it is parsing only 12k and discarding the others.

You could tag the events that fail kv parsing and process them with another filter.
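A sketch of that routing, assuming a Logstash version whose kv filter supports `tag_on_failure` (the tag name and the gsub cleanup are assumptions, not a tested recipe for your data):

```
filter {
  kv {
    source         => "message"
    field_split    => ","
    value_split    => "="
    tag_on_failure => ["_kvparsefailure"]
  }
  if "_kvparsefailure" in [tags] {
    # fallback path: collapse the doubled double quotes, then retry kv
    mutate { gsub => ["message", '""', '"'] }
    kv {
      source      => "message"
      field_split => ","
      value_split => "="
    }
  }
}
```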

If you use the csv filter here, the field names come from your columns list and the field values from the raw CSV cells, so you will end up with something like this:

"logid": " logid=""0000000013""

Even if I use an "if" condition, which filter should I use then? Can you help me?

Thanks in Advance

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.