Logstash csv - date and file format


(Craig) #1

I'm having trouble getting the timestamp field from a csv file into @timestamp in ES. I have tried the timestamp field a few different ways, trying to stay with a standard format if possible.

In the debug logs I see an error about the file's character encoding, but also _dateparsefailure, so I probably have more than one issue. I'd appreciate any assistance.

source file (header plus a data line):
Value,TimeStamp,Description,Entity,EntityId,IntervalSecs,MetricId,Unit
11007,2017-02-14 11:35:00Z,CPU usage in megahertz during the interval,IDC_DMZ01,ClusterComputeResource-domain-c275,300,cpu.usagemhz.average,MHz

logstash.conf:
input {
  file {
    path => "/home/user/vmware-stats/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Value","Timestamp","MetricId","Unit","Description","Entity","EntityId","IntervalSecs","Instance"]
  }
  date {
    # 2017-02-07T14:30:00
    #match => ["Timestamp", "yyyy-MM-dd'T'HH:mm:ss"]
    match => ["Timestamp", "ISO8601"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "vmware-stats"
  }
  stdout {}
}

logstash-plain.log entries:
[2017-02-14T12:16:42,651][DEBUG][logstash.inputs.file ] Received line {:path=>"/home/user/vmware-stats/2017-02-14-IDC_DMZ01-stats-300-interval.csv", :text=>"\xFF\xFEV\x00a\x00l\x00u\x00e\x00,\x00T\x00i\x00m\x00e\x00S\x00t\x00a\x00m\x00p\x00,\x00D\x00e\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00,\x00E\x00n\x00t\x00i\x00t\x00y\x00,\x00E\x00n\x00t\x00i\x00t\x00y\x00I\x00d\x00,\x00I\x00n\x00t\x00e\x00r\x00v\x00a\x00l\x00S\x00e\x00c\x00s\x00,\x00M\x00e\x00t\x00r\x00i\x00c\x00I\x00d\x00,\x00U\x00n\x00i\x00t\x00\r\x00"}
[2017-02-14T12:16:42,672][WARN ][logstash.codecs.plain ] Received an event that has a different character encoding than you configured. {:text=>"\xFF\xFEV\u0000a\u0000l\u0000u\u0000e\u0000,\u0000T\u0000i\u0000m\u0000e\u0000S\u0000t\u0000a\u0000m\u0000p\u0000,\u0000D\u0000e\u0000s\u0000c\u0000r\u0000i\u0000p\u0000t\u0000i\u0000o\u0000n\u0000,\u0000E\u0000n\u0000t\u0000i\u0000t\u0000y\u0000,\u0000E\u0000n\u0000t\u0000i\u0000t\u0000y\u0000I\u0000d\u0000,\u0000I\u0000n\u0000t\u0000e\u0000r\u0000v\u0000a\u0000l\u0000S\u0000e\u0000c\u0000s\u0000,\u0000M\u0000e\u0000t\u0000r\u0000i\u0000c\u0000I\u0000d\u0000,\u0000U\u0000n\u0000i\u0000t\u0000\r\u0000", :expected_charset=>"UTF-8"}

and

[2017-02-14T12:16:52,031][DEBUG][logstash.pipeline ] output received {"event"=>{"Entity"=>"\u0000 \u0000h\u0000o\u0000s\u0000t\u0000", "column10"=>"\u0000K\u0000B\u0000", "Description"=>"\u0000A\u0000m\u0000o\u0000u\u0000n\u0000t\u0000 \u0000o\u0000f\u0000 \u0000h\u0000o\u0000s\u0000t\u0000 \u0000p\u0000h\u0000y\u0000s\u0000i\u0000c\u0000a\u0000l\u0000 \u0000m\u0000e\u0000m\u0000o\u0000r\u0000y\u0000 \u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000 \u0000b\u0000y\u0000 \u0000a\u0000 \u0000v\u0000i\u0000r\u0000t\u0000u\u0000a\u0000l\u0000 \u0000m\u0000a\u0000c\u0000h\u0000i\u0000n\u0000e\u0000", "message"=>"\u00009\u00009\u00004\u00009\u00008\u00009\u00001\u00000\u0000,\u00002\u00000\u00001\u00007\u0000-\u00000\u00002\u0000-\u00001\u00001\u0000 \u00001\u00001\u0000:\u00004\u00000\u0000:\u00000\u00000\u0000Z\u0000,\u0000A\u0000m\u0000o\u0000u\u0000n\u0000t\u0000 \u0000o\u0000f\u0000 \u0000h\u0000o\u0000s\u0000t\u0000 \u0000p\u0000h\u0000y\u0000s\u0000i\u0000c\u0000a\u0000l\u0000 \u0000m\u0000e\u0000m\u0000o\u0000r\u0000y\u0000 \u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000 \u0000b\u0000y\u0000 \u0000a\u0000 \u0000v\u0000i\u0000r\u0000t\u0000u\u0000a\u0000l\u0000 \u0000m\u0000a\u0000c\u0000h\u0000i\u0000n\u0000e\u0000,\u0000 \u0000h\u0000o\u0000s\u0000t\u0000,\u0000 \u0000o\u0000r\u0000 \u0000c\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000,\u0000I\u0000D\u0000C\u0000_\u0000D\u0000M\u0000Z\u00000\u00001\u0000,\u0000C\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000C\u0000o\u0000m\u0000p\u0000u\u0000t\u0000e\u0000R\u0000e\u0000s\u0000o\u0000u\u0000r\u0000c\u0000e\u0000-\u0000d\u0000o\u0000m\u0000a\u0000i\u0000n\u0000-\u0000c\u00002\u00007\u00005\u0000,\u00003\u00000\u00000\u0000,\u0000m\u0000e\u0000m\u0000.\u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000.\u0000a\u0000v\u0000e\u0000r\u0000a\u0000g\u0000e\u0000,\u0000K\u0000B\u0000\r\u0000", "Unit"=>"\u00003\u00000\u00000\u0000", 
"Timestamp"=>"\u00002\u00000\u00001\u00007\u0000-\u00000\u00002\u0000-\u00001\u00001\u0000 \u00001\u00001\u0000:\u00004\u00000\u0000:\u00000\u00000\u0000Z\u0000", "tags"=>["dateparsefailure"], "EntityId"=>"\u0000 \u0000o\u0000r\u0000 \u0000c\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000", "path"=>"/home/user/vmware-stats/2017-02-14-IDC_DMZ01-stats-300-interval.csv", "MetricId"=>"\u0000C\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000C\u0000o\u0000m\u0000p\u0000u\u0000t\u0000e\u0000R\u0000e\u0000s\u0000o\u0000u\u0000r\u0000c\u0000e\u0000-\u0000d\u0000o\u0000m\u0000a\u0000i\u0000n\u0000-\u0000c\u00002\u00007\u00005\u0000", "@timestamp"=>2017-02-14T17:16:51.681Z, "@version"=>"1", "host"=>"ubuntu", "Value"=>"\u00009\u00009\u00004\u00009\u00008\u00009\u00001\u00000\u0000", "column9"=>"\u0000m\u0000e\u0000m\u0000.\u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000.\u0000a\u0000v\u0000e\u0000r\u0000a\u0000g\u0000e\u0000", "IntervalSecs"=>"\u0000I\u0000D\u0000C\u0000\u0000D\u0000M\u0000Z\u00000\u00001\u0000"}}

Thanks,
Craig


(Craig) #2

I have tried altering the date format in the csv file to "2017-02-14 15:35" and using match => ["TimeStamp", "yyyy-MM-dd HH:mm"] in the logstash config file. Still no luck; the debug logs still show "tags"=>["_dateparsefailure"].
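(Editorial note: no format string can succeed here, because the Timestamp field itself is corrupt: per the debug log, every character is preceded by a NUL byte, the signature of a UTF-16LE file read as UTF-8. A small illustration, with Python's strptime standing in for Logstash's Joda-based parser:)

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # strptime equivalent of Joda's "yyyy-MM-dd HH:mm"

clean = "2017-02-14 15:35"
print(datetime.strptime(clean, FMT))  # parses fine

# What the Timestamp field actually contained, per the debug log: every
# character preceded by a NUL byte (a UTF-16LE file misread as UTF-8).
dirty = "".join("\x00" + c for c in clean)
try:
    datetime.strptime(dirty, FMT)
except ValueError:
    print("parse fails regardless of the format string")
```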

Help?


(Magnus Bäck) #3

Your main problem is that the file isn't UTF-8. Start by setting the charset option of the input's codec to match the actual character set of the file. Once that's done you can revisit the date filter if needed.
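(Editorial note: as a sketch, that suggestion applied to the input above might look like this, assuming the file is UTF-16LE, which the \xFF\xFE byte-order mark in the debug log indicates; charset here is an option of the plain codec:)

```
input {
  file {
    path => "/home/user/vmware-stats/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => plain { charset => "UTF-16LE" }
  }
}
```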


(Craig) #4

Thanks, Magnus. Yeah, I figured this out late last night...

I'm using PowerShell (PowerCLI) to extract statistical data from VMware, with the idea of using Elasticsearch to analyze it and build graphs, trends, etc. It turns out that by default Microsoft writes the file with a byte-order mark (BOM), in a UTF-16 encoding. So even though the docs for Out-File and Export-Csv say they default to UTF-8 or ASCII, it's not the same UTF-8 as the rest of the world.

So to save the next guy/gal a few days of his/her life, here's how I resolved it. After collecting the stats I was interested in, I exported the data specifying the encoding explicitly:

$stats | select value,@{n="TimeStamp";e={$_.timestamp.ToString("yyyy-MM-dd HH:mm")}},description,entity,entityid,intervalsecs,metricid,unit | convertto-csv -NoTypeInformation -Delimiter "," | % {$_ -replace '"',''} | Out-File -encoding ASCII $filepath

As soon as the encoding was cleaned up, data was imported easily.
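(Editorial note: if re-exporting from PowerShell isn't an option, re-encoding the already-written files before Logstash reads them works too. A minimal sketch; the utf16_csv_to_utf8 helper is hypothetical, and it assumes UTF-16 input with a BOM, as the logs above suggest:)

```python
from pathlib import Path

def utf16_csv_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 CSV (BOM detected by the utf-16 codec) as plain UTF-8."""
    text = Path(src).read_text(encoding="utf-16")  # consumes the BOM
    Path(dst).write_text(text, encoding="utf-8")

# Example: utf16_csv_to_utf8("stats-utf16.csv", "stats-utf8.csv")
```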

Thank you,
Craig


(Guy Boertje) #5

Nice one, Magnus.
It looks like UTF-16 to me.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.