Logstash csv - date and file format


(Craig) #1

I'm having trouble getting the timestamp field from a csv file into @timestamp in ES. I have tried the timestamp field a few different ways, trying to stay with a standard format if possible.

In the debug logs I see an error about the file's character encoding, but also _dateparsefailure, so I probably have more than one issue. I'd appreciate any assistance.

source file (header plus a data line):
Value,TimeStamp,Description,Entity,EntityId,IntervalSecs,MetricId,Unit
11007,2017-02-14 11:35:00Z,CPU usage in megahertz during the interval,IDC_DMZ01,ClusterComputeResource-domain-c275,300,cpu.usagemhz.average,MHz

logstash.conf:
input {
  file {
    path => "/home/user/vmware-stats/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["Value","Timestamp","MetricId","Unit","Description","Entity","EntityId","IntervalSecs","Instance"]
  }
  date {
    # 2017-02-07T14:30:00
    #match => ["Timestamp", "yyyy-MM-dd'T'HH:mm:ss"]
    match => ["Timestamp", "ISO8601"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "vmware-stats"
  }
  stdout {}
}

logstash-plain.log entries:
[2017-02-14T12:16:42,651][DEBUG][logstash.inputs.file ] Received line {:path=>"/home/user/vmware-stats/2017-02-14-IDC_DMZ01-stats-300-interval.csv", :text=>"\xFF\xFEV\x00a\x00l\x00u\x00e\x00,\x00T\x00i\x00m\x00e\x00S\x00t\x00a\x00m\x00p\x00,\x00D\x00e\x00s\x00c\x00r\x00i\x00p\x00t\x00i\x00o\x00n\x00,\x00E\x00n\x00t\x00i\x00t\x00y\x00,\x00E\x00n\x00t\x00i\x00t\x00y\x00I\x00d\x00,\x00I\x00n\x00t\x00e\x00r\x00v\x00a\x00l\x00S\x00e\x00c\x00s\x00,\x00M\x00e\x00t\x00r\x00i\x00c\x00I\x00d\x00,\x00U\x00n\x00i\x00t\x00\r\x00"}
[2017-02-14T12:16:42,672][WARN ][logstash.codecs.plain ] Received an event that has a different character encoding than you configured. {:text=>"\xFF\xFEV\u0000a\u0000l\u0000u\u0000e\u0000,\u0000T\u0000i\u0000m\u0000e\u0000S\u0000t\u0000a\u0000m\u0000p\u0000,\u0000D\u0000e\u0000s\u0000c\u0000r\u0000i\u0000p\u0000t\u0000i\u0000o\u0000n\u0000,\u0000E\u0000n\u0000t\u0000i\u0000t\u0000y\u0000,\u0000E\u0000n\u0000t\u0000i\u0000t\u0000y\u0000I\u0000d\u0000,\u0000I\u0000n\u0000t\u0000e\u0000r\u0000v\u0000a\u0000l\u0000S\u0000e\u0000c\u0000s\u0000,\u0000M\u0000e\u0000t\u0000r\u0000i\u0000c\u0000I\u0000d\u0000,\u0000U\u0000n\u0000i\u0000t\u0000\r\u0000", :expected_charset=>"UTF-8"}

and

[2017-02-14T12:16:52,031][DEBUG][logstash.pipeline ] output received {"event"=>{"Entity"=>"\u0000 \u0000h\u0000o\u0000s\u0000t\u0000", "column10"=>"\u0000K\u0000B\u0000", "Description"=>"\u0000A\u0000m\u0000o\u0000u\u0000n\u0000t\u0000 \u0000o\u0000f\u0000 \u0000h\u0000o\u0000s\u0000t\u0000 \u0000p\u0000h\u0000y\u0000s\u0000i\u0000c\u0000a\u0000l\u0000 \u0000m\u0000e\u0000m\u0000o\u0000r\u0000y\u0000 \u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000 \u0000b\u0000y\u0000 \u0000a\u0000 \u0000v\u0000i\u0000r\u0000t\u0000u\u0000a\u0000l\u0000 \u0000m\u0000a\u0000c\u0000h\u0000i\u0000n\u0000e\u0000", "message"=>"\u00009\u00009\u00004\u00009\u00008\u00009\u00001\u00000\u0000,\u00002\u00000\u00001\u00007\u0000-\u00000\u00002\u0000-\u00001\u00001\u0000 \u00001\u00001\u0000:\u00004\u00000\u0000:\u00000\u00000\u0000Z\u0000,\u0000A\u0000m\u0000o\u0000u\u0000n\u0000t\u0000 \u0000o\u0000f\u0000 \u0000h\u0000o\u0000s\u0000t\u0000 \u0000p\u0000h\u0000y\u0000s\u0000i\u0000c\u0000a\u0000l\u0000 \u0000m\u0000e\u0000m\u0000o\u0000r\u0000y\u0000 \u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000 \u0000b\u0000y\u0000 \u0000a\u0000 \u0000v\u0000i\u0000r\u0000t\u0000u\u0000a\u0000l\u0000 \u0000m\u0000a\u0000c\u0000h\u0000i\u0000n\u0000e\u0000,\u0000 \u0000h\u0000o\u0000s\u0000t\u0000,\u0000 \u0000o\u0000r\u0000 \u0000c\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000,\u0000I\u0000D\u0000C\u0000_\u0000D\u0000M\u0000Z\u00000\u00001\u0000,\u0000C\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000C\u0000o\u0000m\u0000p\u0000u\u0000t\u0000e\u0000R\u0000e\u0000s\u0000o\u0000u\u0000r\u0000c\u0000e\u0000-\u0000d\u0000o\u0000m\u0000a\u0000i\u0000n\u0000-\u0000c\u00002\u00007\u00005\u0000,\u00003\u00000\u00000\u0000,\u0000m\u0000e\u0000m\u0000.\u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000.\u0000a\u0000v\u0000e\u0000r\u0000a\u0000g\u0000e\u0000,\u0000K\u0000B\u0000\r\u0000", "Unit"=>"\u00003\u00000\u00000\u0000", 
"Timestamp"=>"\u00002\u00000\u00001\u00007\u0000-\u00000\u00002\u0000-\u00001\u00001\u0000 \u00001\u00001\u0000:\u00004\u00000\u0000:\u00000\u00000\u0000Z\u0000", "tags"=>["dateparsefailure"], "EntityId"=>"\u0000 \u0000o\u0000r\u0000 \u0000c\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000", "path"=>"/home/user/vmware-stats/2017-02-14-IDC_DMZ01-stats-300-interval.csv", "MetricId"=>"\u0000C\u0000l\u0000u\u0000s\u0000t\u0000e\u0000r\u0000C\u0000o\u0000m\u0000p\u0000u\u0000t\u0000e\u0000R\u0000e\u0000s\u0000o\u0000u\u0000r\u0000c\u0000e\u0000-\u0000d\u0000o\u0000m\u0000a\u0000i\u0000n\u0000-\u0000c\u00002\u00007\u00005\u0000", "@timestamp"=>2017-02-14T17:16:51.681Z, "@version"=>"1", "host"=>"ubuntu", "Value"=>"\u00009\u00009\u00004\u00009\u00008\u00009\u00001\u00000\u0000", "column9"=>"\u0000m\u0000e\u0000m\u0000.\u0000c\u0000o\u0000n\u0000s\u0000u\u0000m\u0000e\u0000d\u0000.\u0000a\u0000v\u0000e\u0000r\u0000a\u0000g\u0000e\u0000", "IntervalSecs"=>"\u0000I\u0000D\u0000C\u0000\u0000D\u0000M\u0000Z\u00000\u00001\u0000"}}

Thanks,
Craig


(Craig) #2

I have tried altering the date format in the csv file to "2017-02-14 15:35" and using match => ["TimeStamp", "yyyy-MM-dd HH:mm"] in the logstash config file. Still no luck; the debug logs still show "tags"=>["_dateparsefailure"].
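(Editorial note: no format string can succeed here, because the Timestamp field itself is corrupt: per the debug log, every character is preceded by a NUL byte, the signature of a UTF-16LE file read as UTF-8. A small illustration, with Python's strptime standing in for Logstash's Joda-based parser:)

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M"  # strptime equivalent of Joda's "yyyy-MM-dd HH:mm"

clean = "2017-02-14 15:35"
print(datetime.strptime(clean, FMT))  # parses fine

# What the Timestamp field actually contained, per the debug log: every
# character preceded by a NUL byte (a UTF-16LE file misread as UTF-8).
dirty = "".join("\x00" + c for c in clean)
try:
    datetime.strptime(dirty, FMT)
except ValueError:
    print("parse fails regardless of the format string")
```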

Help?


(Magnus Bäck) #3

Your main problem is that the file isn't UTF-8. Start by setting the charset option of the input's codec to match the actual character set of the file. Once that's done you can revisit the date filter if needed.
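(Editorial note: as a sketch, that suggestion applied to the input above might look like this, assuming the file is UTF-16LE, which the \xFF\xFE byte-order mark in the debug log indicates; charset here is an option of the plain codec:)

```
input {
  file {
    path => "/home/user/vmware-stats/*.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => plain { charset => "UTF-16LE" }
  }
}
```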


(Craig) #4

Thanks, Magnus. Yeah, I figured this out late last night...

I'm using PowerShell (PowerCLI) to extract statistical data from VMware, with the idea of using Elasticsearch to analyze it and build graphs, trends, etc. It turns out that by default Microsoft writes the file with a byte-order mark (BOM), in a UTF-16 encoding. So even though the docs for Out-File and Export-Csv say they default to UTF-8 or ASCII, it's not the same UTF-8 as the rest of the world.

So to save the next guy/gal a few days of his/her life, here's how I resolved it. After collecting the stats I was interested in, I exported the data specifying the encoding explicitly:

$stats | select value,@{n="TimeStamp";e={$_.timestamp.ToString("yyyy-MM-dd HH:mm")}},description,entity,entityid,intervalsecs,metricid,unit | convertto-csv -NoTypeInformation -Delimiter "," | % {$_ -replace '"',''} | Out-File -encoding ASCII $filepath

As soon as the encoding was cleaned up, data was imported easily.
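(Editorial note: if re-exporting from PowerShell isn't an option, re-encoding the already-written files before Logstash reads them works too. A minimal sketch; the utf16_csv_to_utf8 helper is hypothetical, and it assumes UTF-16 input with a BOM, as the logs above suggest:)

```python
from pathlib import Path

def utf16_csv_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 CSV (BOM detected by the utf-16 codec) as plain UTF-8."""
    text = Path(src).read_text(encoding="utf-16")  # consumes the BOM
    Path(dst).write_text(text, encoding="utf-8")

# Example: utf16_csv_to_utf8("stats-utf16.csv", "stats-utf8.csv")
```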

Thank you,
Craig


(Guy Boertje) #5

Nice one, Magnus.
It looks like UTF-16 to me.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.