Formatting dates in logstash.conf

Hi,

I am trying to import historical data into Elasticsearch and am having trouble getting the dates accepted so that I can use them as @timestamp for histograms and the like.

First I am creating the index by doing a PUT request with

{
"mappings":{
"redacted":{
"_all":{"enabled":true},
"properties":{
"Apptimestamp":{"type": "date", "index": "not_analyzed"},
"AppUser%":{"type": "float", "index": "not_analyzed"},
"AppSys%":{"type": "float", "index": "not_analyzed"},
"AppWait%":{"type": "float", "index": "not_analyzed"},
"AppIdle%":{"type": "float", "index": "not_analyzed"},
"AppSteal%":{"type": "float", "index": "not_analyzed"},
"AppBusy":{"type": "float", "index": "not_analyzed"},
"AppCPUs":{"type": "integer", "index": "not_analyzed"}
}
}
}
}

Note: I create the index first to ensure that the fields have the correct types. If I use csv{convert =>{"name" => "float"}} in the Logstash config instead, it does not always work; my floats end up as strings and I can't use the Min/Max, Average, etc. visualisations on them.

I am then trying to push .csv format data that looks like this snippet

CPU Total ip-10-157-38-108,User%,Sys%,Wait%,Idle%,Steal%,Busy,CPUs
2016-08-03 21:33:20,0.3,1.5,2.4,95.8,0.0,,8
2016-08-03 21:33:25,0.1,0.4,0.0,99.5,0.0,,8
2016-08-03 21:33:30,0.2,0.0,0.0,99.8,0.0,,8
2016-08-03 21:33:35,0.0,0.0,0.0,100.0,0.0,,8

into that index using a logstash configuration that looks like

input {
	stdin{}
	}
filter {
	csv{
		columns => ["Apptimestamp","AppUser%","AppSys%","AppWait%","AppIdle%","AppSteal%","AppBusy","AppCPUs"]
		convert => {"Apptimestamp" => "date"}
	}
	date{
		match => ["Apptimestamp", "yyyy-MM-dd HH:mm:ss"]
	}
}
output {
	#stdout { codec => rubydebug }
	elasticsearch {
		hosts => "localhost:9200"
        #port => "443"       # set to 80 if you want to use HTTP and not HTTPS
        ssl => "false"       # set to false if you don't want to use SSL/HTTPS
        index => "redacted"
        manage_template => false
	}
}

But when I run this from the command line I get the message

response=>{"create"=>{"_index"=>"redacted", "_type"=>"logs", "_id"=>"AVZaQfKtXZI4D2I1N-uT", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [Apptimestamp]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2016-08-03 21:54:01\" is malformed at \" 21:54:01\""}}}}, :level=>:warn}
Pipeline main has been shutdown

Can anyone spot what I'm doing wrong or what config options I've missed? Am I just taking entirely the wrong approach?

I guess that if you define the date format for the field in your Elasticsearch mapping, it will be indexed as a date.
E.g.:
"Apptimestamp" : { "type" : "date", "format" : "yyyy-MM-dd HH:mm:ss" }

That didn't work, unfortunately. I get the same error and none of the log content shows up in Kibana.

Have you removed your old template from Elasticsearch? If not, the change won't take effect on your existing index. See the note on this doc page: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html#indices-templates

The basic steps to update a template are:

  1. Remove the old index template from Elasticsearch
  2. Upload the new template
  3. Possibly refresh the fields in Kibana (this has no impact on the index, only on how it is displayed)

In any case, until the date format is defined in the index mapping itself, the index is not aware of it and will throw an error.

I deleted my old index (it's on a test system) and recreated it using the Create Index mapping API. Looking at the documentation for that again, there does not seem to be a way to define the date format.

https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html

Should I use templates instead before creating my index and mapping fields into it?

New indices pick up any matching templates that already exist in Elasticsearch, so deleting the index may not be sufficient to reset your mappings. You can view the templates separately from the indices with the command
curl -XGET "yourelastic:9200/_template?pretty=true"
or, to view a single template,
curl -XGET "yourelastic:9200/_template/specific_template?pretty=true"

If any exist and match, your newly created index will use them. Make sure that the templates currently in ES are what you expect.

Yes.

In fact, there is a way to define the date format in the mapping, and you can even define alternative date formats for a single field.
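
As a rough illustration of alternative formats on one field, the index-creation body could look something like the sketch below. It assumes the same `redacted` mapping type as the original request and an Elasticsearch version that accepts `||` as a separator between formats; the other fields are left out for brevity.

```json
{
  "mappings": {
    "redacted": {
      "properties": {
        "Apptimestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time||epoch_millis"
        }
      }
    }
  }
}
```

With a mapping along these lines, a value such as 2016-08-03 21:33:20 should match the first format, while ISO8601 strings and epoch milliseconds would fall through to the other two.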

Instead of using the csv filter's convert option I'd rely on the date filter. You're already using it to parse the Apptimestamp field but since there's no target option you're storing the parsed result in the @timestamp field. Either change that or add an extra date filter. With the Apptimestamp field in ISO8601 format Elasticsearch should parse it correctly out of the box.
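
A minimal sketch of the "extra date filter" variant, assuming the timestamps always look like 2016-08-03 21:33:20. The first filter sets @timestamp (the default target), and the second overwrites Apptimestamp with the parsed value. The order matters here, because once Apptimestamp has been replaced it is no longer a plain string in the original format.

```logstash
filter {
  # Parse the CSV timestamp and use it as the event time.
  # With no target option, the result is stored in @timestamp.
  date {
    match => ["Apptimestamp", "yyyy-MM-dd HH:mm:ss"]
  }
  # Parse the same string again and store the result back in Apptimestamp,
  # so it is sent to Elasticsearch as an ISO8601 timestamp.
  date {
    match  => ["Apptimestamp", "yyyy-MM-dd HH:mm:ss"]
    target => "Apptimestamp"
  }
}
```

If the source timestamps are not in the Logstash host's local time zone, the date filter's timezone option may also be needed.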

I managed to fix it in the end, thanks to you both for the help.

I added a target of Apptimestamp to the date filter. When that ran, it reported an error trying to parse 'CPU Total ip-10-157-38-108' in the format YYYY-MM-DD HH:mm:ss, so I then added an if [...] == "..." { drop{} } conditional to remove the header line.

It's now parsing correctly with the three components looking like:

The raw file content

CPU Total ip-10-157-38-108,User%,Sys%,Wait%,Idle%,Steal%,Busy,CPUs
2016-08-03 21:33:20,0.3,1.5,2.4,95.8,0.0,,8
2016-08-03 21:33:25,0.1,0.4,0.0,99.5,0.0,,8
2016-08-03 21:33:30,0.2,0.0,0.0,99.8,0.0,,8
2016-08-03 21:33:35,0.0,0.0,0.0,100.0,0.0,,8

The PUT request to create the index

{
"mappings":{
"redacted":{
"_all":{"enabled":true},
"properties":{
"Apptimestamp":{"type": "date", "index": "not_analyzed"},
"AppUser%":{"type": "float", "index": "not_analyzed"},
"AppSys%":{"type": "float", "index": "not_analyzed"},
"AppWait%":{"type": "float", "index": "not_analyzed"},
"AppIdle%":{"type": "float", "index": "not_analyzed"},
"AppSteal%":{"type": "float", "index": "not_analyzed"},
"AppBusy":{"type": "float", "index": "not_analyzed"},
"AppCPUs":{"type": "integer", "index": "not_analyzed"}
}
}
}
}

The logstash.conf

input {
	stdin{}
	}
filter {
	csv{
		columns => ["Apptimestamp","AppUser%","AppSys%","AppWait%","AppIdle%","AppSteal%","AppBusy","AppCPUs"]
	}
	if ([AppUser%] == "User%"){
		drop{ }
	}
	date{
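		# Joda-Time pattern letters are case-sensitive: yyyy = year, dd = day of month (DD would mean day of year)
		match => ["Apptimestamp", "yyyy-MM-dd HH:mm:ss"]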
		target => "Apptimestamp"
	}
}
output {
	#stdout { codec => rubydebug }
	elasticsearch {
		hosts => "localhost:9200"
        #port => "443"       # set to 80 if you want to use HTTP and not HTTPS
        ssl => "false"       # set to false if you don't want to use SSL/HTTPS
        index => "[redacted]"
        manage_template => false
	}
}

Note that the drop condition checks the second column rather than the first, because the first column's header contains the machine's IP address, which varies as different machines output their logs.

Thank you!