Errors when processing (mutating) date values

We're attempting to import historical data in CSV format via Logstash, but it's choking on the date values.

So far, we have the following config:

input {
	stdin {}
	file {
		path => ["/Applications/logstash-6.2.4/data/data.csv"]
		start_position => "beginning"
	}
}
filter {
	csv {
		separator => ","
		columns => [
			"keen.timestamp",
			"keen.created_at",
			"keen.id",
			"keen.location.coordinates",
			"bytes_received",
			"city",
			"client_id",
			"country",
			"day_of_month",
			"day_of_week",
			"end_time",
			"error.bytesParsed",
			"error.code",
			"error.errno",
			"error.expose",
			"error.message",
			"error.path",
			"error.rawPacket.data",
			"error.rawPacket.type",
			"error.status",
			"error.statusCode",
			"error.syscall",
			"file_name",
			"hour_of_day",
			"ip",
			"keen_ip_geo.city",
			"keen_ip_geo.continent",
			"keen_ip_geo.coordinates",
			"keen_ip_geo.country",
			"keen_ip_geo.country_code",
			"keen_ip_geo.postal_code",
			"keen_ip_geo.province",
			"month",
			"path",
			"post_id",
			"referrer",
			"referrer_info.medium",
			"referrer_info.source",
			"referrer_info.term",
			"region",
			"request_duration",
			"request_durations",
			"server_ip",
			"site_id",
			"start_time",
			"status",
			"total_connections",
			"total_request_duration",
			"url",
			"user_agent",
			"user_agent_parsed.browser.family",
			"user_agent_parsed.browser.major",
			"user_agent_parsed.browser.minor",
			"user_agent_parsed.browser.patch",
			"user_agent_parsed.device.family",
			"user_agent_parsed.os.family",
			"user_agent_parsed.os.major",
			"user_agent_parsed.os.minor",
			"user_agent_parsed.os.patch",
			"user_agent_parsed.os.patch_minor",
			"year"
		]
	}
	mutate { convert => [ "keen.timestamp", "date" ] }
	mutate { convert => [ "keen.created_at", "date" ] }
	mutate { convert => [ "start_time", "date" ] }
}
output {
	elasticsearch {
		hosts => ["localhost:9200"]
		index => "correct_downloads"
		document_type => "correct_downloads"
	}
	stdout {}
}

... but when we run this, we get the following errors:

[2018-09-05T09:21:45,908][ERROR][logstash.pipeline ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::FilterDelegator:0x180ad90d @metric_events_out=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: out value:0, @metric_events_in=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: in value:0, @metric_events_time=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: duration_in_millis value:0, @id="5de7e9f1cd7ddffd32df93a251f985d4a55ef70e3c82d51e4235e61861b5ad57", @klass=LogStash::Filters::Mutate, @metric_events=#<LogStash::Instrument::NamespacedMetric:0x266343bc @metric=#<LogStash::Instrument::Metric:0x52296872 @collector=#<LogStash::Instrument::Collector:0x160c4610 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x5800276d @store=#<Concurrent::map:0x00000000000fb0 entries=3 default_proc=nil>, @structured_lookup_mutex=#Mutex:0x9799a00, @fast_lookup=#<Concurrent::map:0x00000000000fb4 entries=76 default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :filters, :"5de7e9f1cd7ddffd32df93a251f985d4a55ef70e3c82d51e4235e61861b5ad57", :events]>, @filter=<LogStash::Filters::Mutate convert=>{"keen.timestamp"=>"date"}, id=>"5de7e9f1cd7ddffd32df93a251f985d4a55ef70e3c82d51e4235e61861b5ad57", enable_metric=>true, periodic_flush=>false>>", :error=>"translation missing: en.logstash.agent.configuration.invalid_plugin_register", :thread=>"#<Thread:0x23a1cd1d run>"}
[2018-09-05T09:21:45,924][ERROR][logstash.pipeline ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<LogStash::ConfigurationError: translation missing: en.logstash.agent.configuration.invalid_plugin_register>, :backtrace=>["/Applications/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-mutate-3.3.1/lib/logstash/filters/mutate.rb:219:in `block in register'", "org/jruby/RubyHash.java:1343:in `each'", "/Applications/logstash-6.2.4/vendor/bundle/jruby/2.3.0/gems/logstash-filter-mutate-3.3.1/lib/logstash/filters/mutate.rb:217:in `register'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:342:in `register_plugin'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:353:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:353:in `register_plugins'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:731:in `maybe_setup_out_plugins'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:363:in `start_workers'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:290:in `run'", "/Applications/logstash-6.2.4/logstash-core/lib/logstash/pipeline.rb:250:in `block in start'"], :thread=>"#<Thread:0x23a1cd1d run>"}
[2018-09-05T09:21:45,959][ERROR][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: LogStash::PipelineAction::Create/pipeline_id:main, action_result: false", :backtrace=>nil}

... and when I remove the "mutate" options, we get another error:

[2018-09-05T08:51:57,497][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"correct_downloads", :_type=>"correct_downloads", :_routing=>nil}, #], :response=>{"index"=>{"_index"=>"correct_downloads", "_type"=>"correct_downloads", "_id"=>"1Uq3qGUBxmfELKVAaRe2", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse [keen.timestamp]", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: "keen.timestamp""}}}}}

I'm assuming there are additional processing options for dates?

You cannot convert to a date with the mutate filter. You instead need to use the date filter plugin to process this.
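
For the ISO 8601 timestamps you're importing, a date filter along these lines should do it (a sketch; adjust the field name for each of your date columns):

```
filter {
	date {
		match => [ "keen.timestamp", "ISO8601" ]
	}
}
```

The ISO8601 shorthand saves you from spelling out a yyyy-MM-dd'T'HH:mm:ss.SSSZ pattern by hand.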


Hi @Christian_Dahlqvist, and thank you for the prompt response!

I've made a few changes, adding date filters:

date {
	match => [ "keen.timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
}
date {
	match => [ "keen.created_at", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
}
date {
	match => [ "start_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
}
date {
	match => [ "end_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
}
date {
	match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
}

... however, it's choking on the "start_time" field, in spite of the fact that the date values in that column are identical in format to those in the other columns.

Any ideas?

The date filter by default populates the @timestamp field, so you probably want to specify a target field. To help analyse this type of problem, it helps if you can show some sample data and how it is output to the stdout plugin using a rubydebug codec.
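
For example, for one of your fields (a sketch; repeat the date block per field):

```
filter {
	date {
		match => [ "start_time", "ISO8601" ]
		target => "start_time"
	}
}
output {
	stdout { codec => rubydebug }
}
```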

Hi! So the debug would be: stdout { rubydebug codec } correct?

Also, I made the changes you recommended:

date {
	match => [ "keen.timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
	target => "keen.timestamp"
}
date {
	match => [ "keen.created_at", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
	target => "keen.created_at"
}
date {
	match => [ "start_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
	target => "start_time"
}
date {
	match => [ "end_time", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
	target => "end_time"
}
date {
	match => [ "timestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ" ]
	target => "timestamp"
}

... but now it's choking on the "end_time" field and, as before, these values are in an identical format.

I've asked the client if I'm able to share some sample data, as you asked.

Also, have a look at the Logstash log. When the date filter fails, it'll log additional information there.
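
When parsing fails, the date filter also tags the event with _dateparsefailure by default, so you can isolate just the failing events on stdout (a sketch):

```
output {
	if "_dateparsefailure" in [tags] {
		stdout { codec => rubydebug }
	}
}
```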

Here's a sample of the data:

"keen.timestamp","keen.created_at","keen.id","keen.location.coordinates","bytes_received","city","client_id","country","day_of_month","day_of_week","end_time","error.bytesParsed","error.code","error.errno","error.expose","error.message","error.path","error.rawPacket.data","error.rawPacket.type","error.status","error.statusCode","error.syscall","file_name","hour_of_day","ip","keen_ip_geo.city","keen_ip_geo.continent","keen_ip_geo.coordinates","keen_ip_geo.country","keen_ip_geo.country_code","keen_ip_geo.postal_code","keen_ip_geo.province","month","path","post_id","referrer","referrer_info.medium","referrer_info.source","referrer_info.term","region","request_duration","request_durations","server_ip","site_id","start_time","status","total_connections","total_request_duration","url","user_agent","user_agent_parsed.browser.family","user_agent_parsed.browser.major","user_agent_parsed.browser.minor","user_agent_parsed.browser.patch","user_agent_parsed.device.family","user_agent_parsed.os.family","user_agent_parsed.os.major","user_agent_parsed.os.minor","user_agent_parsed.os.patch","user_agent_parsed.os.patch_minor","year"
"2018-09-04T11:23:11.434Z","2018-09-04T11:23:11.434Z","5b8e6b1f181c7f000107f2a5","[52.2, 0.1167]","66468","Cambridge","469","GB","4","2","2018-09-04T10:23:10.722Z","","EPIPE","EPIPE","","","","","","","","write","48--Alan-Hill.mp3","6","90.253.84.156","Cambridge","Europe","[0.1167, 52.2]","United Kingdom","GB","CB5","Cambridgeshire","8","/469/518/3031/48--Alan-Hill.mp3","3031","","","","","C3","","[646]","167.114.101.155","518","2018-09-04T10:23:09.894Z","aborted","1","646","https://domain.com/469/518/3031/48--Alan-Hill.mp3","AppleCoreMedia/1.0.0.15G77 (iPhone; U; CPU OS 11_4_1 like Mac OS X; en_gb)","Mobile Safari UI/WKWebView","","","","iPhone","iOS","11","4","1","","2018"
"2018-09-04T11:23:12.961Z","2018-09-04T11:23:12.961Z","5b8e6b20f8eed800017e061e","[-37.8228, 144.9643]","546","Southbank","1424","AU","4","2","2018-09-04T10:23:12.314Z","","","","","","","","","","","","Anders-Winroth-Conversion.mp3","6","58.168.204.234","Southbank","Oceania","[144.9522, -37.8253]","Australia","AU","3006","Victoria","8","/1424/1181/1890/Anders-Winroth-Conversion.mp3","1890","","","","","07","","[292]","167.114.101.155","1181","2018-09-04T10:23:12.313Z","complete","1","292","https://domain.com/1424/1181/1890/Anders-Winroth-Conversion.mp3","AppleCoreMedia/1.0.0.13G36 (iPod touch; U; CPU OS 9_3_5 like Mac OS X; en_au)","Mobile Safari UI/WKWebView","","","","iPod","iOS","9","3","5","","2018"
"2018-09-04T11:23:15.565Z","2018-09-04T11:23:15.565Z","5b8e6b231627290001494409","[28.762, -81.63]","17871843","Mount Dora","1297","US","4","2","2018-09-04T10:23:14.940Z","","EPIPE","EPIPE","","","","","","","","write","TCP181-Ron-Beghetto.mp3","6","142.196.218.173","Mt. Dora","North America","[-81.63, 28.762]","United States","US","32757","Florida","8","/1297/1052/9740/TCP181-Ron-Beghetto.mp3","9740","","","","","FL","","[549, 1375]","167.114.101.155","1052","2018-09-04T10:22:54.723Z","complete","2","1924","https://domain.com/1297/1052/9740/TCP181-Ron-Beghetto.mp3","AppleCoreMedia/1.0.0.15G77 (iPhone; U; CPU OS 11_4_1 like Mac OS X; en_us)","Mobile Safari UI/WKWebView","","","","iPhone","iOS","11","4","1","","2018"
"2018-09-04T11:23:17.205Z","2018-09-04T11:23:17.205Z","5b8e6b25a27a000001c41a7a","[44.0001, -79.4663]","62521705","Aurora","614","CA","4","2","2018-09-04T10:23:16.601Z","","","","","","","","","","","","Back-at-it-again.mp3","6","173.35.155.102","Aurora","North America","[-79.4663, 44.0001]","Canada","CA","L4G","Ontario","8","/614/722/1799/Back-at-it-again.mp3","1799","","","","","ON","","[332, 885]","167.114.101.155","722","2018-09-04T10:18:16.762Z","complete","2","1217","https://domain.com/614/722/1799/Back-at-it-again.mp3","AppleCoreMedia/1.0.0.15D100 (iPhone; U; CPU OS 11_2_6 like Mac OS X; en_ca)","Mobile Safari UI/WKWebView","","","","iPhone","iOS","11","2","6","","2018"
"2018-09-04T11:23:17.902Z","2018-09-04T11:23:17.902Z","5b8e6b252454a50001beab8b","[38.0049, -121.8058]","531","Antioch","370","US","4","2","2018-09-04T10:23:17.298Z","","","","","","","","","","","","252-Wasser.mp3","6","73.71.185.122","Antioch","North America","[-121.8058, 38.0049]","United States","US","94509","California","8","/370/401/8762/252-Wasser.mp3","8762","","","","","CA","","[267]","167.114.101.155","401","2018-09-04T10:23:17.297Z","complete","1","267","https://domain.com/370/401/8762/252-Wasser.mp3","AppleCoreMedia/1.0.0.15G77 (iPhone; U; CPU OS 11_4_1 like Mac OS X; en_us)","Mobile Safari UI/WKWebView","","","","iPhone","iOS","11","4","1","","2018"
"2018-09-04T11:23:18.929Z","2018-09-04T11:23:18.929Z","5b8e6b26e5632000017b52a5","[38.0049, -121.8058]","197151","Antioch","370","US","4","2","2018-09-04T10:23:18.292Z","","EPIPE","EPIPE","","","","","","","","write","252-Wasser.mp3","6","73.71.185.122","Antioch","North America","[-121.8058, 38.0049]","United States","US","94509","California","8","/370/401/8762/252-Wasser.mp3","8762","","","","","CA","","[267, 552]","167.114.101.155","401","2018-09-04T10:23:17.297Z","aborted","2","819","https://domain.com/370/401/8762/252-Wasser.mp3","AppleCoreMedia/1.0.0.15G77 (iPhone; U; CPU OS 11_4_1 like Mac OS X; en_us)","Mobile Safari UI/WKWebView","","","","iPhone","iOS","11","4","1","","2018"

I hope that helps!

Hi! Thanks to the help here, we've got past the date columns, but now it's:

[2018-09-05T15:22:35,588][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"correct_downloads_en_v1", :_type=>"correct_downloads", :_routing=>nil}, #<LogStash::Event:0x67a933d5>], :response=>{"index"=>{"_index"=>"correct_downloads_en_v1", "_type"=>"correct_downloads", "_id"=>"l0odqmUBxmfELKVADRrX", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Cannot add a value for field [error.errorPath] since one of the intermediate objects is mapped as a nested object: [error]"}}}}

We are using nested objects in the Elasticsearch index.

I tried:

mutate {
	rename => {
		"error.bytesParsed" => "error.errorBytesParsed"
		"error.code" => "error.errorCode"
		"error.errno" => "error.errorErrNo"
		"error.expose" => "error.errorExpose"
		"error.message" => "error.errorMessage"
		"error.path" => "error.errorPath"
		"error.status" => "error.errorStatus"
		"error.statusCode" => "error.errorStatusCode"
	}
}

... which appears to be the cause of the error.

It seems the error field has already been mapped as a nested object, and then you can't have a field named error.errorPath. Note that the latter notation does not denote an error field with an errorPath subfield, but a single field with a literal period in its name. If you want subfields, [error][errorPath] is the correct syntax.
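
Applied to one of your columns, the rename would look something like this (a sketch showing a single field; the left-hand side is the literal dotted name the csv filter produced):

```
mutate {
	rename => { "error.path" => "[error][errorPath]" }
}
```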


Hi, I managed to figure that out, but there are a number of other columns with similar dot notation that have the correct names, but are still causing errors:

[2018-09-05T15:39:06,868][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"correct_downloads_en_v1", :_type=>"correct_downloads", :_routing=>nil}, #<LogStash::Event:0x572a4e71>], :response=>{"index"=>{"_index"=>"correct_downloads_en_v1", "_type"=>"correct_downloads", "_id"=>"MUosqmUBxmfELKVALhtR", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"Cannot add a value for field [referrer_info.source] since one of the intermediate objects is mapped as a nested object: [referrer_info]"}}}}

We don't need to rename them; we just want to pass them through as-is.

Perhaps you just have old mappings from the time when you were trying things out. An easy way of fixing that is simply deleting the existing index and starting fresh.

Otherwise we'll want to see what the documents you want to store look like (replace the elasticsearch output with a stdout output using either the json_lines or the rubydebug codec) and the index's mappings.
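
If you do decide to start fresh, deleting the old index is a single request (assuming the index name from your errors):

```
DELETE /correct_downloads_en_v1
```

(That's Kibana Dev Tools syntax; with curl it's an HTTP DELETE against the same path on localhost:9200.)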

Hi @magnusbaeck, here's the relevant part of the mapping:

"referrer_info":{  
	"type":"nested",
	"properties":{  
		"medium":{  
			"type":"text",
			"analyzer":"english",
			"fields":{  
				"std":{  
					"type":"text",
					"analyzer":"statistic_en_analyzer",
					"fields":{  
						"std":{  
							"type":"text",
							"analyzer":"standard"
						}
					}
				}
			}
		},
		"source":{  
			"type":"text",
			"analyzer":"english",
			"fields":{  
				"std":{  
					"type":"text",
					"analyzer":"statistic_en_analyzer",
					"fields":{  
						"std":{  
							"type":"text",
							"analyzer":"standard"
						}
					}
				}
			}
		},
		"term":{  
			"type":"text",
			"analyzer":"english",
			"fields":{  
				"std":{  
					"type":"text",
					"analyzer":"statistic_en_analyzer",
					"fields":{  
						"std":{  
							"type":"text",
							"analyzer":"standard"
						}
					}
				}
			}
		}
	}
}

... and the referrer_info object is valid; we're pushing data into it via the application.

Also, I'm getting errors with the "error" object again, in spite of the fact I've not changed anything, and it's the same data.

I'm afraid I don't know enough about nested fields to help out here.

Hi @magnusbaeck, thank you for the help.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.