Logstash configuration returns different results

Hi guys.

Below is my logstash configuration:

input {
  file {
    path => "/tmp/file.csv"
    type => "myindex"
    start_position => "beginning"
  }
}

filter {
  if ([type] == "myindex") {
    csv {
      skip_header => "true"
      separator => ","
      columns => ["time","f1","f2","f3","f4"]
      remove_field => ["host"]
      remove_field => ["message"]
      remove_field => ["path"]
    }
    mutate {convert => ["f1", "float"] }
    mutate {convert => ["f2", "integer"] }
    mutate {convert => ["f3", "integer"] }
    mutate { add_field => { "eventdate" => "%{+YYYY.MM.dd} %{time}" } }
    date {
      locale => "en"
      match => ["eventdate", "YYYY.MM.dd HH:mm:ss.SSS"]
      target => "eventdate"
    }
  }
}

output   {
  if [type] == "myindex" {
    elasticsearch {
      hosts => "myip:myport"
      index => "myindex_%{+yyyy}%{+MM}"
      document_id => "%{+yyyy}%{+MM}%{+dd}%{time}%{f2}"
    }
  }
}

The input { } and output { } sections are pretty standard.
In my filter { }, I define my separator and header handling, I get rid of the fields created by logstash (host, message, path), I convert a few fields into the right types, and I create eventdate based on my %{time} field (which looks like HH:mm:ss.SSS) by prepending the current date.
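
For example (the values here are made up for illustration), a row like

  14:22:28.000,val1,107.865,25000,1

read on 2019-07-15 gives time = 14:22:28.000 and, after the mutate, eventdate = 2019.07.15 14:22:28.000, which matches the date filter pattern YYYY.MM.dd HH:mm:ss.SSS.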

I've got a very weird behavior with this configuration; it seems like something is going wrong but I can't figure out why/what. I sometimes get the errors below:

[2019-07-15T08:13:46,234][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"20190715D1.4671", :_index=>"myindex_201907", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x22a6398>], :response=>{"index"=>{"_index"=>"myindex_201907", "_type"=>"_doc", "_id"=>"20190715D1.4671", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [eventdate] of type [date] in document with id '20190715D1.vi '", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2019.07.15 D] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

[2019-07-15T08:41:39,853][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"2019071548:56.000val1", :_index=>"myindex_201907", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x985bc66>], :response=>{"index"=>{"_index"=>"myindex_201907", "_type"=>"_doc", "_id"=>"2019071548:56.000val1", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [eventdate] of type [date] in document with id '2019071548:56.000val1'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2019.07.15 48:56.000] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

[2019-07-15T08:40:55,321][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"20190715%{time}1", :_index=>"myindex_201907", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x7c4e553>], :response=>{"index"=>{"_index"=>"myindex_201907", "_type"=>"_doc", "_id"=>"20190715%{time}1", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [eventdate] of type [date] in document with id '20190715%{time}1'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2019.07.15 %{time}] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

[2019-07-15T08:46:25,846][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"20190715val2val3", :_index=>"myindex_201907", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x22681dc7>], :response=>{"index"=>{"_index"=>"myindex_201907", "_type"=>"_doc", "_id"=>"20190715val2val3", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [eventdate] of type [date] in document with id '20190715val2val3'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2019.07.15 val2] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

It seems like sometimes logstash isn't able to correctly parse my lines, and I've checked the format of my files a couple of times; they all look good. The worst thing is that with the same config and file I don't reproduce the same error; it only happens sometimes, and randomly. I don't really know what to do here.

Thanks in advance
Guillaume :slight_smile:

I have to disagree. The [eventdate] field has values such as

2019.07.15 D
2019.07.15 48:56.000
20190715%{time}1
2019.07.15 val2

I do not think it is reasonable to expect elasticsearch to be able to parse any of those as dates.

Hi @Badger, Thanks for answering

I agree, there must be a reason why such values ended up in my eventdate field, but actually I've just got another case and the error is:

[2019-07-15T15:35:11,366][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"2019071522:28.000val1", :_index=>"myindex", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x4fcc371d>], :response=>{"index"=>{"_index"=>"myinex", "_type"=>"_doc", "_id"=>"2019071522:28.000val1", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [eventdate] of type [date] in document with id '2019071522:28.000val1'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2019.07.15 22:28.000] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

So I looked for 22:28.000val1 in my file (because the 20190715 part is generated) and I have only one row:
14:22:28.000,val1,.98402,5000,1
Can't understand why the 14: is skipped in this case ... (val1 contains '/' by the way)

As already said, I re-ran logstash on the same file and ... errors again, but never on the same line (same conf/same file).

Thanks again
Guillaume

If eventdate has that format I would expect the events to have a _dateparsefailure tag. I would route events with that tag to a file output. I would also comment out the line that removes the [message] field. Then see if you can infer from the events what is going wrong.
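
A rough sketch of that routing (the output path here is just an example):

output {
  if "_dateparsefailure" in [tags] {
    # dump unparsable events to a local file so the raw message can be inspected
    file { path => "/tmp/dateparsefailures.json" }
  }
}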

Thanks ! Lovely ideas, let me try this out and get back here later

Just got one:
{"eventdate":"2019.07.15 07.865","tags":["_dateparsefailure"],"sym":"25000","message":"07.865,25000,1","rate":1.0,"@version":"1","@timestamp":"2019-07-15T16:56:00.341Z","type":"myindex","time":"07.865"}
(all fields are wrong here)

Then I grep 07.865,25000,1 myfile and got only one line back:
01:40:40.000,val1,107.865,25000,1 ... how is that possible? ...
I used ":set list" in vi to show special characters but, once again, everything looks good; all lines end with '$'.
Any further idea ?

Thanks

Sorry, I cannot explain that. If you run with '--log.level debug' then the file input will log each line as it reads it from the file. I don't know if that will be useful, but I would try that next.
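
For example, something along these lines (the config path is illustrative):

  bin/logstash -f /path/to/pipeline.conf --log.level debug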

Hi,
Sorry for the late reply,
It hasn't gotten any better so far...

But actually, the weird thing is, in my last file I have data from 12:00:00.500 to 12:59:59.500 and I've got this error message:
[2019-07-16T13:10:17,826][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>"20190716%{time}val1", :_index=>"myindex", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x608e462d>], :response=>{"index"=>{"_index"=>"myindex", "_type"=>"_doc", "_id"=>"20190716%{time}val1", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [eventdate] of type [date] in document with id '20190716%{time}val1'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2019.07.16 %{time}] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}
(note that in this case the whole %{time} failed to be substituted, not only the hour as shown in my previous example)

In Kibana I have data only between 12:48:00.500 and 12:59:59.500 (meaning I'm missing roughly 48 minutes of data, which should produce not just 1 error but X errors for X missing lines, right?)

Thanks

I believe the output will unconditionally log a warning if it is unable to index an event, so yes, it should be 1 to 1.

Makes me think that logstash is not reading my file from the beginning as it should (like it gets stuck in the middle of a line or something, because the parse error message is always different...).
What I do so far is: I copy a file from /x to /tmp, I modify this file, then mv /tmp/previousfilename.csv /tmp/logstashfilename.csv. The mv command overwrites a previous file that logstash already processed an hour earlier.
What do you think? I tried file_completed_action => "delete" but it doesn't delete my file either...
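If I read the docs correctly, file_completed_action is only honored in read mode (the default is tail mode); a sketch of what I'd presumably need, assuming a recent file input plugin (4.1+):

input {
  file {
    path => "/tmp/myfile.csv"
    mode => "read"                      # file_completed_action only applies in read mode
    file_completed_action => "delete"   # remove the file once it has been fully read
  }
}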

Thanks

Are you really just reading one file, or are you reading a set of files that is changing over time? I am wondering if you are seeing inode re-use. That's where you read a file and delete it, then create a new file with a different name but the same inode. logstash will start reading the file at an offset equal to the length of the previous file.
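
You can check the inode numbers with ls -i, e.g. (file names illustrative):

  ls -i /tmp/myfile.csv     # inode of the file logstash has already read
  mv /tmp/newdata.csv /tmp/myfile.csv
  ls -i /tmp/myfile.csv     # now the inode of newdata.csv; if it matches an inode
                            # already recorded in the sincedb, logstash resumes
                            # from the old offset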

If you enable '--log.level trace' then filewatch will log messages about each file that it discovers and the associated sincedb entry.

No, no, my logstash always reads the same filename:
input {
  file {
    path => "/tmp/myfile.csv"
    type => "myindex"
    start_position => "beginning"
  }
}
but I overwrite the data in this file by doing mv myprevfile.csv /tmp/myfile.csv. So I'm not sure whether logstash ends up treating it as a new file...

I didn't set a sincedb; I just saw this INFO when starting logstash:

[2019-07-16T10:57:42,286][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"...../logstash/plugins/inputs/file/.sincedb_db6f073d7b8fd2580bc00da4d5e1235a", :path=>["/tmp/myfile.csv"]}

What's the best way to make logstash detect a file as new and read the whole content?

Thanks
[edit]
In this sincedb file, I believe the last column is supposed to contain the file name, but in mine it's sometimes just empty (see the example entry below)...

  • I actually don't even need this file anyway, since all I want is to process my file and then delete it...
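
For reference, an entry in that sincedb looks something like this (values invented; the exact format depends on the plugin version), i.e. inode, major/minor device numbers, byte offset, then a timestamp and the last known path:

  6801901 0 64768 1522 1563205261.363678 /tmp/myfile.csv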

That effectively deletes /tmp/myfile.csv, freeing its inode for re-use. If you do that mv twice you could hit this problem.

Try specifying 'sincedb_path => "/dev/null"', which prevents logstash from persisting the in-memory sincedb to disk, and restarting logstash. I suspect you will never see this issue for the first file it reads, but you will see it for the third.

That's what happened this morning !

I'll try to switch to something simpler with genuinely new files, like:
input {
  file {
    path => "/var/log/message*"
    sincedb_path => "/dev/null"
  }
}
and I'll let you know.

Thanks so much

Alright now
Working as expected
I set sincedb_path as you said, and I now create a genuinely new file rather than renaming an old one to match my logstash configuration.
As said above, I'm also using a pattern in path.
Seems to work perfectly now

Thanks a lot for your time @Badger :tada:
