Uploading csv file: Failed to parse date field dd/MM/yyyy HH:mm

Hi,
I'm uploading a CSV file (with ; as the separator). I identified one of the columns as Machine Time. The format in Excel is "customized", as seen in the image below:

To parse the date as a date type on Elastic I used the date filter:

     date {
       match  => ["Machine Time", "dd/MM/yyyy HH:mm", "ISO8601", "dd/MM/yyyy HH:mm:ss"]
       target => "Machine Time"
     }

The strange thing is that out of, for example, 6000 lines, 5995 load correctly and the remaining 5 do not, even though the format is the same for all of them.

In the Logstash logs I see the following error: Preview of field's value: '28/03/2021 02:05'
This is one of the 5 dates that fail to upload.

Can anyone help me? What can I do to fix this?
Thanks.

Marco

Looks like there's a space after the day. To catch that, add a pattern with a space to your match as well.

date {
 match => ["Machine Time", "dd/MM/yyyy HH:mm",  "dd /MM/yyyy HH:mm",  "ISO8601", "dd/MM/yyyy HH:mm:ss"]
 target => "Machine Time"
}
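
Alternatively, you could strip stray whitespace from the field before the date filter ever sees it. A minimal sketch using the mutate filter's strip option (field name taken from your config); note that strip only removes leading and trailing whitespace, so it won't catch a space in the middle of the value:

mutate {
  # Trim leading/trailing whitespace from the field before date parsing.
  strip => ["Machine Time"]
}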

Hi aaron,
I don't know why I copied the string like that, but there is no space. I've edited the post now.

[image]

Marco

The date alone parses correctly, so I wouldn't think this has anything to do with the date itself. Is there more logic in your configuration that could be causing it?

Are you able to post your .conf?

input {
  file {
    path => "PATH/file_name.csv"
    start_position => "beginning"
    sincedb_path => "PATH"
  }
}

filter {
  csv {
    columns => ["***", "Machine Time", "***", "***", ..., other 140 column names]
    separator => ";"
    convert => { "***" => "integer" }   # this for each numeric column, almost 40 columns
  }
  date {
    match => ["Machine Time", "dd/MM/yyyy HH:mm", "ISO8601", "dd/MM/yyyy HH:mm:ss"]
    target => "Machine Time"
  }
}

output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "index_name"
    user => "***"
    password => "***"
  }
}

Nothing looks out of place. Are you able to isolate the same 5 records each time?

Have you looked at the file with a text editor and not within Excel to verify no extra/special characters that could be causing it?
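
One way to catch the failing rows automatically: when match fails, the date filter tags the event with _dateparsefailure by default, so you can route those events somewhere visible. A minimal sketch (host and index carried over from your config):

output {
  if "_dateparsefailure" in [tags] {
    # Rows whose Machine Time could not be parsed land here for inspection.
    stdout { codec => rubydebug }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "index_name"
    }
  }
}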

To give you a bit of context: I'm running a test with a CSV populated with data coming from a MySQL database. In the future I will probably connect directly to the database, but for now I want to use this CSV and populate it manually. It is therefore a continuous flow of data; I cannot afford to have this problem repeat itself. If it happens with the first 6000 lines, it will probably happen with new data as well. So the solution is not to isolate these 5 lines but to solve the underlying problem, because otherwise it will recur with new data without my understanding the real cause.

Anyway, in Notepad I see this (so no extra characters):

[image]

I'm stumped by what I am seeing. Are you able to share the CSV so I can try to replicate the results?

How do I share a CSV file with you?

https://pastebin.com/ or https://gist.github.com/ would probably be the easiest.

I had to omit some string-format columns because they contain sensitive data.

I could not reproduce it with what was given. :man_shrugging:

The CSV has 13102 rows and it ingested them all with a correct date conversion.

Can you post the full date parse error in the log?

{
  "count" : 13102,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
Machine Time" : "2021-04-19T13:56:00.000Z",

What time zone are you in? Did 02:05 exist in that time zone, or did the time go from 01:59:59 to 03:00:00 and skip over the 02 hour?
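
You can test that theory in isolation. A minimal sketch using the generator input, assuming a zone like Europe/Rome that springs forward on 28 March 2021:

input {
  generator {
    lines => ["28/03/2021 02:05"]   # the suspect timestamp
    count => 1
  }
}
filter {
  date {
    match    => ["message", "dd/MM/yyyy HH:mm"]
    timezone => "Europe/Rome"   # 02:05 local time did not exist on this date here
  }
}
output {
  stdout { codec => rubydebug }   # expect the event to come out tagged _dateparsefailure
}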

How is that possible? The file I shared with you has 6551 rows, not 13102. How did you get the output you showed me? I'm afraid the file was truncated or modified in some way when I uploaded it to GitHub Gist. How many rows do you see?

I am in Italy. How do I answer your question? Where exactly do I check that?

In my advanced settings, I see that the timezone is set to follow the browser.

The file at failed parse error file · GitHub has 3513 lines. Not sure how I got that many results before, since I downloaded the file and that's what was there. This time I just copy/pasted the rows.

But I reran that one and got the same results.

{
  "count" : 3513,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

Then you are on CEST, which starts on the last Sunday in March. 2 AM on March 28th did not happen; the clock skipped from 01:59:59 to 03:00:00, so a date filter cannot parse it. More commentary here.
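
If those timestamps come from a machine clock that does not observe DST, one workaround is to parse with a fixed-offset timezone so that no local time can fall into the DST gap. A sketch, assuming the source clock runs at UTC+1 year-round (verify against your machine):

date {
  match    => ["Machine Time", "dd/MM/yyyy HH:mm", "ISO8601", "dd/MM/yyyy HH:mm:ss"]
  target   => "Machine Time"
  timezone => "+01:00"   # fixed offset: the 02:00-03:00 spring-forward gap cannot occur
}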


You had no parse errors because the lines that fail to load were truncated when I uploaded the file to GitHub.

I almost took the computer and threw it out the window... :rofl: I've been trying to figure out this problem for three weeks.