Trying to parse this datefield

TestCandidate · June 3, 2018, 6:25pm

I have a CSV file with a date field as follows:

May 31, 2018 09:35:45.979371597 EEST

I am trying to parse it with logstash without any success (the field is still a string)

Here's my logstash filter:

filter {
  if [type] == "iclick-tcplog" {
   csv {
      columns => [
          "logdate",
          "ppoeip",
          "userip",
          "protocol",
          "logserverip",
          "destinationip",
          "remove1",
          "remove2",
          "sourceport",
          "destinationport",
          "description-url"
        ]
        separator => ";"
        remove_field => ["remove1", "remove2"]
   }
   date {
        match => [ "logdate", "MMMM dd, yyyy HH:mm:ss.SSS ZZZ" ]
        target => "logdate"
   }
  }
}

Any idea what could be wrong?
PS: I also tried
match => [ "logdate", "MMMM dd, yyyy HH:mm:ss.SSSSSSSSS ZZZ" ]

magnusbaeck · June 3, 2018, 7:19pm

If the date filter fails the Logstash log will contain details about the failure.

azhar · June 3, 2018, 10:39pm

Since your month is in abbreviated form, you will have to use MMM

Refer to this for more detailed information about the date plugin.

match => [ "logdate", "MMM dd, yyyy HH:mm:ss.SSSSSSSSS ZZZ" ]

Also, I suggest you start set the logging level to debug in the logstash.yml and restart the server and also have stdout {} in the output to debug your config further. The stdout messages would be visible in the journalctl output.

TestCandidate · June 4, 2018, 7:40am

I'll check and see if there's anything useful there. Thanks

TestCandidate · June 4, 2018, 7:42am

Thank you, I'll give that a try today.

But just out of curiosity, how did you know that the month is abbreviated? I only have data from May. And i'm not sure about its abbreviated form. wouldn't it still be the same?

azhar · June 4, 2018, 8:36am

Yeah, for May, MMMM and MMM should not ideally make any difference. I guess you'll have to figure that out from the logstash logs about which one is correct.

TestCandidate · June 4, 2018, 4:25pm

The logs Shows The Following:

[2018-06-04T19:17:02,664][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"\"May 30, 2018 02:42:26.479833834 EEST\";\"172.25.35.1\";\"10.4.103.6\";\"IMAP\";\"172.25.35.24\";\"77.104.156.190\";\"51495\";\"143\";\"8897\";\"8897\";\"Request: 2 ID (\"name\" \"iPhone Mail\" \"version\" \"15E302\" \"os\" \"iOS\" \"os-version\" \"11.3.1 (15E302)\")\"", :exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1>}

Although the CSV is Valid. Weird.
I'm not sure if this issue is related though...

magnusbaeck · June 4, 2018, 5:07pm

The line with the column starting with "Request: 2 ID" is incorrectly quoted. Each column value is surrounded by a double qoute, but then any internal double quotes in the column values must be escaped, i.e.

...;"Request: 2 ID ("name" ...

must be

...;"Request: 2 ID (\"name\" ...

TestCandidate · June 4, 2018, 5:29pm

Thank you. I will track that down to the tool generating the logs.

But I don't think that's the issue with the date. since only a small portion of the logs have this issue (at least that's what i can tell from the logstash logs) . However, I don't see an error related to date parsing

TestCandidate · June 4, 2018, 5:31pm

On a second thought, Is it possible that it's failing to parse the date because the date has quotes in the original csv?

TestCandidate · June 4, 2018, 6:07pm

Okay So I've made a simpler small CSV to debug with.

Here's the CSV now:

May 30, 2018 00:40:24.145155884 EEST;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello
May 30, 2018 00:40:24.149049258 EEST;172.25.35.1;10.6.103.93;SSL;172.25.35.24;138.201.234.175;1879;443;8897;8897;Client Hello

No quotes or weird stuff. Yet the date is still parsed as string from some reason.

Here's the current filter:

filter {
  if [type] == "iclick-tcplog" {
   csv {
      columns => [
          "logdate",
          "ppoeip",
          "userip",
          "protocol",
          "logserverip",
          "destinationip",
          "remove1",
          "remove2",
          "sourceport",
          "destinationport",
          "description-url"
        ]
        separator => ";"
        remove_field => ["remove1", "remove2"]
   }
   date {
        match => [ "logdate", "MMM dd, yyyy HH:mm:ss.SSSSSSSSS ZZZ" ]
        target => "logdate"
   }
  }
}

No Errors in the log File

magnusbaeck · June 4, 2018, 6:14pm

Yet the date is still parsed as string from some reason.

Are you talking about the data type in the Elasticsearch index? Because that's a completely different thing. ES will recognize field values produced by the date filter as a date and map the field as that data type, but that only happens the first time a document with a field with that name is seen by the index. If you're still just playing around you can just delete the index and make Logstash process the data again.

TestCandidate · June 4, 2018, 6:43pm

Yes, I am talking about Elasticsearch - I am checking the field in Kibana.

I am deleting the index & filebeat's registry every time I'm testing.

I'm still failing to save the logdate as a date.

TestCandidate · June 4, 2018, 7:06pm

Sorry to spam this, but as an update, I manually edited the log to this:

Apr 17 09:32:01;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello
Apr 17 09:32:12;172.25.35.1;10.6.103.93;SSL;172.25.35.24;138.201.234.175;1879;443;8897;8897;Client Hello

And edited the filter to this:

filter {
  if [type] == "iclick-tcplog" {
   csv {
      columns => [
          "logdate",
          "ppoeip",
          "userip",
          "protocol",
          "logserverip",
          "destinationip",
          "remove1",
          "remove2",
          "sourceport",
          "destinationport",
          "description-url"
        ]
        separator => ";"
        remove_field => ["remove1", "remove2"]
   }
   date {
        match => [ "logdate", "MMM dd yyyy HH:mm:ss", "MMM  d yyyy HH:mm:ss", "ISO8601"]
        target => "logdate"
   }
  }
}

Yet Still appearing as string... it's getting frustrating

magnusbaeck · June 4, 2018, 8:01pm

What does an example document look like? Copy/paste the raw JSON from Kibana's JSON tab.

TestCandidate · June 4, 2018, 8:07pm

Thank you for bearing with me
Here's a raw json:

{
  "_index": "filebeat-2018.06.04",
  "_type": "iclick-tcplog",
  "_id": "AWPMZ5u248sBzQ3deLEk",
  "_version": 1,
  "_score": null,
  "_source": {
    "ppoeip": "172.25.35.1",
    "logserverip": "172.25.35.24",
    "offset": 106,
    "input_type": "log",
    "source": "/var/capture/faridtest/farid.csv",
    "message": "Apr 17 09:32:01;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello",
    "type": "iclick-tcplog",
    "tags": [
      "beats_input_codec_plain_applied",
      "_dateparsefailure"
    ],
    "description-url": "Client Hello",
    "destinationip": "216.58.201.202",
    "sourceport": "8897",
    "destinationport": "8897",
    "protocol": "SSL",
    "@timestamp": "2018-06-04T20:05:26.847Z",
    "logdate": "Apr 17 09:32:01",
    "@version": "1",
    "beat": {
      "name": "Iclik-LOG",
      "hostname": "Iclik-LOG",
      "version": "5.6.9"
    },
    "host": "Iclik-LOG",
    "userip": "10.8.104.200"
  },
  "fields": {
    "@timestamp": [
      1528142726847
    ]
  },
  "sort": [
    1528142726847
  ]
}

Weirdly enough i just noticed that there's only 1 record while the CSV had 2 lines. but I assume that's a different problem for now.

magnusbaeck · June 4, 2018, 8:14pm

The date filter's parsing of the logdate field failed, hence the _dateparsefailure tag and the wrong data type of the field in ES. Your timestamp doesn't include the year so remove "yyyy" from your date patterns.

TestCandidate · June 4, 2018, 8:21pm

I missed that. Okay, not I can see that it managed to understand / change the date as the JSON became:

{
  "_index": "filebeat-2018.06.04",
  "_type": "iclick-tcplog",
  "_id": "AWPMcnRv48sBzQ3deLEo",
  "_version": 1,
  "_score": null,
  "_source": {
    "ppoeip": "172.25.35.1",
    "logserverip": "172.25.35.24",
    "offset": 106,
    "input_type": "log",
    "source": "/var/capture/faridtest/farid.csv",
    "message": "Apr 17 09:32:01;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello",
    "type": "iclick-tcplog",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "description-url": "Client Hello",
    "destinationip": "216.58.201.202",
    "sourceport": "8897",
    "destinationport": "8897",
    "protocol": "SSL",
    "@timestamp": "2018-06-04T20:17:14.596Z",
    "logdate": "2018-04-17T06:32:01.000Z",
    "@version": "1",
    "beat": {
      "name": "Iclik-LOG",
      "hostname": "Iclik-LOG",
      "version": "5.6.9"
    },
    "host": "Iclik-LOG",
    "userip": "10.8.104.200"
  },
  "fields": {
    "@timestamp": [
      1528143434596
    ]
  },
  "sort": [
    1528143434596
  ]
}

But two questions remains,
1- Why the field type is still a string in the db?
2- Why the original pattern is not parsing the original timestamp (since i have to revert to the original - this format was just for debugging)

Thanks again,

magnusbaeck · June 4, 2018, 8:33pm

Why the field type is still a string in the db?

Did you drop the ES index? If you're looking in Kibana, did you refresh the field list?

Why the original pattern is not parsing the original timestamp (since i have to revert to the original - this format was just for debugging)

I'm not sure SSSSSSSSS is able to parse microseconds. I also don't think ZZZ can parse timezone names like EEST. But again: If the date filter fails the Logstash log will contain details about the failure.

azhar · June 4, 2018, 8:49pm

This will not help solve the issue, but just a useful tip to make debugging easier...

Have stdin {} as your input plugin and stdout {} in your output plugin.
Stop the logstash service.
Try starting the server manually like this to get the parsed output on the console:
echo "<a single line from your .csv>" | <logstash executable path> --path.settings <logstash config directory> -f <pipeline config file path> --debug

For instance, "May 31, 2018 09:35:45.979371597 EEST" | /usr/share/logstash/bin/logstash --path.settings /etc/logstash -f /etc/logstash/conf.d/logstash.conf --debug

Debugging like this makes life much easier...

Topic		Replies	Views
Csv filter match date field Logstash	5	288	March 20, 2020
Logstash Date Filter Plugin: Field remains a string datatype Logstash	3	394	April 4, 2018
Logstash : Time Field Logstash	10	2477	February 5, 2018
Parsing date format error Logstash	3	761	October 24, 2018
Logstash: Not able to parse date field from csv Logstash	3	789	March 27, 2019

Trying to parse this datefield

Related topics