Trying to parse this datefield


(TestCandidate) #1

I have a CSV file with a date field as follows:

May 31, 2018 09:35:45.979371597 EEST

I am trying to parse it with logstash without any success (the field is still a string)

Here's my logstash filter:

filter {
  if [type] == "iclick-tcplog" {
   csv {
      columns => [
          "logdate",
          "ppoeip",
          "userip",
          "protocol",
          "logserverip",
          "destinationip",
          "remove1",
          "remove2",
          "sourceport",
          "destinationport",
          "description-url"
        ]
        separator => ";"
        remove_field => ["remove1", "remove2"]
   }
   date {
        match => [ "logdate", "MMMM dd, yyyy HH:mm:ss.SSS ZZZ" ]
        target => "logdate"
   }
  }
}

Any idea what could be wrong?
PS: I also tried
match => [ "logdate", "MMMM dd, yyyy HH:mm:ss.SSSSSSSSS ZZZ" ]


(Magnus Bäck) #2

If the date filter fails the Logstash log will contain details about the failure.


(Azhar) #3

Since your month is in abbreviated form, you will have to use MMM

Refer to this for more detailed information about the date plugin.

match => [ "logdate", "MMM dd, yyyy HH:mm:ss.SSSSSSSSS ZZZ" ]

Also, I suggest you start set the logging level to debug in the logstash.yml and restart the server and also have stdout {} in the output to debug your config further. The stdout messages would be visible in the journalctl output.


(TestCandidate) #4

I'll check and see if there's anything useful there. Thanks


(TestCandidate) #5

Thank you, I'll give that a try today.

But just out of curiosity, how did you know that the month is abbreviated? I only have data from May. And i'm not sure about its abbreviated form. wouldn't it still be the same?


(Azhar) #6

Yeah, for May, MMMM and MMM should not ideally make any difference. I guess you'll have to figure that out from the logstash logs about which one is correct.


(TestCandidate) #7

The logs Shows The Following:

[2018-06-04T19:17:02,664][WARN ][logstash.filters.csv     ] Error parsing csv {:field=>"message", :source=>"\"May 30, 2018 02:42:26.479833834 EEST\";\"172.25.35.1\";\"10.4.103.6\";\"IMAP\";\"172.25.35.24\";\"77.104.156.190\";\"51495\";\"143\";\"8897\";\"8897\";\"Request: 2 ID (\"name\" \"iPhone Mail\" \"version\" \"15E302\" \"os\" \"iOS\" \"os-version\" \"11.3.1 (15E302)\")\"", :exception=>#<CSV::MalformedCSVError: Missing or stray quote in line 1>}

Although the CSV is Valid. Weird.
I'm not sure if this issue is related though...


(Magnus Bäck) #8

The line with the column starting with "Request: 2 ID" is incorrectly quoted. Each column value is surrounded by a double qoute, but then any internal double quotes in the column values must be escaped, i.e.

...;"Request: 2 ID ("name" ...

must be

...;"Request: 2 ID (\"name\" ...

(TestCandidate) #9

Thank you. I will track that down to the tool generating the logs.

But I don't think that's the issue with the date. since only a small portion of the logs have this issue (at least that's what i can tell from the logstash logs) . However, I don't see an error related to date parsing :confused:


(TestCandidate) #10

On a second thought, Is it possible that it's failing to parse the date because the date has quotes in the original csv?


(TestCandidate) #11

Okay So I've made a simpler small CSV to debug with.

Here's the CSV now:

May 30, 2018 00:40:24.145155884 EEST;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello
May 30, 2018 00:40:24.149049258 EEST;172.25.35.1;10.6.103.93;SSL;172.25.35.24;138.201.234.175;1879;443;8897;8897;Client Hello

No quotes or weird stuff. Yet the date is still parsed as string from some reason.

Here's the current filter:

filter {
  if [type] == "iclick-tcplog" {
   csv {
      columns => [
          "logdate",
          "ppoeip",
          "userip",
          "protocol",
          "logserverip",
          "destinationip",
          "remove1",
          "remove2",
          "sourceport",
          "destinationport",
          "description-url"
        ]
        separator => ";"
        remove_field => ["remove1", "remove2"]
   }
   date {
        match => [ "logdate", "MMM dd, yyyy HH:mm:ss.SSSSSSSSS ZZZ" ]
        target => "logdate"
   }
  }
}

No Errors in the log File :frowning:


(Magnus Bäck) #12

Yet the date is still parsed as string from some reason.

Are you talking about the data type in the Elasticsearch index? Because that's a completely different thing. ES will recognize field values produced by the date filter as a date and map the field as that data type, but that only happens the first time a document with a field with that name is seen by the index. If you're still just playing around you can just delete the index and make Logstash process the data again.


(TestCandidate) #13

Yes, I am talking about ElasticSearch - I am checking the field in Kibana.

I am deleting the index & filebeat's registry every time I'm testing.

I'm still failing to save the logdate as a date.


(TestCandidate) #14

Sorry to spam this, but as an update, I manually edited the log to this:

Apr 17 09:32:01;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello
Apr 17 09:32:12;172.25.35.1;10.6.103.93;SSL;172.25.35.24;138.201.234.175;1879;443;8897;8897;Client Hello

And edited the filter to this:

filter {
  if [type] == "iclick-tcplog" {
   csv {
      columns => [
          "logdate",
          "ppoeip",
          "userip",
          "protocol",
          "logserverip",
          "destinationip",
          "remove1",
          "remove2",
          "sourceport",
          "destinationport",
          "description-url"
        ]
        separator => ";"
        remove_field => ["remove1", "remove2"]
   }
   date {
        match => [ "logdate", "MMM dd yyyy HH:mm:ss", "MMM  d yyyy HH:mm:ss", "ISO8601"]
        target => "logdate"
   }
  }
}

Yet Still appearing as string... it's getting frustrating


(Magnus Bäck) #15

What does an example document look like? Copy/paste the raw JSON from Kibana's JSON tab.


(TestCandidate) #16

Thank you for bearing with me
Here's a raw json:

{
  "_index": "filebeat-2018.06.04",
  "_type": "iclick-tcplog",
  "_id": "AWPMZ5u248sBzQ3deLEk",
  "_version": 1,
  "_score": null,
  "_source": {
    "ppoeip": "172.25.35.1",
    "logserverip": "172.25.35.24",
    "offset": 106,
    "input_type": "log",
    "source": "/var/capture/faridtest/farid.csv",
    "message": "Apr 17 09:32:01;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello",
    "type": "iclick-tcplog",
    "tags": [
      "beats_input_codec_plain_applied",
      "_dateparsefailure"
    ],
    "description-url": "Client Hello",
    "destinationip": "216.58.201.202",
    "sourceport": "8897",
    "destinationport": "8897",
    "protocol": "SSL",
    "@timestamp": "2018-06-04T20:05:26.847Z",
    "logdate": "Apr 17 09:32:01",
    "@version": "1",
    "beat": {
      "name": "Iclik-LOG",
      "hostname": "Iclik-LOG",
      "version": "5.6.9"
    },
    "host": "Iclik-LOG",
    "userip": "10.8.104.200"
  },
  "fields": {
    "@timestamp": [
      1528142726847
    ]
  },
  "sort": [
    1528142726847
  ]
}

Weirdly enough i just noticed that there's only 1 record while the CSV had 2 lines. but I assume that's a different problem for now.


(Magnus Bäck) #17

The date filter's parsing of the logdate field failed, hence the _dateparsefailure tag and the wrong data type of the field in ES. Your timestamp doesn't include the year so remove "yyyy" from your date patterns.


(TestCandidate) #18

I missed that. Okay, not I can see that it managed to understand / change the date as the JSON became:

{
  "_index": "filebeat-2018.06.04",
  "_type": "iclick-tcplog",
  "_id": "AWPMcnRv48sBzQ3deLEo",
  "_version": 1,
  "_score": null,
  "_source": {
    "ppoeip": "172.25.35.1",
    "logserverip": "172.25.35.24",
    "offset": 106,
    "input_type": "log",
    "source": "/var/capture/faridtest/farid.csv",
    "message": "Apr 17 09:32:01;172.25.35.1;10.8.104.200;SSL;172.25.35.24;216.58.201.202;27786;443;8897;8897;Client Hello",
    "type": "iclick-tcplog",
    "tags": [
      "beats_input_codec_plain_applied"
    ],
    "description-url": "Client Hello",
    "destinationip": "216.58.201.202",
    "sourceport": "8897",
    "destinationport": "8897",
    "protocol": "SSL",
    "@timestamp": "2018-06-04T20:17:14.596Z",
    "logdate": "2018-04-17T06:32:01.000Z",
    "@version": "1",
    "beat": {
      "name": "Iclik-LOG",
      "hostname": "Iclik-LOG",
      "version": "5.6.9"
    },
    "host": "Iclik-LOG",
    "userip": "10.8.104.200"
  },
  "fields": {
    "@timestamp": [
      1528143434596
    ]
  },
  "sort": [
    1528143434596
  ]
}

But two questions remains,
1- Why the field type is still a string in the db? :confused:
2- Why the original pattern is not parsing the original timestamp (since i have to revert to the original - this format was just for debugging)

Thanks again,


(Magnus Bäck) #19

Why the field type is still a string in the db?

Did you drop the ES index? If you're looking in Kibana, did you refresh the field list?

Why the original pattern is not parsing the original timestamp (since i have to revert to the original - this format was just for debugging)

I'm not sure SSSSSSSSS is able to parse microseconds. I also don't think ZZZ can parse timezone names like EEST. But again: If the date filter fails the Logstash log will contain details about the failure.


(Azhar) #20

This will not help solve the issue, but just a useful tip to make debugging easier...

Have stdin {} as your input plugin and stdout {} in your output plugin.
Stop the logstash service.
Try starting the server manually like this to get the parsed output on the console:
echo "<a single line from your .csv>" | <logstash executable path> --path.settings <logstash config directory> -f <pipeline config file path> --debug

For instance, "May 31, 2018 09:35:45.979371597 EEST" | /usr/share/logstash/bin/logstash --path.settings /etc/logstash -f /etc/logstash/conf.d/logstash.conf --debug

Debugging like this makes life much easier...