Date Parsing Error After Index Rotation - Any Ideas?

Hello Everyone,

I have been parsing TACACS logs for a few months without error, and all of a sudden, on 2/1/2020, the logs stopped appearing in Kibana. I mention the date because my current index name is logstash-2020.02.01-000004 and the last TACACS log in Kibana is from Feb 1st. I have two pipelines writing to the same index; both use the same date parser, and the other pipeline (which parses syslog messages) is still working perfectly.

This led me to look into my Logstash logs and I found the following:

[WARN ][logstash.outputs.elasticsearch][main] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"logstash", :_type=>"_doc", :routing=>nil}, #<LogStash::Event:0x5ebdddeb>], :response=>{"index"=>{"_index"=>"logstash-2020.02.01-000004", "_type"=>"_doc", "_id"=>"_xZiZHABI1TdtHTEPPdJ", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [timestamp] of type [date] in document with id '_xZiZHABI1TdtHTEPPdJ'. Preview of field's value: '2020-02-20 14:57:20 -0600'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"failed to parse date field [2020-02-20 14:57:20 -0600] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>"Failed to parse with all enclosed parsers"}}}}}}

This log seemed counter-intuitive to me, as the date format of my TACACS logs has literally never changed. My next thought was to review my pipeline configuration, which also has not changed in a while. Here it is, with the IP changed for privacy.

    input {
      beats {
        # Listen on port 5040 for Accounting Logs from TACACS Servers
        port => "5040"
      }
    }

    filter {
      grok {
        # TACACS Command Logs - Cisco and Ruckus/Brocade Switches
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{WORD:conn_type}%{SPACE}%{IP:connected_via}%{SPACE}.*cmd=%{DATA:command}\<cr\>" }
        # TACACS Command Logs - HP/Aruba Switches
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{IP:connected_via}%{SPACE}.*cmd=%{DATA:command}\Z" }
        # TACACS Authentication - Success and Fails
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{WORD:conn_type}%{SPACE}%{IP:connected_via}%{SPACE}.*login %{DATA:response}\Z" }
      }
      date {
        match => ["timestamp", "YYYY-MM-dd HH:mm:ss Z", "ISO8601"]
      }
      if "_grokparsefailure" in [tags] {
        drop { }
      }
    }

    output {
      elasticsearch {
        hosts => [ "1.1.1.1:9200" ]
      }
    }
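
As an aside, while reviewing this I noticed the three separate match lines in the grok block could probably also be written as a single match with an array of patterns. Here is a sketch of what I believe is the equivalent filter (not what I am actually running):

    filter {
      grok {
        # Patterns are tried in order; the first one that matches wins
        match => { "message" => [
          "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{WORD:conn_type}%{SPACE}%{IP:connected_via}%{SPACE}.*cmd=%{DATA:command}\<cr\>",
          "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{IP:connected_via}%{SPACE}.*cmd=%{DATA:command}\Z",
          "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{WORD:conn_type}%{SPACE}%{IP:connected_via}%{SPACE}.*login %{DATA:response}\Z"
        ] }
      }
    }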

Here are some example logs being parsed, which seem to clearly follow the defined format.

2020-02-20 15:03:25 -0600       1.1.1.1        username tty16   2.2.2.2    stop    task_id=1       timezone=GMT+00 service=shell   priv-lvl=0      cmd=exit <cr>
2020-02-20 15:03:30 -0600       1.1.1.1        username tty16   2.2.2.2    stop    task_id=1       timezone=GMT+00 service=shell   priv-lvl=0      cmd=vlan 20 <cr>
2020-02-20 15:03:45 -0600       1.1.1.1        username tty16   2.2.2.2    stop    task_id=1       timezone=GMT+00 service=shell   priv-lvl=0      cmd=tagged ethernet 1/1/2 to 1/1/48 ethernet 2/1/2 to 2/1/48 <cr>
2020-02-20 15:03:53 -0600       1.1.1.1        username tty16   2.2.2.2    stop    task_id=1       timezone=GMT+00 service=shell   priv-lvl=0      cmd=exit <cr>
2020-02-20 15:03:55 -0600       1.1.1.1        username tty16   2.2.2.2    stop    task_id=1       timezone=GMT+00 service=shell   priv-lvl=0      cmd=exit <cr>
2020-02-20 15:03:58 -0600       1.1.1.1        username tty16   2.2.2.2    stop    task_id=1       timezone=GMT+00 service=shell   priv-lvl=0      cmd=show vlan <cr>
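
To rule out the grok and date filters themselves, this is roughly the throwaway pipeline I would use to feed one of the lines above through the same patterns from stdin and print the result (a sketch for testing only, not my production config):

    input { stdin { } }

    filter {
      grok {
        # Same Cisco/Ruckus/Brocade pattern as the production pipeline
        match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_device}%{SPACE}%{USERNAME:user}%{SPACE}%{WORD:conn_type}%{SPACE}%{IP:connected_via}%{SPACE}.*cmd=%{DATA:command}\<cr\>" }
      }
      date {
        # On success this sets @timestamp; the original [timestamp] string field stays in the event
        match => ["timestamp", "YYYY-MM-dd HH:mm:ss Z", "ISO8601"]
      }
    }

    output { stdout { codec => rubydebug } }

If the date filter were failing, the printed event would carry a _dateparsefailure tag; if grok were failing, it would carry _grokparsefailure instead.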

Does anyone have any ideas as to what would cause this and how I can correct it?

I mentioned another pipeline in my previous post, and I wanted to show the working pipeline configuration as well, to give everyone a frame of reference in the hope that I can get this solved, as I have been unable to figure it out on my own.

The working pipeline configuration (writing to the same logstash-* index):

    input {
      beats {
        # Listen on port 5514 for Syslog traffic via Filebeat Shippers
        port => "5514"
      }
    }

    filter {

      # Tag syslog logs based on the directory path containing 'by-ip', then extract fields
      if [log][file][path] =~ "by-ip" {
        grok {
          match => { "message" => "\A%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{IP:remote_ip}%{SPACE}%{NOTSPACE:remote_hostname}%{GREEDYDATA:log_message}\Z" }
          add_tag => ["syslog"]
        }
      }

      # Mark old syslog logs based on them not being in a 'by-ip' directory
      if [log][file][path] !~ "by-ip" {
        grok {
          match => { "message" => "\A%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{NOTSPACE:remote_hostname}%{SPACE}%{GREEDYDATA:log_message}\Z" }
          add_tag => ["old_syslog"]
        }
      }

      date {
        match => ["timestamp", "YYYY-MM-dd HH:mm:ss Z", "ISO8601"]
      }
    }

    output {
      elasticsearch {
        hosts => [ "1.1.1.1:9200" ]
      }
    }

Some example log entries that are being successfully processed by this pipeline:

2020-02-21T17:02:18-06:00 10.20.1.300 10.20.1.300 syslog: lwapp_proc_add_mobile_apend_dvc_type: send device type to zd, rc 26, station 11:22:33:44:55:66
2020-02-21T17:07:22-06:00 10.20.1.300 10.20.1.300 syslog: lwapp_proc_add_mobile_apend_dvc_type: send device type to zd, rc 26, station 11:22:33:44:55:66
2020-02-21T17:08:14-06:00 10.20.1.300 10.20.1.300 avpd[448]: avpd_process_roam_from: 740: the rflow count need to be synchronized is 0
2020-02-21T17:10:35-06:00 10.20.1.300 10.20.1.300 avpd[448]: avpd_process_roam_from: 740: the rflow count need to be synchronized is 0
2020-02-21T17:10:49-06:00 10.20.1.300 10.20.1.300 avpd[448]: avpd_process_roam_from: 740: the rflow count need to be synchronized is 0

I realize now that the timestamp formats are different between the two log sources, but that still doesn't explain why:

  1. This was working for months with no issues
  2. The parser is failing, even though the log entry timestamps seem to clearly match the defined pattern.

For anyone wondering: I was able to fix (work around, really) this by changing the index that the Logstash Elasticsearch output writes to. I cannot explain this, since the date parsing should have worked regardless of the index, and nothing else in my configuration changed. This suggests to me that there is some form of conversion issue when Logstash translates the timestamp field into @timestamp (which is what Elasticsearch uses).
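
For reference, the only part of the TACACS pipeline I touched was the output block, roughly like this (the index name here is just a made-up example, not necessarily what I used):

    output {
      elasticsearch {
        hosts => [ "1.1.1.1:9200" ]
        # Write to a dedicated index instead of the default logstash write target
        index => "tacacs-%{+YYYY.MM.dd}"
      }
    }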

If anyone reads this and sees anything that I'm mistaken about, please let me know. I am not an expert by any means, but I would like to fully understand why this corrected my issue. Seeing as no one has responded thus far, I had to take it upon myself to get this working.

Elasticsearch expects the [timestamp] field to be a date. It is arriving as a string, and the default date parser only understands strict_date_optional_time and epoch_millis.

For your syslog logs the timestamp is "2020-02-21T17:02:18-06:00", which is a strict_date_optional_time, so those logs are processed OK.

For your TACACS logs the timestamp is "2020-02-20 14:57:20 -0600", which is neither strict_date_optional_time nor epoch_millis, so Elasticsearch returns a mapping exception.

I suspect what changed is that on Feb 1st either someone added a mapping to the index (you can check the index's mapping to confirm), or else someone started writing events to Elasticsearch in which [timestamp] is a date rather than a string, and the first event written determines the type that field is expected to have from then on.
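
For what it's worth, one way to see what the index currently expects for that field is the get field mapping API, for example from Kibana Dev Tools (substitute whichever index is actually being written to):

    GET logstash-2020.02.01-000004/_mapping/field/timestamp

If [timestamp] comes back as type date with no explicit format (i.e. just the default strict_date_optional_time||epoch_millis), that would line up with the exception above.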
