Wrong events order if time difference is less than 1ms

Hi.

I use version 7.13.4 of all stack (ELK + filebeats).
Filebeat grabs events (auth.log entries in this case) on remote host, then passes them to logstash to filter/modify data, then events go to Elasticsearch. Talking about Logstash, I use aggregate plugin pretty extensively to populate events with required data, so I heavily rely on correct events order in same sequence.

Problem:
If two events come during 1 millisecond period, there is a chance they will end up in wrong order, which prevents aggregate plugin to populate data correctly.

Input data example:

2021-09-23T10:26:19.980107+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18373] 184.105.176.47 #23495: responding to Main Mode from unknown peer 184.105.176.47 on port 54209
2021-09-23T10:26:33.856101+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23495: PAM: #23495: completed for user 'testusername' with status SUCCESSS
2021-09-23T10:26:35.022162+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23495: the peer proposed: 0.0.0.0/0:0/0 -> 192.168.43.123/32:0/0
2021-09-23T10:26:35.022643+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23498: responding to Quick Mode proposal {msgid:bb095b40}
2021-09-23T10:26:35.023138+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23498:   them: 184.105.176.47[192.168.1.101,+MC+XC+S=C]===192.168.43.123/32
2021-09-23T10:27:05.045830+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23495: deleting state (STATE_MODE_CFG_R1) aged 45.066s and sending notification
2021-09-23T10:27:39.036857+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23498: deleting state (STATE_QUICK_R1) aged 64.013s and sending notification
2021-09-23T10:27:39.036955+03:00 my-test-server-hostname pluto[1284082]: "xauth-psk"[18374] 184.105.176.47 #23498: ESP traffic information: in=0B out=0B XAUTHuser=testusername

But if I check same events in kibana:

I have tried to do my research. I have

pipeline.workers: 1
pipeline.ordered: true

in my logstash pipeline configuration from the very beginning of using aggregate. Also, I know that logstash does not support time precision more than 1ms, so, at least for kibana and Elasticsearch, I'm using some kind of workaround:

grok {
    match => { "message" => '^%{SYSLOGTIMESTAMP:timestamp_string} %{DATA:hostname} ... ' }
#   I get "raw timestamp text" as `timestamp_string` field ...
}

date {
    timezone => "UTC"
    match => ["timestamp_string", "ISO8601"]
#   ... then I generate logstash own `@timestamp` field based on `timestamp_string` ...
}

mutate {
    copy => { "timestamp_string" => "timestamp" }
#   ... and then I copy same string to `timestamp` field
}

(and then, in Elasticsearch index mapping, I set "timestamp" to "date_nanos" format, so, when saving to elastic index, it will be parsed/stored correctly + in kibana index pattern I use "timestamp" as a time field)

As far as I understand, this problem exists because I use logstash date filter plugin - I guess, without date plugin (i.e. without rewriting events' "@timestamp" field) logstash could possibly save "natural events order". But, without date plugin, I risk to get events heavily out-of-order in case of some logstash/filebeat/network temporary problems.

Right now, at this moment, I'm experimenting and have disabled date plugin at all - I want to test, will it work correctly indeed.
But, even if it'll end up working correctly, I'm still curious is it possible to make it work with date plugin (so logstash will populate it's "@timestamp" with a value from event, but will still respect the order of events). In other words, is there any workaround for logstash to support time precision higher than 1ms by itself.

And... Looks like it works :smiley:
What's interesting, is that order of events in Kibana is partially wrong (no matter if I try sorting them by my "timestamp" field, or not), but Logstash, on the other hand, is working correctly so far (I don't see any cases of my problem with aggregate plugin since the beginning of the experiment).

Anyway, I'm still interested in this:

1 Like

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.