Logstash @timestamp precision causing misordering of data

I'm using Filebeat and Logstash to push YARN container logs (specifically Spark) to Kinesis, then on to Elasticsearch and Kibana.

I'm running these as SysV services on Amazon Linux:
Filebeat - 5.2.1
Logstash - 2.4.1 (I need the Kinesis output, which is why I'm still on this version)

I also updated logstash-input-beats from 3.1.8 to 3.1.12.

Two Issues:

1. I'm getting some data loss when I write to Kinesis. There's rarely anything in the logs, and when I run Logstash in debug mode I can't make sense of the output. I can probably figure this out with AWS's help, but if anybody has ideas on settings to check, that would be great.

2. Precision. @timestamp only has millisecond precision. That's not enough, and it's causing my table data dumps to the Application Master to be totally out of order.

[Kibana screenshot omitted]

I attempted to implement a sort key, but that clearly doesn't work with more than one pipeline worker, and a single worker is too slow, at which point Filebeat runs into connection issues with Logstash.

Here's my Ruby code for log_sort, using a hash map:

btw, I don't know Ruby... Google coding

    # Add log_sort_key for containers: keep a running counter per
    # container log file and stamp each event with its position,
    # so events from the same file can be re-sorted downstream.
    ruby {
        init => '@@global_container_id = {}'
        code => '
            # One shared counter per container file. Class variables
            # persist across events but are NOT thread-safe with
            # multiple pipeline workers.
            key = event["container_id"].to_s + "-" + event["container_file_name"].to_s
            @@global_container_id[key] = @@global_container_id.fetch(key, 0) + 1
            event["log_sort_key"] = @@global_container_id[key]
        '
    }
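
I realize the class variable isn't thread-safe. Guarding it with a Mutex would at least stop concurrent workers from corrupting the hash, though events can still reach the filter out of the order Filebeat read them, so this alone doesn't fix ordering across workers. A sketch against the same Logstash 2.4 event API:

    # Hypothetical thread-safe variant: serialize access to the
    # shared counter so multiple pipeline workers don't race on it.
    ruby {
        init => '
            @@counter_lock = Mutex.new
            @@global_container_id = {}
        '
        code => '
            key = event["container_id"].to_s + "-" + event["container_file_name"].to_s
            @@counter_lock.synchronize do
                @@global_container_id[key] = @@global_container_id.fetch(key, 0) + 1
                event["log_sort_key"] = @@global_container_id[key]
            end
        '
    }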

I don't know what to do...we're trying to replace Splunk and this isn't going to cut it. Any help would be appreciated.

rgeno

The issue is that both Logstash and Elasticsearch have depended on Joda-Time for date handling, and Joda-Time only supports millisecond precision. Solutions are being actively worked on, but they are not quite ready yet.

Is there any kind of workaround that anybody has seen? I was already aware of the Joda-Time issue.

One workaround that was suggested is to split the microseconds (or nanoseconds) off into their own field and then filter/sort on the two fields together.
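
For illustration, something along these lines might work (the log_time and log_usec field names are made up; this assumes the original timestamp string, with its full fractional seconds, has already been grokked into log_time):

    # Hypothetical sketch: pull the sub-millisecond digits of the raw
    # log timestamp into their own numeric field, to act as a
    # secondary sort key next to @timestamp.
    ruby {
        code => '
            # e.g. log_time = "2017-03-01 12:34:56.123456" -> log_usec = 456
            if event["log_time"]
                m = event["log_time"].match(/[.][0-9]{3}([0-9]+)/)
                event["log_usec"] = m[1].to_i if m
            end
        '
    }

You would then sort on @timestamp first and log_usec second in Elasticsearch/Kibana.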

In truth, a more complete solution is coming in Elasticsearch later this year (from what I recall). Any other workaround will be difficult until then.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.