Logstash @timestamp precision causing misordering of data

I'm using Filebeat and Logstash to push YARN container logs (specifically Spark) to Kinesis, then on to Elasticsearch and Kibana.

I'm running these as SysV services on Amazon Linux:
Filebeat - 5.2.1
Logstash - 2.4.1 (I need the Kinesis output, which is why I'm still on this version)

I also updated logstash-input-beats from 3.1.8 to 3.1.12.

Two Issues:

1. I'm getting some data loss when I write to Kinesis. There's rarely anything in the logs, and when I run Logstash in debug mode I can't make sense of the output. I can probably figure this out with AWS's help, but if anybody has ideas on settings to check, that would be great.

2. Precision. @timestamp only has millisecond precision. That's not enough, and it's causing my table data dumps to the Application Master to be totally out of order.

[Kibana screenshot omitted]

I attempted to implement a sort key, but that clearly doesn't work with more than one pipeline worker, and a single worker is too slow, at which point Filebeat runs into connection issues with Logstash.

Here's my Ruby code for log_sort, using a hash map:

btw, I don't know Ruby... Google coding

    # Add log_sort_key for containers: keep a running counter per
    # container log file and stamp each event with its position,
    # so events from the same file can be re-sorted downstream.
    ruby {
        init => '@@global_container_id = {}'
        code => '
            # One shared counter per container file. Class variables
            # persist across events but are NOT thread-safe with
            # multiple pipeline workers.
            key = event["container_id"].to_s + "-" + event["container_file_name"].to_s
            @@global_container_id[key] = @@global_container_id.fetch(key, 0) + 1
            event["log_sort_key"] = @@global_container_id[key]
        '
    }
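
I realize the class variable isn't thread-safe. Guarding it with a Mutex would at least stop concurrent workers from corrupting the hash, though events can still reach the filter out of the order Filebeat read them, so this alone doesn't fix ordering across workers. A sketch against the same Logstash 2.4 event API:

    # Hypothetical thread-safe variant: serialize access to the
    # shared counter so multiple pipeline workers don't race on it.
    ruby {
        init => '
            @@counter_lock = Mutex.new
            @@global_container_id = {}
        '
        code => '
            key = event["container_id"].to_s + "-" + event["container_file_name"].to_s
            @@counter_lock.synchronize do
                @@global_container_id[key] = @@global_container_id.fetch(key, 0) + 1
                event["log_sort_key"] = @@global_container_id[key]
            end
        '
    }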

I don't know what to do...we're trying to replace Splunk and this isn't going to cut it. Any help would be appreciated.

rgeno

The issue is that both Logstash and Elasticsearch have depended on Joda-Time for date handling, and Joda-Time only supports millisecond precision. Solutions are being actively worked on, but they are not quite ready yet.

Is there any kind of workaround that anybody has seen? I was already aware of the Joda-Time issue.

One workaround that was suggested is to split the microseconds (or nanoseconds) off into their own field and then filter/sort on the two fields together.
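
For illustration, something along these lines might work (the log_time and log_usec field names are made up; this assumes the original timestamp string, with its full fractional seconds, has already been grokked into log_time):

    # Hypothetical sketch: pull the sub-millisecond digits of the raw
    # log timestamp into their own numeric field, to act as a
    # secondary sort key next to @timestamp.
    ruby {
        code => '
            # e.g. log_time = "2017-03-01 12:34:56.123456" -> log_usec = 456
            if event["log_time"]
                m = event["log_time"].match(/[.][0-9]{3}([0-9]+)/)
                event["log_usec"] = m[1].to_i if m
            end
        '
    }

You would then sort on @timestamp first and log_usec second in Elasticsearch/Kibana.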

In truth, a more complete solution is coming in Elasticsearch later this year (from what I recall). Any other workaround will be difficult until then.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.