I'm using Filebeat and Logstash to push YARN container logs (specifically Spark) to Kinesis, and from there to Elasticsearch and Kibana.
I'm running these as SysV services on Amazon Linux:
Filebeat - 5.2.1
Logstash - 2.4.1 (I need the Kinesis output plugin, which is why I'm still on this version)
logstash-input-beats - updated from 3.1.8 to 3.1.12
Two Issues:
1. I'm getting some data loss when I write to Kinesis. There's rarely anything useful in the Logstash logs, and when I turn on debug mode I can't make sense of the output. I can probably figure this out with the help of AWS, but if anybody has ideas on settings to try, that would be great (see the output config sketch below the Kibana screenshot).
2. Precision. The @timestamp has only millisecond precision. That's not enough, and it's causing my table data dumps to the Application Master to come out totally out of order.
[Kibana screenshot]
I attempted to implement a sort key, but that clearly doesn't work with more than one filter worker, and one worker is too slow, at which point Filebeat starts having connection issues with Logstash.
Here's my Ruby code for the log sort key, using a hash map (btw, I don't know Ruby, so this is Google-driven coding):
# Add log_sort_key for containers
ruby {
  init => '@@global_container_id = {}'
  code => '
    key = event["container_id"] + "-" + event["container_file_name"]
    if @@global_container_id.has_key?(key)
      @@global_container_id[key] = @@global_container_id[key] + 1
    else
      @@global_container_id[key] = 1
    end
    event["log_sort_key"] = @@global_container_id[key]
  '
}
I don't know what to do...we're trying to replace Splunk and this isn't going to cut it. Any help would be appreciated.
rgeno