I am looking at setting up log aggregation using Filebeat and Elasticsearch. We currently have around 200 servers that logs will be aggregated from, with aspirations for growth - the attraction of Elasticsearch is that it scales.
Some of the logs to be aggregated are from Rails applications. Ideally all log lines for a particular HTTP request should be logged into one record in Elasticsearch.
On a busy server, log lines from different HTTP requests will be interleaved, so Filebeat's multiline merging will not work. Fortunately Rails can prefix log lines with a request identifier like this:
[a00c54e0-874f-4358-a7b9-accef0c407c6] Started GET "/dashboard" for 127.0.0.1 at 2014-11-24 15:48:30 +0000
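For context, that prefix comes from Rails' tagged logging, which can be enabled with something like this (exact symbol depends on the Rails version - older versions use `:uuid`, newer ones also accept `:request_id`):

```ruby
# config/application.rb - prefixes each log line with the request's UUID,
# producing the bracketed identifier shown above
config.log_tags = [:uuid]
```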
So, in theory it should be pretty straightforward to aggregate the associated lines. But how exactly, with Filebeat and Elasticsearch?
One thought is to send logs through Logstash and use the aggregate filter (https://www.elastic.co/guide/en/logstash/current/plugins-filters-aggregate.html) to merge the related lines. However, if/when we need to scale to multiple machines running Logstash we will have a problem: not all the related log lines will pass through the same aggregation point. That is, some log lines for a particular request will go through one instance of Logstash and others will go through another, so lines for the request will be aggregated into two or more separate records in Elasticsearch.
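For a single Logstash instance, a sketch of that aggregate-filter approach might look like the following. This is only an illustration under my assumptions: the grok pattern and the `request_id`/`log_line` field names are my own, and the timeout value is arbitrary - it controls how long Logstash waits for more lines from the same request before flushing the merged event:

```conf
filter {
  # Pull the bracketed request ID off the front of each Rails log line
  grok {
    match => { "message" => "\[%{UUID:request_id}\] %{GREEDYDATA:log_line}" }
  }
  # Collect all lines sharing a request ID into one map, and emit the
  # merged map as a single event once the request has gone quiet
  aggregate {
    task_id => "%{request_id}"
    code => "map['lines'] ||= []; map['lines'] << event.get('log_line')"
    push_map_as_event_on_timeout => true
    timeout => 30
    timeout_task_id_field => "request_id"
  }
}
```

As noted, though, this only works while every line for a given request reaches the same Logstash instance (e.g. via ID-based routing or a single pipeline worker), which is exactly the scaling constraint described above.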
I could aggregate the logs on the Elasticsearch side using the update API, but that would not be performant.
Ideally, aggregation would be done on the origin server. I looked to see whether Filebeat could pipe log lines to an external process to perform this specialised processing, but there does not appear to be such an option.
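To make the origin-server idea concrete, here is a minimal sketch of the kind of pre-processing I have in mind, assuming a hypothetical helper that groups lines by the bracketed request ID and emits one JSON record per request (which Filebeat could then ship as-is). The function name and field names are my own invention:

```ruby
require 'json'

# Groups lines of the form "[<uuid>] message" by their request ID and
# returns one hash per request, ready to be shipped as a single document.
def aggregate_by_request(lines)
  requests = Hash.new { |h, k| h[k] = [] }
  lines.each do |line|
    # A request ID is a 36-character UUID in square brackets
    if (m = line.match(/\A\[([0-9a-f-]{36})\]\s*(.*)/))
      requests[m[1]] << m[2]
    end
  end
  requests.map do |id, msgs|
    { 'request_id' => id, 'message' => msgs.join("\n") }
  end
end

sample = [
  '[a00c54e0-874f-4358-a7b9-accef0c407c6] Started GET "/dashboard"',
  '[11111111-2222-3333-4444-555555555555] Started GET "/login"',
  '[a00c54e0-874f-4358-a7b9-accef0c407c6] Completed 200 OK'
]
aggregate_by_request(sample).each { |record| puts record.to_json }
```

In practice this would need to run as a daemon that flushes a request's buffer when the request completes (or times out), but it shows the grouping step the shippers themselves don't seem to offer.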
Does anyone here have any ideas on how to solve this problem?