Sorry about the title; my case really couldn't be explained in a single sentence.
Here is my situation:
- I have a large set of log files (around 4 GB) that I wish to parse with Logstash and feed into the Elastic stack (Logstash, Elasticsearch, Kibana).
- Each log contains a serial number, which I have successfully parsed with Logstash. This number corresponds to an index in a MongoDB collection. As each log line is parsed, I want to query the collection with the parsed number and retrieve data that I want to include in the final output passed to Elasticsearch.
To make things clearer, here's a rough example. Suppose I have the raw log:
2017-11-20 14:24:14.011 123 log_number_one
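For reference, the serial is already being extracted with a grok filter roughly like this (the pattern below is illustrative rather than my exact config):

```
filter {
  grok {
    # split the line into timestamp, serial number and the log message itself
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{NUMBER:serial:int} %{GREEDYDATA:log}" }
  }
}
```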
Before the parsed log gets sent to Elasticsearch, I want to query my MongoDB collection with 123 and retrieve two fields, data1 and data2, to append to the document, so the end result would look something like:
{
  "timestamp": "2017-11-20 14:24:14.011",
  "serial": 123,
  "data1": "foo",
  "data2": "bar",
  "log": "log_number_one"
}
An easier way to accomplish this, I assume, would be to preprocess the logs and run the numbers through MongoDB before feeding them to Logstash. However, seeing as I have 4 GB worth of log files, I was hoping to do it all in one pass. I was wondering whether my case could be solved with the ruby filter plugin, where I could run some arbitrary Ruby code to do the lookup described above.
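Concretely, something like this is what I was picturing (a rough, untested sketch: the host, mydb, my_collection and the serial/data1/data2 field names are placeholders for my real setup, and it assumes the mongo Ruby gem can be loaded into Logstash's JRuby):

```
filter {
  ruby {
    # open one MongoDB connection when the pipeline starts
    init => "
      require 'mongo'
      @client = Mongo::Client.new(['127.0.0.1:27017'], database: 'mydb')
      @lookup = @client[:my_collection]
    "
    # for each event, look up the parsed serial and copy the extra fields onto the event
    code => "
      # 'serial' here is whatever field the collection is actually keyed on
      doc = @lookup.find({ 'serial' => event.get('serial') }).first
      unless doc.nil?
        event.set('data1', doc['data1'])
        event.set('data2', doc['data2'])
      end
    "
  }
}
```

(I realize this would mean one MongoDB query per event, which may or may not be acceptable at this volume.)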
Any help / advice would be greatly appreciated!
