So I have a requirement that goes something like this ...
- An event arrives in Kafka
- Read that event with Logstash and extract an ID from it
- Look up that ID to fetch a JSON document from, say, object storage
- Apply filtering and other transformations to it
- Index it into Elasticsearch
I'm curious about #3 above. What is the best way to do something like that in a Logstash pipeline? One possible way out is to have a script execute as part of a filter that takes in the ID, fetches the JSON, and dumps it into the pipeline. But I'm wondering whether Logstash supports that.
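Roughly, this is the shape I have so far; just a sketch, with the broker address, topic, ID field, and index name as placeholders, and the lookup step left as the open question:

```
input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["my-events"]
    codec => "json"
  }
}

filter {
  # Step 3 would go here: use [doc_id] to fetch the JSON document
  # from object storage and merge it into the event.
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "enriched-events"
  }
}
```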
Your best option would probably be a translate or a jdbc_streaming filter.
Thanks for your response, @magnusbaeck.
Can you elaborate a bit more on how a translate or a jdbc_streaming filter would help me look up and fetch a JSON document as an event into Logstash?
Those filters do exactly what you're asking for; they look up a field value in an external data source and store the result in a field in the current event. Feed the resulting field to a json filter to deserialize the string into fields in the current event.
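For example, assuming the ID is in a field called doc_id and the JSON documents live in a table reachable over JDBC (the driver path, connection details, and table/column names below are placeholders), a jdbc_streaming + json combination could look roughly like this:

```
filter {
  jdbc_streaming {
    jdbc_driver_library => "/path/to/driver.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db.example.com:5432/lookups"
    jdbc_user => "logstash"
    jdbc_password => "secret"
    statement => "SELECT payload FROM documents WHERE id = :id"
    parameters => { "id" => "doc_id" }
    target => "[@metadata][lookup]"
  }
  json {
    # jdbc_streaming stores the result as an array of rows,
    # hence the [0] to reach the first row's payload column.
    source => "[@metadata][lookup][0][payload]"
  }
}
```

A translate filter works the same way conceptually, except the lookup table comes from a local file (dictionary_path) or an inline dictionary rather than a database.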
How about object storage lookup - does jdbc_streaming handle that?
I'm not sure exactly what you mean, but any data source that you have a JDBC driver for will work.
@magnusbaeck Thanks for your response.
I was referring to cloud object storage in general.
Let me rephrase: if I need to look up a system that doesn't have a JDBC driver, I'm guessing it is going to have to be custom code execution.
So I'm guessing a custom filter, or can the ruby filter help too?
I wouldn't use a Ruby filter for anything non-trivial, but theoretically it should work. I'd go with a custom filter.
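If you do want to try the ruby filter route, a minimal sketch could look like the following. It assumes your object storage exposes a plain HTTP GET endpoint (the URL, the doc_id field, and the target field are made-up placeholders) and uses Ruby's standard net/http rather than any vendor SDK:

```
filter {
  ruby {
    init => "require 'net/http'; require 'uri'"
    code => "
      id = event.get('doc_id')
      if id
        # Hypothetical HTTP endpoint in front of the object store
        uri = URI('https://objectstore.example.com/documents/' + id.to_s)
        response = Net::HTTP.get_response(uri)
        if response.is_a?(Net::HTTPSuccess)
          event.set('[@metadata][lookup_json]', response.body)
        else
          event.tag('_lookup_failure')
        end
      end
    "
  }
  json {
    source => "[@metadata][lookup_json]"
  }
}
```

A custom filter plugin would implement the same logic, but with proper connection reuse, error handling, and tests, which is why I'd lean that way for anything non-trivial.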