External lookups in a Logstash pipeline

So I have a requirement that goes something like this ...

  1. An event arrives in Kafka
  2. Logstash reads that event and pulls an ID out of it
  3. Look up that ID to fetch a JSON document from, say, object storage
  4. Do stuff like filtering etc. to it
  5. Index to Elasticsearch

I'm curious about #3 above. What's the best way to do something like that in a Logstash pipeline? One possible way out is to have a script execute as part of a filter: it takes in the ID, fetches the JSON, and dumps it into the pipeline. But I'm wondering if Logstash supports that.
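For reference, a rough skeleton of the rest of the pipeline (steps 1, 2 and 5) might look something like this; the broker, topic, and index names are just placeholders. Step 3 is the gap in the filter block.

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder broker address
    topics            => ["events"]     # placeholder topic
    codec             => "json"
  }
}

filter {
  # step 3: look the event ID up in an external source and pull in the JSON
  # step 4: filtering / enrichment on the merged event
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]  # placeholder
    index => "events-%{+YYYY.MM.dd}"
  }
}
```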

Your best option would probably be a translate or a jdbc_streaming filter.
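For example, if the ID-to-JSON mapping can be exported to a local file ahead of time, a translate filter would look roughly like this (the field names and dictionary path are placeholders, and option names differ slightly between plugin versions):

```
filter {
  translate {
    field           => "event_id"                       # field holding the ID (newer plugin versions use source/target)
    destination     => "payload_json"
    dictionary_path => "/etc/logstash/id_to_json.yml"   # YAML file mapping IDs to JSON strings
    fallback        => "{}"                             # value to use when the ID is not found
  }
}
```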

Thanks for your response, @magnusbaeck.
Can you elaborate a bit more on how a translate or a jdbc_streaming filter would help me look up and fetch a JSON document into Logstash as part of an event?

Those filters do exactly what you're asking for; they look up a field value in an external data source and store the result in a field of the current event. Feed the resulting field to a json filter to deserialize the string into fields in the current event.
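As a rough sketch, assuming the JSON documents sit in a table reachable over JDBC (the driver, connection string, and field names are made up for illustration):

```
filter {
  jdbc_streaming {
    jdbc_driver_library    => "/opt/drivers/postgresql.jar"        # placeholder driver jar
    jdbc_driver_class      => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://db:5432/lookups"  # placeholder
    jdbc_user              => "logstash"
    statement              => "SELECT payload FROM documents WHERE id = :id"
    parameters             => { "id" => "event_id" }               # event field that feeds the :id parameter
    target                 => "lookup"                             # query rows land here as an array of hashes
  }
  json {
    source => "[lookup][0][payload]"   # deserialize the JSON string from the first returned row
  }
}
```

With `target => "lookup"` the result arrives as an array of row hashes, which is why the json filter reads the string out of the first row.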

How about object storage lookup - does jdbc_streaming handle that?

I'm not sure exactly what you mean, but any data source that you have a JDBC driver for will work.

@magnusbaeck Thanks for your response.
So I was referring to any cloud object storage.
Let me rephrase this: in case I need to look up a system that maybe doesn't have a JDBC driver, I'm guessing it is going to have to be custom code execution.
So I'm guessing a custom filter, or can the ruby filter help too?

I wouldn't use a Ruby filter for anything non-trivial, but theoretically it should work. I'd go with a custom filter.
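For reference, a rough sketch of the inline Ruby approach, assuming the object store serves the documents over plain HTTP at a URL derived from the ID (the endpoint and field names are hypothetical):

```
filter {
  ruby {
    code => '
      require "net/http"
      require "uri"

      id  = event.get("event_id")
      uri = URI("https://objects.example.com/documents/#{id}.json")  # hypothetical endpoint

      begin
        response = Net::HTTP.get_response(uri)
        if response.is_a?(Net::HTTPSuccess)
          event.set("payload_json", response.body)   # parse later with a json filter
        else
          event.tag("_lookup_failure")
        end
      rescue StandardError
        event.tag("_lookup_failure")
      end
    '
  }
  json {
    source => "payload_json"
  }
}
```

For anything beyond a sketch like this, the ruby filter's `path` and `script_params` options (pointing at an external script file) or a proper custom filter plugin will keep the pipeline config and error handling manageable.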
