So I have a requirement that goes something like this ...
1. An event arrives in Kafka
2. Read that event with Logstash and extract an ID from it
3. Look up that ID and fetch a JSON document from, say, object storage
4. Apply filtering and other transformations to it
5. Index it into Elasticsearch
I'm curious about #3 above. What is the best way to do something like that in a Logstash pipeline? One possible way out is to have a script execute as part of a filter that takes in the ID, fetches the JSON, and emits it into the pipeline, but I'm wondering whether Logstash supports that.
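Roughly, the shape of the pipeline I have in mind is below; the broker address, topic, and index name are just placeholders, and step 3 (the lookup) is the part I'm unsure about:

```
input {
  kafka {
    bootstrap_servers => "kafka:9092"   # placeholder broker address
    topics => ["events"]                # placeholder topic name
    codec => "json"
  }
}

filter {
  # Step 3 (look up the ID and fetch the JSON document) would go here.

  # Step 4: filtering / transformation, e.g. drop events that carry no ID.
  if ![id] {
    drop { }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "events-%{+YYYY.MM.dd}"
  }
}
```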
Thanks for your response, @magnusbaeck.
Can you elaborate a bit more on how a translate or a jdbc_streaming filter would help me look up and fetch a JSON document as an event into Logstash?
Those filters do exactly what you're asking for: they look up a field value in an external data source and store the result in a field of the current event. Feed the resulting field to a json filter to deserialize the string into fields in the current event.
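As a rough sketch (the driver, connection string, table, and column names below are placeholders for whatever your lookup source actually is), a jdbc_streaming lookup followed by a json filter could look like this:

```
filter {
  # Look up the JSON blob for this event's id in an external database.
  jdbc_streaming {
    jdbc_driver_library    => "/path/to/postgresql-jdbc.jar"
    jdbc_driver_class      => "org.postgresql.Driver"
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/lookupdb"
    jdbc_user              => "logstash"
    statement              => "SELECT payload FROM documents WHERE id = :id"
    parameters             => { "id" => "id" }
    target                 => "lookup_result"
  }

  # jdbc_streaming stores matched rows as an array of hashes under the target
  # field, so the first row's payload column holds the JSON string to parse.
  json {
    source => "[lookup_result][0][payload]"
    target => "document"
  }
}
```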
@magnusbaeck Thanks for your response.
So I was referring to any cloud object storage.
Let me rephrase this: if I need to look up a system that perhaps doesn't have a JDBC driver, I'm guessing it is going to require custom code execution.
So I'm guessing that means a custom filter, or can the ruby filter help here too?
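For what it's worth, what I had in mind with the ruby filter is roughly the sketch below, assuming the object store exposes a plain HTTP GET per object; the endpoint URL and field names are hypothetical placeholders. (If a simple GET is all that's needed, the http filter plugin might also be an alternative to writing Ruby.)

```
filter {
  ruby {
    init => 'require "net/http"; require "json"'
    code => '
      id = event.get("id")
      if id
        begin
          # Hypothetical object-store endpoint; replace with your provider API.
          uri = URI("https://objectstore.example.com/documents/" + id.to_s + ".json")
          response = Net::HTTP.get_response(uri)
          if response.is_a?(Net::HTTPSuccess)
            # Parse the fetched JSON and attach it to the event.
            event.set("document", JSON.parse(response.body))
          else
            event.set("tags", (event.get("tags") || []) + ["_lookup_failure"])
          end
        rescue StandardError
          event.set("tags", (event.get("tags") || []) + ["_lookup_failure"])
        end
      end
    '
  }
}
```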