Enrich Elasticsearch via MySQL based on field value

Hi,

I'm importing a ton of events into Elasticsearch via Logstash from an S3 bucket with logs.

Based on a specific UUID in every event, I want to fetch aggregated info from a MySQL database and update/enrich that same event with the extra data, so it can be used later in Kibana.

Is there a way I can accomplish this?

TIA

Hi,
I'm afraid I'd need some more info to be able to help you. Can you post what the data looks like and what exactly you want to do with it, please?

Hi @javanna

Well, our log entries look something like this:

web	2016-06-23 15:17:55.612	2016-06-23 14:59:53.000		unstruct	9f8aaa2f-b24f-4c8b-8ca6-f6289f072a85			custom	clj-1.1.0-tom-0.2.0	hadoop-1.6.0-common-0.21.0		93.XXX.243.XXX				07eae7d2-04ae-464f-8cd0-98a917bb9975												http://xxx.com/test			http	xx.com	80	/test																							{"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.xxx/open/jsonschema/1-0-1","data":{"cid":"2253450","eid":"2231323","uid":"21"}}}																			Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.103 Safari/537.36																																															2016-06-23 14:59:53.000	com.xxx	open	jsonschema	1-0-1		

Within Logstash we're able to parse all the parameters, including the ones inside the jsonschema payload.
So our idea is to use the cid, eid, and uid values to look up a MySQL database and retrieve some extra info that allows us to enrich the event data: say gender, age, or anything else that would let us build more consistent reports in Kibana.
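
As a rough illustration of the lookup described above (not Logstash code), here is a minimal Python sketch. It pulls cid, eid and uid out of the JSON column from the sample line and merges in extra columns from a database. Everything beyond the sample payload is an assumption: the `user_profiles` table, its `gender` and `age` columns, the demo row, and the use of sqlite3 as a stand-in for MySQL so the example runs on its own.

```python
import json
import sqlite3

# The unstruct_event JSON column from the sample log line above.
UNSTRUCT = (
    '{"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0",'
    '"data":{"schema":"iglu:com.xxx/open/jsonschema/1-0-1",'
    '"data":{"cid":"2253450","eid":"2231323","uid":"21"}}}'
)


def extract_ids(unstruct_json):
    """Pull cid, eid and uid out of the nested unstruct_event payload."""
    inner = json.loads(unstruct_json)["data"]["data"]
    return {key: inner[key] for key in ("cid", "eid", "uid")}


def enrich(event, conn):
    """Return a copy of the event with profile columns merged in, if found."""
    row = conn.execute(
        # Hypothetical table and columns; the real MySQL schema is not given.
        "SELECT gender, age FROM user_profiles WHERE uid = ?",
        (event["uid"],),
    ).fetchone()
    enriched = dict(event)
    if row is not None:
        enriched["gender"], enriched["age"] = row
    return enriched


# Stand-in database: sqlite3 instead of MySQL, with made-up demo data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_profiles (uid TEXT PRIMARY KEY, gender TEXT, age INTEGER)"
)
conn.execute("INSERT INTO user_profiles VALUES ('21', 'f', 34)")

event = extract_ids(UNSTRUCT)
print(enrich(event, conn))
```

Events whose uid has no matching row come back unchanged, so a missing profile would not drop the event.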

I've found a plugin for Logstash that would do just that, but it is yet to be developed.

Right now there's no built-in functionality to handle JDBC at the filter stage. There is a community-made jdbc filter by one of the most involved contributors to Logstash, but I haven't tested it myself.

An alternative, which I used at my last job, is to use a zeromq filter together with a small application that speaks ZeroMQ and executes queries using a Ruby library like Sequel.
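
For a sense of what such a small application might look like, here is a hedged sketch in Python with pyzmq (not the Ruby/Sequel setup mentioned above). It assumes the filter sends each event as JSON over a request/reply socket and expects the enriched event back as the reply; check the zeromq filter's documentation before relying on that. The endpoint, the `uid` field name, and the in-memory `PROFILES` dict standing in for the database query are all hypothetical.

```python
import json


def handle_request(raw, lookup):
    """Decode one JSON event, merge in the looked-up fields, re-encode it."""
    event = json.loads(raw)
    extra = lookup(event.get("uid"))  # hypothetical field name
    if extra:
        event.update(extra)
    return json.dumps(event).encode("utf-8")


def serve(lookup, endpoint="tcp://127.0.0.1:5555"):
    """Answer requests forever on a REP socket (requires pyzmq)."""
    import zmq  # imported here so handle_request works without pyzmq installed

    sock = zmq.Context.instance().socket(zmq.REP)
    sock.bind(endpoint)  # assumed endpoint; must match the filter's address
    while True:
        sock.send(handle_request(sock.recv(), lookup))


# Stand-in for the database query, with made-up data.
PROFILES = {"21": {"gender": "f", "age": 34}}

# Exercise the handler directly, without opening a socket.
reply = handle_request(b'{"uid": "21", "cid": "2253450"}', PROFILES.get)
print(reply.decode("utf-8"))
```

Keeping the per-event logic in `handle_request` separate from the socket loop makes it possible to test the enrichment without a running ZeroMQ endpoint; calling `serve(PROFILES.get)` would start the actual responder.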