Hi All,
I am looking for advice on ETL options for a project I am currently working on. Our data sets generally contain things like persons, vehicles, dates, times and locations … This data can arrive in multiple formats (mainly Oracle at the start):
Oracle, CSV, logs, JSON, etc.
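One common way to cope with several input formats is to have a small reader per format that yields plain dicts, so the transform and load steps never need to know where a record came from. A minimal sketch using only the standard library; the function names are hypothetical and only CSV and newline-delimited JSON are shown:

```python
import csv
import io
import json
from typing import Iterator


def read_csv_records(text: str) -> Iterator[dict]:
    """Yield one dict per CSV row, keyed by the header line."""
    yield from csv.DictReader(io.StringIO(text))


def read_json_lines(text: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    # Illustrative inputs only; real data would come from files or Oracle.
    csv_text = "person,vehicle\nAlice,AB12CDE\n"
    json_text = '{"person": "Bob", "vehicle": "XY34ZZZ"}\n'
    records = list(read_csv_records(csv_text)) + list(read_json_lines(json_text))
    print(records)
```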
We need to be able to modify this data on the fly (e.g. convert date formats, modify strings) before sending it to Elasticsearch.
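With the client-library route, the "modify on the fly" step can be an ordinary function applied to each record before it is sent. A minimal sketch, assuming hypothetical field names (`seen_at`, `person`, `vehicle`) and a source date format of `DD/MM/YYYY HH:MM`; the output date is ISO 8601, which Elasticsearch's default `date` format (`strict_date_optional_time`) accepts:

```python
from datetime import datetime

# Assumed source format; adjust to whatever the real feeds use.
SOURCE_DATE_FORMAT = "%d/%m/%Y %H:%M"


def transform(record: dict) -> dict:
    """Return a cleaned copy of one record; the input dict is not modified."""
    out = dict(record)
    if out.get("seen_at"):
        # Parse the source timestamp and re-emit it as ISO 8601.
        out["seen_at"] = datetime.strptime(out["seen_at"], SOURCE_DATE_FORMAT).isoformat()
    if out.get("person"):
        # Collapse repeated whitespace and normalise capitalisation.
        out["person"] = " ".join(out["person"].split()).title()
    if out.get("vehicle"):
        # Strip spaces and upper-case the registration.
        out["vehicle"] = out["vehicle"].replace(" ", "").upper()
    return out
```

For example, `transform({"seen_at": "05/03/2023 14:30", "person": "  alice   smith ", "vehicle": "ab12 cde"})` returns `{"seen_at": "2023-03-05T14:30:00", "person": "Alice Smith", "vehicle": "AB12CDE"}`. In Logstash, the same kind of work is usually done with the `date` and `mutate` filters.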
So far we have looked at:
- The Python client (though we could just as easily use the Ruby, PHP or Perl clients)
- Logstash with grok (I also see there is a Ruby filter plugin for Logstash that would let me modify the data as it is ingested)
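Whichever client is used, documents normally reach Elasticsearch through the `_bulk` API. The official Python client wraps this in `elasticsearch.helpers.bulk(client, actions)`; the sketch below only shows the underlying NDJSON body format using the standard library, so it can be POSTed with any HTTP client (with `Content-Type: application/x-ndjson`). Using a record's `id` field as the document `_id` is a hypothetical choice:

```python
import json
from typing import Iterable


def build_bulk_body(records: Iterable[dict], index: str) -> str:
    """Serialise records as an Elasticsearch _bulk request body (NDJSON).

    Each document takes two lines: an action line, then the document itself.
    Every line, including the last, ends with a newline.
    """
    lines = []
    for record in records:
        action = {"index": {"_index": index}}
        if "id" in record:
            # Hypothetical: reuse the source primary key so re-runs overwrite
            # existing documents instead of creating duplicates.
            action["index"]["_id"] = str(record["id"])
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    return "".join(line + "\n" for line in lines)
```

Setting `_id` from the source key makes repeated loads idempotent, which matters once the job runs on a schedule.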
My question: are there better tools available, or what is the recommended best practice for this type of ETL process?
We will need to use Kerberos to authenticate to Oracle.
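On the Python side, extraction from Oracle can be written against the generic DB-API 2.0 interface and streamed in batches, so large tables are never held in memory at once. A minimal sketch; the Kerberos connection in the comment is an untested assumption (python-oracledb in Thick mode with external authentication, Kerberos already configured in the Oracle client's `sqlnet.ora`, and a hypothetical DSN), so check it against the python-oracledb documentation. The demo uses `sqlite3` only as a local stand-in:

```python
import sqlite3
from typing import Iterator

# Assumption, not tested here:
#   import oracledb
#   oracledb.init_oracle_client()        # Thick mode
#   conn = oracledb.connect(externalauth=True, dsn="dbhost/service_name")


def stream_rows(connection, sql: str, batch_size: int = 1000) -> Iterator[dict]:
    """Yield rows as dicts, fetching batch_size rows at a time.

    Works with any DB-API 2.0 connection. Column names are lower-cased
    because Oracle returns unquoted identifiers in upper case.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        columns = [col[0].lower() for col in cursor.description]
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            for row in rows:
                yield dict(zip(columns, row))
    finally:
        cursor.close()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sightings (PERSON TEXT, VEHICLE TEXT)")
    conn.executemany("INSERT INTO sightings VALUES (?, ?)",
                     [("Alice", "AB12CDE"), ("Bob", "XY34ZZZ")])
    for row in stream_rows(conn, "SELECT PERSON, VEHICLE FROM sightings", batch_size=1):
        print(row)
```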
A tool that also handles scheduling would be useful; if not, we can always use cron, but that is quite manual.
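If cron is the fallback, a long-running process can also schedule its own runs with the standard-library `sched` module. This is only a minimal in-process sketch (no persistence, retries or monitoring, which cron or a dedicated scheduler would handle better); `run_periodically` is a hypothetical name:

```python
import sched
import time
from typing import Callable


def run_periodically(job: Callable[[], None], interval_s: float, runs: int) -> None:
    """Run job every interval_s seconds, `runs` times, in this process.

    Start times are fixed at start + n * interval_s, so a slow run does not
    push later runs back; an overdue run simply starts as soon as possible.
    """
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    start = time.monotonic()
    for n in range(runs):
        scheduler.enterabs(start + n * interval_s, 1, job)
    scheduler.run()
```

In practice `job` would chain the extract, transform and bulk-load steps for one batch of source data.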
Thanks very much in advance for any help.