I am trying to index data from Hive into ES. I have read the documentation online, which basically says to first create an external table and then start inserting data into it. Now, to insert data into this table at a regular interval, do we need to write a shell script, or is there another way?
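For reference, the setup the ES-Hadoop docs describe boils down to something like this. This is only a sketch: the jar path, the columns, the `hive_events/doc` index, and the host are all placeholder assumptions.

```sql
-- Register the ES-Hadoop connector (path is a placeholder)
ADD JAR /opt/es-hadoop/elasticsearch-hadoop.jar;

-- External table backed by an Elasticsearch index
CREATE EXTERNAL TABLE es_events (id BIGINT, message STRING, ts TIMESTAMP)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'hive_events/doc',   -- index/type to write to
  'es.nodes'    = 'es-host:9200'       -- Elasticsearch address
);

-- Each INSERT indexes the selected rows into Elasticsearch
INSERT INTO TABLE es_events
SELECT id, message, ts FROM source_events;
```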
Getting data from Oracle was easier. Can we do this with a Logstash config file?
Hive is specifically a batch processing tool, so if you want periodic transfers to Elasticsearch you will need to either write a script or use a scheduling solution to run your queries regularly.
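To make that concrete, a cron entry driving a small wrapper script is usually all the "scheduling solution" you need. A minimal sketch, assuming the `es_events` external table above and a timestamp column for picking out new rows (the paths and the high-water-mark approach are my assumptions):

```sh
#!/bin/sh
# /opt/etl/hive_to_es.sh (hypothetical path)
# Push rows added since the last run into the ES-backed external table.
MARK=/opt/etl/last_run
LAST=$(cat "$MARK" 2>/dev/null || echo '1970-01-01 00:00:00')
NOW=$(date '+%Y-%m-%d %H:%M:%S')

# Only advance the marker if the Hive job succeeded
hive -e "INSERT INTO TABLE es_events
         SELECT id, message, ts FROM source_events
         WHERE ts > '$LAST' AND ts <= '$NOW';" && echo "$NOW" > "$MARK"
```

A crontab entry such as `*/15 * * * * /opt/etl/hive_to_es.sh` would then run the transfer every 15 minutes.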
Logstash is a great way to move data into Elasticsearch if you are looking for more of a stream processing approach: as data becomes available, your Logstash pipeline can push it to Elasticsearch for immediate use. Feel free to poke around the Logstash documentation and the Logstash section of these forums; there are plenty of people around who are happy to help with any questions.
@james.baiera Thanks for the reply. I am using Logstash to fetch data from Oracle tables and logs. Now I need to fetch data from Hive tables for visualisation. Since Logstash won't work here, I think there are two approaches:
1. Use ES-Hadoop to create an external table, then write a script that inserts data into that table.
2. Write a script that exports the Hive table to a CSV file, then FTP it to the ELK server and use Logstash there to load the data (see the config sketch below).
The second way saves me the pain of installing the ES-Hadoop jars and so on.
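If you do go the second route, the Logstash side could look something like this. Again a sketch: the file path, sincedb location, column names, and index are assumptions about your setup.

```
input {
  file {
    path => "/data/hive_export/*.csv"   # where the FTP'd files land
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/hive_export.sincedb"
  }
}
filter {
  csv {
    separator => ","
    columns => ["id", "message", "ts"]  # must match the Hive export
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "hive_events"
  }
}
```

One nice property of the file input is that it keeps watching the directory, so each new CSV you FTP over gets picked up automatically.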
I think either way would work. Depending on how much data you have in Hive, the ES-Hadoop route might be a bit faster, but if the extra installation and upkeep isn't worth the tradeoff to you, then by all means use the approach that works best for you!