I am using the elasticsearch-hadoop plugin inside Hive to insert records from Hive into Elasticsearch. I need to get hold of the insertion logs and, if possible, some way to monitor how the insertion is going. I checked the elasticsearch-hadoop plugin documentation, but I am not able to understand how to check the logs.
If somebody has used these tools, please help me find out how to monitor the execution.
To find the actual logs, you will have to check how logging is configured for Hive. ES-Hadoop logs messages in two locations, depending on what is being done. The first location is the HiveServer, for any job configuration and simple job execution. The second location is the executors that Hive spins up on your cluster to perform more complicated distributed operations. If logging is enabled, the ES-Hadoop messages will almost certainly be mixed in with the regular Hive logs, so you'll have to dig a bit to get at them.
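Since the ES-Hadoop messages are interleaved with regular Hive output, a small filter can help. This is only a sketch: it assumes your log4j layout prints the full logger name on each line (ES-Hadoop classes live under the `org.elasticsearch.hadoop` package), and the script name `es_hadoop_grep.py` is hypothetical. If your layout abbreviates logger names, change the marker.

```python
import gzip
from typing import Iterable, Iterator, TextIO

# ES-Hadoop classes live under this Java package. Assumption: the log4j
# layout prints the full logger name on each line (e.g. a %c pattern).
ES_HADOOP_MARKER = "org.elasticsearch.hadoop"


def open_log(path: str) -> TextIO:
    """Open a plain or gzip-compressed (.gz) log file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")


def es_hadoop_lines(lines: Iterable[str], marker: str = ES_HADOOP_MARKER) -> Iterator[str]:
    """Yield only the log lines that mention the ES-Hadoop package."""
    for line in lines:
        if marker in line:
            yield line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python es_hadoop_grep.py hive.log [more.log.gz ...]
    for path in sys.argv[1:]:
        with open_log(path) as fh:
            for line in es_hadoop_lines(fh):
                print(f"{path}: {line}")
```

The same filter works on HiveServer2 logs and on executor logs; `.gz` files are decompressed on the fly.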
I found the logs that contain the elasticsearch-hadoop metrics. In my case, I am running Hive on AWS EMR.
The metric logs can be found as follows.
First, find the application id of your insert-into-Elasticsearch operation in hive-server2.log:
1. Go to the AWS console and open the EMR console.
2. Open your EMR cluster.
3. Open the LOGURI location under S3 that contains the log files.
4. Go to node//applications/hive/hive-server2.out.gz
5. Find the application id of your step.
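Picking the application id out of hive-server2.out.gz by hand gets tedious when many queries have run. Here is a minimal sketch, assuming you have downloaded the file from S3 to a local path. It relies only on the standard YARN id format `application_<clusterTimestamp>_<sequence>` and lists every distinct id in order of first appearance. It cannot tell which id belongs to your INSERT, so you still need to read the surrounding log lines for that.

```python
import gzip
import re
from typing import Iterable, List

# YARN application ids look like application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"\bapplication_\d+_\d+\b")


def find_application_ids(lines: Iterable[str]) -> List[str]:
    """Return the distinct YARN application ids, in first-seen order."""
    seen = {}  # dict keeps insertion order, so it doubles as an ordered set
    for line in lines:
        for app_id in APP_ID_RE.findall(line):
            seen.setdefault(app_id, None)
    return list(seen)


def application_ids_in_gz(path: str) -> List[str]:
    """Scan a gzip-compressed log, e.g. a local copy of hive-server2.out.gz."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return find_application_ids(fh)
```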
Then go to containers//<container_id>/syslog_attempt...
This file gives you the metrics results.
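Once you have the syslog file, the counters can be pulled out into a dictionary. This is a hedged sketch, not ES-Hadoop's own tooling. It assumes the counters are printed in Hadoop's usual text layout: a group header on its own line, followed by indented `Name=value` lines. The default group name `Elasticsearch Hadoop Counters` and the counter names in the test data are assumptions; check them against your own syslog and adjust if your engine (for example Tez) prints counters differently.

```python
import re
from typing import Dict, Iterable

# An indented counter line, e.g. "\t\tDocuments Sent=1000" (assumed layout).
COUNTER_RE = re.compile(r"^\s+(?P<name>[^=]+?)=(?P<value>\d+)\s*$")


def parse_counter_group(
    lines: Iterable[str], group: str = "Elasticsearch Hadoop Counters"
) -> Dict[str, int]:
    """Collect Name=value counters that follow a header line equal to `group`.

    If the group appears several times, later values overwrite earlier ones.
    """
    counters: Dict[str, int] = {}
    in_group = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == group:
            in_group = True  # header found; counters follow
            continue
        if in_group:
            m = COUNTER_RE.match(line)
            if m is None:
                in_group = False  # first non-counter line ends the group
            else:
                counters[m.group("name")] = int(m.group("value"))
    return counters
```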
There can be many application ids, depending on how many queries Hive runs, and many container ids, depending on how much parallel execution is happening.
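With many applications and containers, it helps to group the container syslog files by the application that owns them. A YARN container id embeds its application id (`container_[e<epoch>_]<clusterTimestamp>_<appSequence>_<attempt>_<containerSequence>`), so no lookup is needed. This sketch works on a list of paths you have already collected, for example from a local copy of the LOGURI folder. The assumption that the metric files have `syslog` in their file name comes from the steps above; check it against your own bucket.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional

# YARN container id: container_[e<epoch>_]<clusterTs>_<appSeq>_<attempt>_<containerSeq>
CONTAINER_RE = re.compile(r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+")


def application_of(text: str) -> Optional[str]:
    """Derive the owning application id from a container id found in `text`."""
    m = CONTAINER_RE.search(text)
    if m is None:
        return None
    cluster_ts, app_seq = m.groups()
    return f"application_{cluster_ts}_{app_seq}"


def group_syslogs(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group container syslog paths by the application that owns them."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        if "syslog" not in path.rsplit("/", 1)[-1]:
            continue  # skip stdout, stderr and other files
        app_id = application_of(path)
        if app_id is not None:
            groups[app_id].append(path)
    return dict(groups)
```

The application ids found in hive-server2.out.gz can then be used as keys into the returned dictionary.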