I am using the elasticsearch-hadoop plugin inside Hive to insert records from Hive into Elasticsearch. I need to get hold of the insertion logs and, if possible, some way to monitor how the insertion is going. I checked the elasticsearch-hadoop documentation, but I am not able to understand how to check the logs.
If anybody has used these tools, please help me find out how to monitor the execution.
Here is the documentation I referred to: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/logging.html and https://www.elastic.co/guide/en/elasticsearch/hadoop/current/metrics.html
To find the actual logs, you will have to check how logging is configured for Hive. ES-Hadoop logs messages in two locations, depending on what is being done. The first location is the HiveServer, which handles job configuration and simple job execution. The second is the executors that Hive spins up on your cluster to perform more complicated distributed operations. If logging is enabled, these messages will most certainly be mixed in with the regular Hive logs, so you'll have to dig a bit to get at them.
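Since the ES-Hadoop messages end up interleaved with Hive's own output, a small filter can help pull them out of a downloaded log file. The following is a minimal Python sketch, assuming that Hive's log layout prints enough of the logger name that ES-Hadoop lines contain the substring `elasticsearch.hadoop` (ES-Hadoop classes live in the `org.elasticsearch.hadoop` package). The function name `es_hadoop_lines` and the marker list are hypothetical; if your layout abbreviates logger names, pass markers matching what it actually prints.

```python
import sys
from typing import Iterable, Iterator, Sequence

# ASSUMPTION: Hive's log4j layout prints (part of) the logger name on each line,
# and ES-Hadoop classes live under org.elasticsearch.hadoop. If your layout
# abbreviates logger names, pass markers that match what it actually prints.
DEFAULT_MARKERS: Sequence[str] = ("elasticsearch.hadoop",)


def es_hadoop_lines(lines: Iterable[str], markers: Sequence[str] = DEFAULT_MARKERS) -> Iterator[str]:
    """Yield the lines that contain any of the markers, without trailing newlines."""
    for line in lines:
        if any(marker in line for marker in markers):
            yield line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python es_hadoop_filter.py hive-server2.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for entry in es_hadoop_lines(log_file):
            print(entry)
```

Substring matching is deliberately simple: it works on any line-oriented log regardless of the exact timestamp or level format, at the cost of possibly matching unrelated lines that mention the same text.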
I found the logs that contain the elasticsearch-hadoop metrics. In my case, I am using Hive on top of AWS EMR.
Here is where the metric logs can be found. First, find the application ID of your insert-into-Elasticsearch operation in hive-server2.log:
1. Go to the AWS console and open the EMR console.
2. Open your EMR cluster.
3. Open the LOGURI location in S3 that contains the log files.
4. Go to node//applications/hive/hive-server2.out.gz.
5. Find the application ID of your step.
6. Now go to containers//<container_id>/syslog_attempt...
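Finding the application ID in hive-server2.out.gz can be scripted once the file has been downloaded from S3. Below is a minimal Python sketch, assuming a local copy of the file and the standard YARN application ID format `application_<clusterTimestamp>_<sequence>`; the helper names are hypothetical. It lists every ID in order of first appearance, so you still need to pick out the one that belongs to your insert (for example, from the surrounding query text).

```python
import gzip
import re
import sys
from typing import List

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def application_ids(text: str) -> List[str]:
    """Return the distinct application IDs in text, in order of first appearance."""
    # dict.fromkeys keeps insertion order while dropping duplicates.
    return list(dict.fromkeys(APP_ID_RE.findall(text)))


def application_ids_from_gz(path: str) -> List[str]:
    """Read a gzip-compressed log, e.g. a local copy of hive-server2.out.gz."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as log_file:
        return application_ids(log_file.read())


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python find_app_ids.py hive-server2.out.gz
    for app_id in application_ids_from_gz(sys.argv[1]):
        print(app_id)
```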
This file gives you the metric results.
There can be many application IDs, depending on how many queries Hive runs, as well as many container IDs, depending on how much parallel execution is happening.
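When a query fans out into many containers, it can help to group container IDs by the application they belong to. YARN encodes the application in the container ID (`container_[e<epoch>_]<clusterTimestamp>_<appSequence>_<attempt>_<containerNumber>`), so the mapping can be derived from the names alone. A minimal Python sketch, assuming you have already collected the container folder names (for example, by listing the S3 log location); `group_containers` is a hypothetical helper name.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# A YARN container ID is container_[e<epoch>_]<clusterTimestamp>_<appSequence>_<attempt>_<container>,
# e.g. container_1490000000000_0001_01_000002; the first two numbers identify its application.
CONTAINER_RE = re.compile(r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+")


def group_containers(container_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Map each application ID to the container IDs that belong to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cid in container_ids:
        match = CONTAINER_RE.fullmatch(cid)
        if match is None:
            continue  # not a container ID; skip it
        cluster_ts, app_seq = match.groups()
        groups[f"application_{cluster_ts}_{app_seq}"].append(cid)
    return dict(groups)
```

With the application ID of your insert in hand, the matching entry in the returned dictionary tells you which container folders to open.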
Thanks for sharing your findings here!