What is the best way to collect YARN application logs from HDFS?

We have a dynamic infrastructure for Hadoop, which means the YARN application logs only exist for a limited period of time.

Currently we are using Splunk HadoopConnect to ingest those logs as a live feed into Splunk. However, this requires installing the full Splunk server on the cluster, which is not only resource-intensive but also inefficient to do every time we spin up a dynamic Hadoop cluster.

Do Elasticsearch, Logstash, etc. have an alternative to HadoopConnect that could be used to collect the YARN application logs out of HDFS and feed them into the ELK stack?

I don't have much experience with Splunk's HadoopConnect, or know much about what it even is, but in terms of collecting log files I would suggest starting a Filebeat alongside your NodeManager instances as they come online and tearing it down as they go offline.

Granted, this means running a data shipper process on every node you want to collect data from (my recollection is that you can get the YARN application logs from the local directories on the NodeManagers/ResourceManagers that launch them, but your setup may differ from what I'm used to). A rough config sketch is below.
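To make that concrete, here is a minimal `filebeat.yml` sketch for tailing the per-container log directories on a NodeManager. The log path and the Elasticsearch endpoint are assumptions on my part (check `yarn.nodemanager.log-dirs` in your `yarn-site.xml` for the real location), so treat it as a starting point rather than a drop-in config:

```yaml
# Sketch only -- the paths and hosts below are assumptions, adjust to your cluster.
filebeat.inputs:
  - type: log
    enabled: true
    # YARN container logs (stdout/stderr/syslog) land under the NodeManager's
    # local log directory before log aggregation moves them to HDFS; the
    # directory below is a common default, but check yarn.nodemanager.log-dirs.
    paths:
      - /var/log/hadoop-yarn/containers/application_*/container_*/*

output.elasticsearch:
  # Hypothetical endpoint -- point this at your own cluster, or swap in
  # output.logstash if you want to parse the lines in Logstash first.
  hosts: ["http://elasticsearch.example.com:9200"]
```

Since the Filebeat lives and dies with the NodeManager, there is nothing extra to clean up when a dynamic cluster is torn down, and whatever was shipped before teardown stays in Elasticsearch.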

Additionally, one of the engineers at Elastic has published a tool called FSCrawler, which has a blurb about indexing data through an HDFS NFS Gateway. Maybe that would be easier to implement than running a sidecar data shipper on dynamic infrastructure? A sketch of what that could look like is below.
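As a rough illustration of that route: mount HDFS through the NFS Gateway (the `/mnt/hdfs` mount point below is an assumption) and point an FSCrawler job at the aggregated-log directory (YARN's `yarn.nodemanager.remote-app-log-dir` defaults to `/tmp/logs`). The job settings below follow the general shape of an FSCrawler `_settings.yaml`, but please check the FSCrawler docs for the exact schema of the release you use:

```yaml
# ~/.fscrawler/yarn_logs/_settings.yaml -- sketch only, all values are assumptions.
name: "yarn_logs"
fs:
  # Aggregated YARN application logs in HDFS, seen through the NFS Gateway mount.
  url: "/mnt/hdfs/tmp/logs"
  update_rate: "5m"
elasticsearch:
  nodes:
    - url: "http://elasticsearch.example.com:9200"
  index: "yarn-application-logs"
```

The trade-off is that this only sees logs after YARN's log aggregation has copied them to HDFS, whereas the Filebeat approach ships them as they are written on the NodeManagers.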
