How to read JSON files stored in HDFS via Logstash


(Gowtham Sadasivam) #1

I have my data (JSON files) stored in HDFS and would like to push this data to Elasticsearch. I have seen an output plugin for HDFS, but no input plugin. Is there a way to read the HDFS files via Logstash, or any other way to push the HDFS data to Elasticsearch besides Logstash?

Any help/recommendation is appreciated. Thanks in advance :slight_smile:


(Vinicius Silva) #2

Did you get an answer on this? I have the same issue.


(Gowtham Sadasivam) #3

Hello Vinicius,

I didn't get any suggestions from this forum yet, but I figured out two ways to achieve this.

Method #1:

If you look at the currently available output plug-ins for Logstash, there is a "webhdfs" plug-in which uses HDFS's REST API (WebHDFS) over HTTP.

I believe we can use this same API to read from HDFS with Logstash's HTTP polling input plug-in: specify the file's location inside HDFS, read it via the WebHDFS REST API, process it, and push it to ES.
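A minimal sketch of that idea, assuming the `http_poller` input (the plugin that fetches URLs on a schedule; the plain `http` input only listens for incoming requests). The namenode host, WebHDFS port, file path, and index name below are placeholders for your cluster:

```conf
input {
  http_poller {
    urls => {
      # WebHDFS OPEN reads a file's contents over HTTP
      hdfs_file => "http://namenode:50070/webhdfs/v1/data/events.json?op=OPEN"
    }
    request_timeout => 60
    schedule => { cron => "0 * * * * UTC" }   # poll hourly
    codec => "json"
  }
}

filter {
  # any transformations/manipulations go here
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "hdfs-events"
  }
}
```

Note that WebHDFS `OPEN` replies with a redirect to a datanode, so the poller must be allowed to follow redirects, and this approach re-reads the whole file on each poll rather than tailing it.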

Method #2: (The one I'm using in production now)

Use Hive to push data directly to ES without Logstash. I'm using this method in my production cluster; there is a JAR integration (ES-Hadoop) available for pushing data to ES.

But check the version compatibility and the transform features available within Hive before you use this method. If you use Hive, you may not get all the transformation/manipulation filters available in Logstash.
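A rough sketch of the Hive route, assuming the `elasticsearch-hadoop` jar and its `EsStorageHandler`; the paths, table names, and index name are placeholders:

```sql
-- Make the ES-Hadoop integration available to the Hive session
ADD JAR /path/to/elasticsearch-hadoop.jar;

-- External table over the JSON files already sitting in HDFS,
-- one raw JSON document per line
CREATE EXTERNAL TABLE json_events (raw STRING)
LOCATION '/data/events';

-- External table backed by an Elasticsearch index
CREATE EXTERNAL TABLE es_events (raw STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource'   = 'hdfs-events/doc',   -- target index/type
  'es.nodes'      = 'localhost:9200',
  'es.input.json' = 'yes'                -- pass the string through as JSON
);

-- Writing into the ES-backed table indexes the documents
INSERT OVERWRITE TABLE es_events
SELECT raw FROM json_events;
```

Any transformation you need (the role Logstash filters would play) has to be expressed in the `SELECT`, using Hive's built-in functions or UDFs.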

I went through a few issues while using Hive with ES, mostly around compatibility and transformations.

I'm also using Hive's pre-defined functions as well as user-defined functions (UDFs) to transform my data before pushing it to ES.

I hope this information helps you get started :slight_smile:

Regards.


(Vinicius Silva) #4

Thanks for the reply @gowthamsadasivam. I made a workaround to avoid reading from HDFS :slight_smile:
But I will keep your suggestions in mind in case I need to change.
