How to read JSON files stored in HDFS via Logstash

Hello Vinicius,

I haven't received any suggestions from this forum yet, but I have figured out two ways to achieve this.

Method #1:

If you have looked at the output plug-ins currently available for Logstash, there is a "webhdfs" plug-in, which uses the HDFS REST API over HTTP.

I believe we can use this same API to read from HDFS with the Logstash HTTP input plug-in: specify the file's location inside HDFS, read it through the HDFS REST API, process it, and push it to ES.
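To make the REST call concrete, here is a minimal Python sketch of what a WebHDFS read looks like. It is only an illustration of the API, not a Logstash configuration. The helper names, the host name, and the port are my assumptions (9870 is the usual NameNode HTTP port on Hadoop 3, 50070 on Hadoop 2), and it assumes the file holds one JSON object per line.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_open_url(namenode, hdfs_path, port=9870, user=None):
    """Build the WebHDFS URL that reads (op=OPEN) the file at hdfs_path.

    The port default is an assumption; check your NameNode's HTTP port.
    """
    params = {"op": "OPEN"}
    if user:
        # Only relevant on clusters that use simple (non-Kerberos) auth.
        params["user.name"] = user
    return "http://{}:{}/webhdfs/v1{}?{}".format(
        namenode, port, quote(hdfs_path), urlencode(params)
    )


def parse_json_lines(text):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def read_json_from_hdfs(namenode, hdfs_path, **kwargs):
    """Fetch a file over WebHDFS and return its JSON records as a list."""
    url = webhdfs_open_url(namenode, hdfs_path, **kwargs)
    # The NameNode answers op=OPEN with a redirect to a DataNode;
    # urlopen follows that redirect automatically for GET requests.
    with urlopen(url) as response:
        return parse_json_lines(response.read().decode("utf-8"))
```

For example, `webhdfs_open_url("nn.example.com", "/data/events.json")` gives a URL of the form `http://nn.example.com:9870/webhdfs/v1/data/events.json?op=OPEN`. In Logstash, the same URL could presumably be handed to a polling HTTP input such as http_poller with a JSON codec, but I have not tested that setup.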

Method #2: (The one I'm using in production now)

Use Hive to push data directly to ES, without Logstash. This is what I currently run in my production cluster. A JAR integration is available for pushing data from Hive to ES.

However, check the compatibility and the transformation features available within Hive before you use this method. With Hive, you may not get all the transformation and manipulation methods that Logstash offers.

Here are a few issues I ran into while using Hive with ES:

I'm also using Hive's built-in functions as well as User Defined Functions (UDFs) to transform my data before pushing it to ES.

I hope this information helps you get started :slight_smile:

Regards.