Indexing logs with es-hadoop

I am new to Elasticsearch and want to index my website logs, which are stored on HDFS, for fast querying. I have a well-structured pipeline that runs a script every 20 minutes to ingest the data into HDFS. I want to integrate Elasticsearch with it so that it also indexes these logs on particular field(s), giving faster query results through Spark SQL. So my question is: can I index my data based on particular field(s) only? Also, my logs are saved in the Avro file format. Does Elasticsearch provide a way to directly index Avro-serialized data, or do I need to convert it into some other format first?

Thank you in advance.

Of course. Take a look at the docs. Note that ES thinks in terms of documents rather than fields (fields are more of an RDBMS concept). In other words, you can simply throw the documents at it and be done with it. The ES documentation explains the various indexing options you have, including mappings (which is what it looks like you need).
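For instance, an explicit mapping is how you control which fields get indexed and how. Here is a minimal sketch in Scala that creates an index with a mapping over plain HTTP; the index name (`weblogs`) and all field names are hypothetical stand-ins for your log schema, and the mapping syntax shown is the ES 7+ form (older versions nest it under a type name):

```scala
import java.net.{HttpURLConnection, URL}

// Sketch: create a hypothetical 'weblogs' index with an explicit mapping.
// Adjust index name, field names, and types to your actual log schema.
object CreateLogIndex {
  val mapping =
    """{
      |  "mappings": {
      |    "properties": {
      |      "timestamp": { "type": "date" },
      |      "client_ip": { "type": "ip" },
      |      "url":       { "type": "keyword" },
      |      "response":  { "type": "integer" },
      |      "agent":     { "type": "text", "index": false }
      |    }
      |  }
      |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val conn = new URL("http://localhost:9200/weblogs")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("PUT")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(mapping.getBytes("UTF-8"))
    println(s"PUT /weblogs -> HTTP ${conn.getResponseCode}")
    conn.disconnect()
  }
}
```

Note the `"index": false` on the last field: that is the knob for "store this field in the document but don't make it searchable", which is about as close as ES gets to "index on particular field(s) only" since every document is stored whole in `_source` regardless.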

ES itself doesn't read the data; it is the Elasticsearch Hadoop connector that does this. And yes, it supports the Avro format. Once you have picked your library (Map/Reduce, Hive, Cascading, etc.), simply configure it to read the files just as you typically would in Hadoop and plug in the connector on the other side to 'fan' the data out to Elasticsearch.
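With Spark, for example, this comes down to a few lines. A minimal sketch, assuming the elasticsearch-spark (es-hadoop) and spark-avro packages are on the classpath, ES reachable on localhost:9200, and a made-up HDFS path; the exact resource string (`"weblogs"` vs. `"weblogs/doc"`) depends on your es-hadoop and ES versions:

```scala
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._  // adds saveToEs() to DataFrames

object IndexLogs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("index-weblogs")
      .config("es.nodes", "localhost")  // where Elasticsearch lives
      .config("es.port", "9200")
      .getOrCreate()

    // Read the Avro files exactly as you already do in your pipeline.
    val logs = spark.read
      .format("avro")                       // requires the spark-avro module
      .load("hdfs:///data/weblogs/*.avro")  // hypothetical path

    // Fan the rows out to Elasticsearch as JSON documents; the connector
    // maps DataFrame columns to document fields automatically.
    logs.saveToEs("weblogs")

    spark.stop()
  }
}
```

Since your ingest script already runs every 20 minutes, the natural place for this is as a final step of that same job, so HDFS and the index stay in sync.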

More about it at Elasticsearch for Hadoop | Elastic
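And for the Spark SQL querying the original question mentions: the same connector exposes the index back as a DataFrame, with filters pushed down to Elasticsearch rather than evaluated in Spark. Another sketch, reusing the same hypothetical index and field names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("query-weblogs")
  .config("es.nodes", "localhost")
  .getOrCreate()

// Load the (hypothetical) 'weblogs' index through the connector's
// data source; the WHERE clause below is pushed down to ES.
val logs = spark.read
  .format("org.elasticsearch.spark.sql")
  .load("weblogs")

logs.createOrReplaceTempView("weblogs")
spark.sql(
  "SELECT url, count(*) AS hits FROM weblogs " +
  "WHERE response = 404 GROUP BY url ORDER BY hits DESC"
).show()
```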
