Elastic Search with Cascading

Kunal_Ghosh · July 14, 2017, 2:52pm

I am processing a large volume of data with Cascading and writing it to HDFS. Now I want to search this data, so I thought of these solutions:

I am using EsTap from Elasticsearch for Hadoop (es-hadoop) as the sink tap to write the data, and then using Elasticsearch to search it.
But writing the data this way takes far longer than writing it with the default Hfs sink tap.
Should I instead write the data to HDFS with Cascading's Hfs sink tap as usual, and then use es-hadoop to move it into Elasticsearch for searching?

Please tell me which method is best for processing huge data, and whether either approach is wrong.
Thanks in advance.

james.baiera · July 17, 2017, 2:35am

Please try to avoid opening multiple threads on the same topic. Thank you!

To be honest, there's no definitive "right" or "wrong" answer here. It depends entirely on your own data requirements: whether you want the data checkpointed to HDFS before shipping it to Elasticsearch, or prefer a more direct approach. I will say that if you do not need the data in HDFS, staging it there first incurs extra write and read overhead for the job. The more important issue in front of you is tuning the write speed of your connector and cluster.
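
As a rough illustration of what tuning the connector's write speed can look like, here is a minimal sketch that collects es-hadoop bulk-write settings in a plain java.util.Properties object. The setting keys are es-hadoop configuration names; the class name EsWriteTuning, the host name, and every value are assumptions chosen for illustration only, not recommendations, and should be adjusted against your own cluster.

```java
import java.util.Properties;

// Hypothetical helper: gathers es-hadoop bulk-write settings in one place.
final class EsWriteTuning {

    // Returns connector settings that influence bulk indexing throughput.
    // All values below are illustrative assumptions, not recommendations.
    static Properties bulkProperties() {
        Properties props = new Properties();
        props.setProperty("es.nodes", "es-host-1:9200");      // assumed host name
        props.setProperty("es.batch.size.entries", "5000");   // documents per bulk request, per task
        props.setProperty("es.batch.size.bytes", "5mb");      // size cap per bulk request, per task
        props.setProperty("es.batch.write.refresh", "false"); // skip the index refresh after each bulk write
        props.setProperty("es.batch.write.retry.count", "5"); // retries when Elasticsearch rejects a bulk request
        props.setProperty("es.batch.write.retry.wait", "30s");// pause between those retries
        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be checked before wiring them into a job.
        bulkProperties().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

The batch settings apply per task, so the total load on Elasticsearch grows with the number of tasks writing at once. The resulting Properties would be passed to the job in whatever way your flow is already configured, for example through the properties given to the flow connector.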
