Elasticsearch with Cascading

I am processing a large volume of data with Cascading and writing it to HDFS. Now I want to run searches on this data, and I have thought of two approaches:

  1. Use ESTap() from es-hadoop as the sink tap to write the data directly into Elasticsearch, and then search it there.
    This is what I am doing now, but the write takes far longer than it does with the default HFS sink tap.

  2. Write the data to HDFS as usual with Cascading's HFS sink tap, and then use es-hadoop to move it into Elasticsearch for searching.

Which of these is the better method for processing data at this scale? Is either approach simply wrong?
Thanks in advance.

Please try to avoid opening multiple threads on the same topic. Thank you!

To be honest, there's no definitive "right" or "wrong" answer here. It comes down to your own data requirements: whether you want the data checkpointed to HDFS before it is shipped to Elasticsearch, or whether you prefer the more direct approach. I will say that if you do not otherwise need the data in HDFS, writing it there first adds an extra write and an extra read to the job. The more important issue in front of you is tuning the write speed of your connector and cluster.
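As a rough starting point for that tuning, here is a minimal sketch of the es-hadoop bulk-write settings that usually matter most. The property keys are standard es-hadoop configuration names, but every value here, the `localhost:9200` address, and the `EsWriteTuning` class name are assumptions for illustration only; the right numbers depend on your document size and cluster capacity.

```java
import java.util.Properties;

public class EsWriteTuning {

    // Builds a Properties object holding es-hadoop bulk-write settings.
    // All values are illustrative assumptions, not recommendations.
    static Properties bulkWriteProperties() {
        Properties props = new Properties();

        // Hypothetical cluster address; replace with your own nodes.
        props.setProperty("es.nodes", "localhost:9200");

        // Larger bulk requests per task. These limits apply per task,
        // so the total load on the cluster is roughly
        // (number of concurrent tasks) x (batch size).
        props.setProperty("es.batch.size.bytes", "5mb");
        props.setProperty("es.batch.size.entries", "5000");

        // Skip the index refresh after each bulk write to reduce overhead.
        props.setProperty("es.batch.write.refresh", "false");

        // Retry rejected bulk requests a few more times, waiting between tries.
        props.setProperty("es.batch.write.retry.count", "5");
        props.setProperty("es.batch.write.retry.wait", "30s");

        return props;
    }

    public static void main(String[] args) {
        // Print the settings so they can be checked before wiring them in.
        bulkWriteProperties().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In a Cascading job, properties like these would typically be passed to the flow connector that runs the flow writing to Elasticsearch. It is worth changing one setting at a time and watching the cluster for bulk rejections, since oversized batches multiplied across many tasks can overwhelm it.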
