Use case for Elasticsearch for Hadoop

I'm trying to understand where Elasticsearch for Hadoop fits in the big
data landscape and why someone would use it.

  1. If you want all the data in Hadoop to be searchable, doesn't that mean
    all of it has to be duplicated in Elasticsearch (via es-hadoop)?

  2. Can you push all data from Elasticsearch into Hadoop with es-hadoop,
    instead of the reverse?

Here's my idea:
Short-term (1 week), real-time searchable (Kibana) data is kept in
Elasticsearch.
Long-term (1 year+), high-latency searchable (HBase, Pig et al.) data is
kept in Hadoop.

At a high level, does this make sense?
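
The short-term/long-term split above could be sketched roughly as follows. This is only an illustration of the idea, assuming data is organized into one unit per day; the function name `split_by_age` and the 7-day default are hypothetical, and nothing here talks to Elasticsearch or Hadoop.

```python
from datetime import date, timedelta


def split_by_age(index_dates, today, hot_days=7):
    """Split daily dates into 'hot' (keep in Elasticsearch) and 'cold' (Hadoop only).

    Hypothetical helper: a date counts as hot if it is newer than
    `today - hot_days`, otherwise it counts as cold.
    """
    cutoff = today - timedelta(days=hot_days)
    hot = [d for d in index_dates if d > cutoff]
    cold = [d for d in index_dates if d <= cutoff]
    return hot, cold


if __name__ == "__main__":
    today = date(2014, 9, 22)
    days = [today - timedelta(days=n) for n in range(10)]
    hot, cold = split_by_age(days, today)
    print("keep in Elasticsearch:", [d.isoformat() for d in hot])
    print("Hadoop only:", [d.isoformat() for d in cold])
```

The cold list is what a periodic job would drop from Elasticsearch once the same data is confirmed to be in Hadoop.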

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/73c55fe9-08ae-4c74-93be-98107aff4954%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Typically one would use es-hadoop if one is already a Hadoop user. As for your questions:

  1. Yes and no. To make data searchable one has to index it, but not necessarily store it. For convenience, so that the
    data is returned along with the results, let's assume the worst case, where the data is stored as well. However, one
    would have to do the same even with a pure Hadoop implementation. Leaving aside the fact that one would have to write
    the search algorithms using Map/Reduce, which is not at all easy (think geolocation), all the intermediate steps and
    keys (think shuffling, key/output values) between input and output would be saved to disk, which results in data
    being duplicated on each job.

Elasticsearch aside, for data to be usable, searchable, indexed, etc., there needs to be some metadata, which is either
packed with the data or created along the way. Since you mentioned HBase and Pig, take a look at their requirements.
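
To make the index-versus-store distinction from point 1 concrete, here is a minimal sketch of two index-creation bodies. It assumes the mapping syntax of recent Elasticsearch versions (7.x and later), where `_source` can be disabled in the mapping; the field names `message` and `timestamp` and the helper `build_mapping` are made up for illustration.

```python
import json


def build_mapping(keep_source: bool) -> dict:
    """Build an index-creation body (hypothetical field names).

    keep_source=False: fields are indexed and searchable, but the original
    document is not kept in Elasticsearch, so hits return only IDs/metadata.
    keep_source=True: the original document is stored as well (the
    "worst case" duplication mentioned above).
    """
    return {
        "mappings": {
            "_source": {"enabled": keep_source},
            "properties": {
                "message": {"type": "text"},
                "timestamp": {"type": "date"},
            },
        }
    }


search_only = build_mapping(keep_source=False)
search_and_fetch = build_mapping(keep_source=True)

if __name__ == "__main__":
    print(json.dumps(search_only, indent=2))
```

With the search-only variant, the full records would have to be fetched from Hadoop by ID, which is the trade-off against duplicating them.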

  2. Yes, es-hadoop is bidirectional, so one can stream data into ES from HDFS, for example, or stream data from ES to HDFS.
    However, while ES can be used as a store, it's much more valuable for its search/insight capabilities, which is
    why one would typically read search results from ES, not just raw data.
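
As a rough sketch of the bidirectional use described above, the snippet below builds the configuration dictionaries that es-hadoop reads (`es.nodes`, `es.resource`, `es.query`). The helper names, host, index name, and query are assumptions for illustration; the Spark calls are shown only as comments because they need a running cluster and the es-hadoop jar on the classpath.

```python
def es_read_conf(nodes, resource, query=None):
    """Configuration for reading from Elasticsearch into Hadoop/Spark.

    `query` is optional; passing one means only the matching search
    results are pulled out, rather than the whole index.
    """
    conf = {"es.nodes": nodes, "es.resource": resource}
    if query is not None:
        conf["es.query"] = query
    return conf


def es_write_conf(nodes, resource):
    """Configuration for writing from Hadoop/Spark into Elasticsearch."""
    return {"es.nodes": nodes, "es.resource": resource}


# Hypothetical host, index, and query.
read_conf = es_read_conf("localhost:9200", "weblogs", query="?q=status:500")
write_conf = es_write_conf("localhost:9200", "weblogs")

# With PySpark and es-hadoop available, usage would look roughly like:
#
# rdd = sc.newAPIHadoopRDD(
#     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=read_conf)
#
# docs.saveAsNewAPIHadoopFile(
#     path="-",
#     outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=write_conf)

if __name__ == "__main__":
    print(read_conf)
    print(write_conf)
```

Setting `es.query` on the read side is what turns ES into a source of search results rather than a plain store.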

If you haven't seen it yet, I recommend the latest webinar [1], which features es-hadoop and provides a complete
picture of what es-hadoop is.

Cheers,

[1] Elasticsearch Platform — Find real-time answers at scale | Elastic

On 9/22/14 4:05 AM, Nelson Jeppesen wrote:

I'm trying to understand where Elasticsearch for Hadoop fits in the big data landscape and why someone would use it.

  1. If you want all the data in Hadoop to be searchable, doesn't that mean all of it has to be duplicated in
    Elasticsearch (via es-hadoop)?

  2. Can you push all data from Elasticsearch into Hadoop with es-hadoop, instead of the reverse?

Here's my idea:
Short-term (1 week), real-time searchable (Kibana) data is kept in Elasticsearch.
Long-term (1 year+), high-latency searchable (HBase, Pig et al.) data is kept in Hadoop.

At a high level, does this make sense?

--
Costin
