Usecase for Elasticsearch for Hadoop

Nelson_Jeppesen · September 22, 2014, 1:05am

I'm trying to understand where Elasticsearch for Hadoop fits in the big
data landscape and why someone would use it.

If you want all the data in Hadoop to be searchable, doesn't that mean
all of it needs to be duplicated in Elasticsearch (via es-hadoop)?
Can you push all data from Elasticsearch into Hadoop with es-hadoop,
instead of the reverse?

Here's my idea:
Short-term (1 week), real-time searchable (Kibana) data is kept in
Elasticsearch
Long-term (1 year+), high-latency searchable (HBase, Pig et al.) data is
kept in Hadoop

At a high level, does this make sense?
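
To make the split concrete, here is a minimal sketch of the retention decision. It assumes one index per day named like `logs-YYYY.MM.DD`; that naming scheme, the `logs-` prefix, and the function name are assumptions for illustration, not something es-hadoop requires.

```python
from datetime import date, timedelta


def split_by_age(index_names, today, keep_days=7, prefix="logs-"):
    """Split daily indices into (keep_in_es, archive_to_hadoop).

    Assumes hypothetical index names of the form '<prefix>YYYY.MM.DD'.
    """
    cutoff = today - timedelta(days=keep_days)
    keep, archive = [], []
    for name in index_names:
        year, month, day = name[len(prefix):].split(".")
        index_day = date(int(year), int(month), int(day))
        # Indices newer than the cutoff stay searchable in Elasticsearch;
        # everything else is a candidate for export to Hadoop.
        if index_day > cutoff:
            keep.append(name)
        else:
            archive.append(name)
    return keep, archive
```

With `keep_days=7`, the seven most recent daily indices (including today's) stay in Elasticsearch and older ones are listed for export.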

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/73c55fe9-08ae-4c74-93be-98107aff4954%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

costin · September 22, 2014, 6:14am

Typically one would use es-hadoop if they are already Hadoop users. As for your questions:

Yes and no. To search data one has to index it, but not necessarily store it. For convenience, so that the data is
returned along with the results, let's assume the worst-case scenario where the data is stored as well. However, one
would have to do the same even with a pure Hadoop implementation. Leaving aside the fact that one would have to write
the search algorithms using Map/Reduce, which is not at all easy (think geolocation), all the intermediate steps and
keys (think shuffling, key/output values) between input and output would be saved to disk, which results in data being
duplicated on each job.
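
As a rough illustration of "index but not store", the sketch below builds an index-creation body that can disable the `_source` field, so Elasticsearch keeps the inverted index but not the original JSON. The field names and the helper name are hypothetical, and the layout follows the typeless mapping format of recent Elasticsearch versions (older versions nest the mapping under a type name).

```python
def build_index_body(field_types, keep_source=True):
    """Build a create-index request body as a plain dict.

    field_types: mapping of field name -> Elasticsearch field type.
    keep_source: when False, the original document is indexed but not kept.
    """
    properties = {name: {"type": ftype} for name, ftype in field_types.items()}
    return {
        "mappings": {
            # With _source disabled, hits come back without the document body,
            # so the original has to be fetched from elsewhere (e.g. HDFS).
            "_source": {"enabled": keep_source},
            "properties": properties,
        }
    }


# Hypothetical fields for a log-style document.
body = build_index_body({"message": "text", "timestamp": "date"}, keep_source=False)
```

The trade-off is that features which need the original document, such as update and reindex, are not available when `_source` is disabled.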

Elasticsearch aside, for data to be usable, searchable, indexed, etc., there needs to be some metadata - this is either
packed with the data or created along the way. Since you mentioned HBase and Pig, take a look at their requirements.

Yes, es-hadoop is bidirectional, so one can stream data into ES from HDFS, for example, or stream data from ES to HDFS.
However, while ES can be used as a store, it's much more valuable if you use it for its search/insight capabilities,
which is why one would typically read search results from ES, not just raw data.
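
To illustrate reading search results instead of raw data, here is a minimal sketch that assembles the es-hadoop settings for a read job. `es.nodes`, `es.resource`, and `es.query` are es-hadoop configuration keys; the host, index name, query, and helper name below are assumptions made up for the example.

```python
import json


def es_read_conf(nodes, resource, query=None):
    """Return es-hadoop settings for reading from Elasticsearch.

    nodes:    Elasticsearch host(s) to connect to, e.g. 'localhost:9200'.
    resource: the index to read from.
    query:    optional query DSL dict; when omitted, no es.query is set
              and the job reads the whole resource.
    """
    conf = {"es.nodes": nodes, "es.resource": resource}
    if query is not None:
        # es.query accepts a query DSL document passed as a JSON string.
        conf["es.query"] = json.dumps(query)
    return conf


# Hypothetical job: only documents matching "error" are streamed to Hadoop.
conf = es_read_conf(
    "localhost:9200",
    "logs-2014.09.22",
    {"query": {"match": {"message": "error"}}},
)
```

A dictionary like this would then be handed to whichever Hadoop-side integration is in use (Map/Reduce, Pig, Hive, Spark), so the job receives only the matching documents.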

If you haven't seen it yet, I recommend the latest webinar [1], which features es-hadoop and provides a complete
picture of what es-hadoop is.

Cheers,

[1] Elasticsearch Platform — Find real-time answers at scale | Elastic

--
Costin

