ElasticSearch Hadoop


(James Cook-2) #1

I've read through much of the documentation for es-hadoop, but I might be
coming away with some misunderstandings.

The setup docs for Elasticsearch for Apache Hadoop (es-hadoop) use the
word interact, which is a bit vague:

Elasticsearch for Apache Hadoop is an open-source, stand-alone,
self-contained, small library that allows Hadoop jobs (whether using
Map/Reduce or libraries built upon it such as Hive, Pig or Cascading) to
interact with Elasticsearch. Data flows bi-directionally so that
applications can leverage transparently the Elasticsearch engine
capabilities to significantly enrich their capabilities and increase the
performance.

So, does this mean I have a separate Hadoop instance (potentially built
on HDFS or AWS EMR), and that I can query the data from either the
Elasticsearch (REST/Java/etc.) or the Hadoop (Hive, Pig, Cascading) environment?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c12c35a0-b9ee-461e-8e81-12910dd06894%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Costin Leau) #2

On 7/17/14 8:38 PM, James Cook wrote:

> I've read through much of the documentation for es-hadoop, but I might be coming away with some misunderstandings.
>
> The setup docs for elasticsearch for apache hadoop (es-hadoop) uses the word /interact/ which is a bit vague.
>
> Elasticsearch for Apache Hadoop is an open-source, stand-alone, self-contained, small library that allows Hadoop
> jobs (whether using Map/Reduce or libraries built upon it such as Hive, Pig or Cascading) to interact with
> Elasticsearch. Data flows bi-directionaly so that applications can leverage transparently the Elasticsearch engine
> capabilities to significantly enrich their capabilities and increase the performance.
>
> So, does this mean I have a separate Hadoop instance (potentially built upon HDFS or AWS EMR) and I can query data using
> either the elasticsearch (REST/Java/etc) or hadoop (Hive, Pig, Cascading) environments?

I'm not sure I understand your question. es-hadoop allows Hadoop jobs (whether they are written in
Cascading, Hive, Pig or plain Map/Reduce) to easily read from and write to Elasticsearch; that is what
"interact" means here. es-hadoop provides native APIs for each of these libraries and, underneath, takes
care of the boilerplate work (converting to/from JSON, communicating with ES, handling failures). It also
adds some optimizations, such as spreading the work across Hadoop's multi-node, parallel tasks.
Note that it is entirely possible to do all of this yourself if you don't want to, or cannot, use es-hadoop.
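To make "do all of this yourself" concrete, here is a rough sketch in Python using only the standard library. It is a hypothetical illustration, not es-hadoop's implementation or API: it converts records into the newline-delimited JSON that Elasticsearch's `_bulk` endpoint expects, splits them into batches, POSTs a batch with a simple retry policy, and turns `_search` hits back into plain records (the read direction). The helper names, the retry/backoff policy and the cluster address are assumptions. The action lines omit `_type`, which matches current Elasticsearch versions; 1.x-era clusters like those in this 2014 thread also expected a type.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Iterable, Iterator, List


def to_bulk_body(docs: Iterable[dict], index: str) -> bytes:
    """Serialize records into an Elasticsearch _bulk payload (newline-delimited JSON).

    Every document becomes two lines: an action line naming the target index,
    then the document source. The payload must end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return ("\n".join(lines) + "\n").encode("utf-8")


def batches(docs: List[dict], size: int) -> Iterator[List[dict]]:
    """Split records into fixed-size chunks, one bulk request per chunk (hypothetical helper)."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def send_bulk(base_url: str, body: bytes, retries: int = 3, backoff: float = 1.0) -> dict:
    """POST a bulk payload, retrying with exponential backoff on request failures.

    Assumption: this retries on any URLError, including HTTP 4xx/5xx (HTTPError
    subclasses URLError); a real job would distinguish them. A 200 response can
    still contain per-item failures, reported as "errors": true in the JSON.
    """
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_bulk",
        data=body,
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(request) as response:
                return json.loads(response.read())
        except urllib.error.URLError:
            if attempt == retries - 1:
                raise  # out of retries: surface the error to the caller
            time.sleep(backoff * 2 ** attempt)
    raise ValueError("retries must be at least 1")


def hits_to_records(search_response: dict) -> List[dict]:
    """Convert a _search response back into plain records (the read direction)."""
    return [hit["_source"] for hit in search_response.get("hits", {}).get("hits", [])]
```

For example, `for batch in batches(records, 500): send_bulk("http://localhost:9200", to_bulk_body(batch, "radio"))` would index `records` into a hypothetical `radio` index, one request per 500 documents, sequentially from a single process. es-hadoop, as described above, spreads this kind of work across parallel Hadoop tasks instead.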

Cheers,


--
Costin


