Elasticsearch for Hadoop


(Hiro Gangwani) #1

Hi,
I am looking at the latest release of elasticsearch, version 0.97, where support
is provided for native integration with a Hadoop cluster. I am looking for the
following information.

  1. I have set up a Hadoop cluster with Apache Hadoop 2.0. How do I set up
    elasticsearch, and how do I integrate the two components?

  2. What will be the integration points between HDFS and elasticsearch?

  3. I am using the Java APIs to index, store and search the data stored in
    elasticsearch. Will there be any changes to the APIs currently in use?

  4. Do we have an example of how to index and search data using the Java API
    for elasticsearch-hadoop installations?

Thanks,

Hiro

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/837cc6c5-dfd3-4b53-af87-c06788f84c24%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Costin Leau) #2

Hi,

Elasticsearch Hadoop is a separate project from Elasticsearch itself. You can find answers to your questions by
reading the docs, which can be found at [1]. I've answered your questions below to give you a head start:

On 03/12/2013 11:29 AM, Hiro Gangwani wrote:

> 1. I have set up a Hadoop cluster with Apache Hadoop 2.0. How do I set up elasticsearch, and how do I integrate the
> two components?

Elasticsearch Hadoop (or es-hadoop) supports both Hadoop 1.x and Hadoop 2 (YARN). We provide a dedicated binary for the
latter - more info is in the installation chapter of the docs.

> 2. What will be the integration points between HDFS and elasticsearch?

You can index data from HDFS using the various frameworks in Hadoop. The reverse is also true - you can run jobs that
source from Elasticsearch and write to HDFS.
In the upcoming M2 we will provide snapshot/restore functionality on HDFS as well.

> 3. I am using the Java APIs to index, store and search the data stored in elasticsearch. Will there be any changes to
> the APIs currently in use?

This is a question for Elasticsearch itself, not es-hadoop, and the 'official' backwards compatibility rules apply: the
0.90 line is stable, so no breaking changes will be made there; 1.x, however, is a major release, so some changes will
occur. For the most part, though, things will stay the same.

> 4. Do we have an example of how to index and search data using the Java API for elasticsearch-hadoop installations?

es-hadoop does not use the Java API (or the transport) - it relies on the REST API. Again, I recommend reading the docs,
which feature plenty of examples, whether for reading from or writing to Elasticsearch, using the various Hadoop
APIs/frameworks supported.
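
To make the "relies on the REST API" point a little more concrete, here is a minimal sketch of what a request body for Elasticsearch's REST bulk endpoint (`POST /_bulk`) looks like when built by hand in plain Java. It is only an illustration of the wire format, not es-hadoop's own code; the class name, the index/type names and the sample documents are made up for the example. The `_type` entry matches releases of the 0.90 era, while newer Elasticsearch releases have dropped it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of es-hadoop or of the Elasticsearch Java API.
final class BulkBodySketch {

    // Escapes only backslash and double quote; enough for this sketch, not a full JSON encoder.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the newline-delimited bulk body: one action line followed by one source line per document.
    // docsById maps a document id to its already-serialized JSON source.
    static String bulkBody(String index, String type, Map<String, String> docsById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> doc : docsById.entrySet()) {
            body.append("{\"index\":{\"_index\":").append(quote(index))
                .append(",\"_type\":").append(quote(type))
                .append(",\"_id\":").append(quote(doc.getKey()))
                .append("}}\n");
            body.append(doc.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"title\":\"first\"}");
        docs.put("2", "{\"title\":\"second\"}");
        // Prints the body that would be sent to POST /_bulk; no network call is made here.
        System.out.print(bulkBody("docs", "doc", docs));
    }
}
```

The sketch only assembles the text; sending it is an ordinary HTTP POST, which is why any client that can speak HTTP can index data this way.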

Cheers,

[1] http://elasticsearch.org/hadoop

--
Costin



(Hiro Gangwani) #3

Hi Costin,

Thanks for the quick reply. My requirements are as follows.

  1. I am trying to store elasticsearch indexes of document data in the Hadoop
    distributed file system (HDFS) across a cluster of servers.
  2. At the same time I want to search and retrieve the data from the
    Hadoop cluster using the Java API.

Currently I am using the Java APIs to store the index data on the local file
system, and I search and retrieve the data using BoolQueryBuilder and other
Java APIs. How do I perform similar tasks if my indexes are stored in HDFS
in a Hadoop cluster? Does es-hadoop provide support for indexing
and storing data in a Hadoop cluster (HDFS)?

Thanks,

Hiro.



(Costin Leau) #4
  1. Storing files on HDFS currently falls outside the scope of both es-hadoop and Elasticsearch. You can mount HDFS as
    NFS or another local file system, and ES can simply use that. Note, however, that this will severely impact
    performance: HDFS is a distributed file system which can easily take seconds to locate data, since each resource
    call can translate into multiple calls over the network.
    In effect, calls to ES will be slowed down as well, because every disk access translates into a call to HDFS.

  2. If you want to search data from Hadoop, you need to index it with Elasticsearch. es-hadoop can help you with that.
    Once the data is in ES you can access it through es-hadoop (if you want to access it from Hadoop jobs) or from any
    other ES client, whether through the Java API, other languages or plain REST.

--
Costin



(Hiro Gangwani) #5

Hi Costin,

I am looking precisely for the 2nd option. I want to index the data using ES
and store it in the Hadoop cluster. On top of that I want to search using
BoolQueryBuilder queries from the Java API client. Is it possible to share
some examples for these scenarios? The es-hadoop tutorial does not provide any
material on this, such as how to index the data using ES, store it in
Hadoop and retrieve it using the Java client.

Hiro.



(Costin Leau) #6

I'm afraid I'll just repeat myself.
For storing data in HDFS, simply expose HDFS as a local file system and configure ES to store data in it.
If you want to import data from HDFS into ES, you can use es-hadoop; its reference documentation has examples that
cover that. Of course, you can also do this yourself without es-hadoop.

Once the data is in Elasticsearch, you can query it as you did before - it doesn't matter where the data is stored,
that's something handled by Elasticsearch internally.
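
As a rough illustration of "query it as you did before", the sketch below builds, in plain Java, the JSON body of a bool query of the kind BoolQueryBuilder produces, suitable for sending to the REST search endpoint (`POST /<index>/_search`). It is a hand-rolled example under stated assumptions, not the Java API itself: the class and method names are hypothetical, and the field names `status` and `lang` are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; it only assembles JSON text and does not talk to a cluster.
final class BoolQuerySketch {

    // Escapes only backslash and double quote; enough for this sketch, not a full JSON encoder.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // A single term clause: {"term":{"<field>":"<value>"}}
    static String term(String field, String value) {
        return "{\"term\":{" + quote(field) + ":" + quote(value) + "}}";
    }

    // Wraps already-built clauses in a bool query with "must" and "must_not" sections.
    static String boolQuery(List<String> must, List<String> mustNot) {
        return "{\"query\":{\"bool\":{\"must\":[" + String.join(",", must)
                + "],\"must_not\":[" + String.join(",", mustNot) + "]}}}";
    }

    public static void main(String[] args) {
        String body = boolQuery(
                Arrays.asList(term("status", "published")),
                Arrays.asList(term("lang", "fr")));
        // Prints the search request body; where the index files live on disk plays no part in it.
        System.out.println(body);
    }
}
```

The request is the same whether the node keeps its data on a local disk or on a mounted HDFS path, which is the point made above: storage location is a node-side concern and does not change the query.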

--
Costin


