Indexing in Hadoop


(camechis) #1

I am attempting to create an M/R job that reads data from HBase,
creates indexes via Lucene, and saves the index to HDFS or local disk.
I am trying to figure out the integration point with Elasticsearch. Do
I have to do something different for Elasticsearch? How does ES read
the indexes and auto-shard them? I am still somewhat new to Hadoop
and I am just starting to look into Elasticsearch.


(Otis Gospodnetić) #2

Hello,

I'd suggest approaching this a little differently. For example,
consider sending documents built from your HBase data directly to ES
via its API, rather than working out how to index with Lucene
yourself. ES builds and shards its own Lucene indexes from the
documents you send it, so you'll find the answers to your questions as
you learn to use the ES API to index data.

In terms of reading data from HBase, you could start by looking at
HBase's Export MR job.
For indexing to ES: http://www.elasticsearch.org/guide/reference/api/index_.html
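
A minimal sketch of what one such index request might look like, using only the Python standard library. The host, the index name `hbase_rows`, the type name `row`, and the helper `build_index_request` are all made up for illustration. The `/{index}/{type}/{id}` URL shape follows the index API guide linked above; newer ES versions use `/{index}/_doc/{id}` instead, so check the docs for your version.

```python
import json
import urllib.parse
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) an HTTP PUT request that indexes one document.

    Hypothetical helper: the names and the URL layout are assumptions,
    not something taken from a particular ES client library.
    """
    # Percent-encode the id so characters such as '/' or '#' stay in one path segment.
    safe_id = urllib.parse.quote(doc_id, safe="")
    url = f"{base_url}/{index}/{doc_type}/{safe_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_index_request(
    "http://localhost:9200",   # assumed local node
    "hbase_rows",              # hypothetical index name
    "row",                     # hypothetical type name
    "row-0001",
    {"title": "example", "views": 42},
)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a running cluster.
```

Nothing here touches Lucene directly: the document goes over HTTP as JSON and ES handles the indexing on its side.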

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop - HBase
Lucene ecosystem search :: http://search-lucene.com/

(In reply to camechis's post of Nov 18, 4:13 pm.)


(camechis) #3

OK, I think I understand, based on some other things I found.
Basically, read the records out of HBase and form some type of
document (JSON, etc.). Once the document is created, post it to the
Elasticsearch cluster via the Java API.
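
As a rough sketch of the "form some type of document" step, assuming each HBase row arrives as a row key plus a map of `family:qualifier` columns to raw byte values. The helper `row_to_document`, the sample rows, and the choice to nest fields by column family are all illustrative assumptions, and values are treated as UTF-8 strings, which may not match how your table encodes them.

```python
import json


def row_to_document(row_key, cells):
    """Turn one HBase-style row into a (doc_id, JSON-ready dict) pair.

    cells maps b"family:qualifier" to raw byte values. Fields are grouped
    by column family; this layout is an assumption, not an ES requirement.
    """
    document = {}
    for column, value in cells.items():
        family, _, qualifier = column.decode("utf-8").partition(":")
        document.setdefault(family, {})[qualifier] = value.decode("utf-8")
    return row_key.decode("utf-8"), document


# Made-up rows standing in for what a mapper would read from the table.
rows = {
    b"user#1": {b"info:name": b"Ada", b"info:city": b"London"},
    b"user#2": {b"info:name": b"Linus", b"stats:logins": b"7"},
}

for row_key, cells in rows.items():
    doc_id, document = row_to_document(row_key, cells)
    # Each (doc_id, JSON) pair is what would then be handed to the ES client.
    print(doc_id, json.dumps(document, sort_keys=True))
```

Using the row key as the document id means re-running the job overwrites existing documents rather than creating duplicates, which is usually what you want for a re-index.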

(In reply to Otis Gospodnetic's post of Nov 19, 10:38 pm.)

