Indexing Lucene documents


(IvanBrusic) #1

I am looking to transition an existing application from Lucene to
ElasticSearch. The main reasons are to support multiple indexes (one
per user) and to have a service-oriented architecture instead of
embedded Lucene instances, since more than one application will be
using the search service. The current index is (relatively) small,
but it will grow soon. Sharding might not be necessary, but having
high availability is always a plus.

Back on topic: there is currently logic in place to translate a
MongoDB document into a Lucene document. Given the schema-less nature
of Mongo, there is quite a bit of code for the translation (around 500
LOC). I am using the Java API to communicate with an external (but
still localhost) ES instance, not REST (for now). ES ultimately
creates a Lucene document from the JSON document that is passed in, so
I was wondering whether there is a way to bypass that step and send a
Lucene document directly. I do see a DocumentBuilder class, but I do
not see it used anywhere in the code (using 0.12.1).

I can always iterate through the document fields and call field(name,
value), but I am wondering whether that is necessary. I could also
convert the existing logic into a JSON translation.
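Iterating the fields and calling a field(name, value)-style method can be sketched without any Elasticsearch dependency. This is a minimal sketch under stated assumptions: the Lucene document is taken to be already reduced to a list of stored string fields (the `Field` class and `FieldsToJson` names are hypothetical), and the Elasticsearch client call is not shown. One detail any such loop has to handle is that a Lucene document may contain several fields with the same name; in JSON those become an array.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldsToJson {

    // Hypothetical (name, value) pair standing in for one stored field of a Lucene document.
    static final class Field {
        final String name;
        final String value;
        Field(String name, String value) { this.name = name; this.value = value; }
    }

    // Groups values by field name, preserving first-seen order,
    // because the same field name may occur more than once.
    static Map<String, List<String>> group(List<Field> fields) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Field f : fields) {
            grouped.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
        return grouped;
    }

    // Escapes a string as a JSON string literal (quotes, backslashes, control characters).
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Single-valued names become JSON strings; repeated names become JSON arrays.
    static String toJson(List<Field> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, List<String>> e : group(fields).entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append(quote(e.getKey())).append(':');
            List<String> values = e.getValue();
            if (values.size() == 1) {
                sb.append(quote(values.get(0)));
            } else {
                sb.append('[');
                for (int i = 0; i < values.size(); i++) {
                    if (i > 0) sb.append(',');
                    sb.append(quote(values.get(i)));
                }
                sb.append(']');
            }
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        List<Field> doc = new ArrayList<>();
        doc.add(new Field("title", "Indexing Lucene documents"));
        doc.add(new Field("tag", "lucene"));
        doc.add(new Field("tag", "elasticsearch"));
        // Prints {"title":"Indexing Lucene documents","tag":["lucene","elasticsearch"]}
        System.out.println(toJson(doc));
    }
}
```

The resulting string is what would be handed to the Java client as the source of an index request. Grouping repeated names into arrays is a design choice here; an application that only ever stores one value per name could skip it.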

Cheers,

Ivan


(Shay Banon) #2

Hi,

You will need to create the json document. Passing a Lucene Document is
not really an option since there is metadata associated with the mapping
that is built during the parsing of the json.
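Following that advice, the existing MongoDB-to-Lucene translation could be retargeted to produce JSON directly, keeping nested objects and arrays intact so that ES sees the document's structure while it parses the JSON. A minimal sketch, assuming the Mongo document is available as a plain `java.util.Map` of Strings, Numbers, Booleans, nulls, nested Maps and Lists (the `MapToJson` class is hypothetical; Mongo-specific types such as dates or ObjectIds are just stringified here):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapToJson {

    // Serializes a nested structure of Maps, Lists, Strings, Numbers, Booleans and nulls to JSON.
    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object v, StringBuilder sb) {
        if (v == null) {
            sb.append("null");
        } else if (v instanceof String) {
            quote((String) v, sb);
        } else if (v instanceof Number || v instanceof Boolean) {
            sb.append(v); // NaN/Infinity are not valid JSON; this sketch does not guard against them
        } else if (v instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                quote(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb); // recurse into nested values
            }
            sb.append('}');
        } else if (v instanceof List) {
            sb.append('[');
            List<?> list = (List<?>) v;
            for (int i = 0; i < list.size(); i++) {
                if (i > 0) sb.append(',');
                write(list.get(i), sb);
            }
            sb.append(']');
        } else {
            // Fallback for types this sketch does not model: emit toString() as a JSON string.
            quote(v.toString(), sb);
        }
    }

    // Appends s as a JSON string literal, escaping quotes, backslashes and control characters.
    private static void quote(String s, StringBuilder sb) {
        sb.append('"');
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Example City");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", "example");
        doc.put("age", 30);
        doc.put("tags", Arrays.asList("a", "b"));
        doc.put("address", address);
        // Prints {"user":"example","age":30,"tags":["a","b"],"address":{"city":"Example City"}}
        System.out.println(toJson(doc));
    }
}
```

Keeping the nesting rather than flattening it into dotted field names leaves the structural decisions to ES's own JSON parsing, which is where the mapping metadata is built.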

-shay.banon

On Mon, Nov 8, 2010 at 6:19 PM, Ivan Brusic ivan_brusic@yahoo.com wrote:


