How to migrate lucene index into elasticsearch

Thanks Jorg for the guidance. I am trying the suggested approach #1
and have a further question on it.

As you mentioned - "- a custom written tool could traverse the segments
and extract field information and build a rudimentary mapping (without
analyzer, without info about _all and _source and all Elasticsearch
add-ons)".

We already have the Lucene index metadata (i.e. field names, types,
analyzers, etc.) available as XML, so I can create the mapping without
traversing the segments. Should I create the segment file "segments.gen"
using the mapping file and some dummy values, and then copy in all the
other files from the existing Lucene index (e.g. segments_2, _0.cfe,
_0.cfs, _0.si, _1.cfe, _1.cfs, etc.), except "segments.gen"?

Sample mapping XML file:


true
Standard
AddressLine1
AddressLine1
true
string


true
Standard
Building_Name
Building_Name
true
string


true
Keyword
GNAF_PID
GNAF_PID
true
string

...
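
For illustration, field metadata like the sample above could be turned into
a rudimentary Elasticsearch 1.x mapping roughly as follows. This is only a
sketch under assumptions: the dictionary keys, the reading of the two
boolean values as "indexed" and "stored", the type name "address", and the
analyzer name table are hypothetical and are not taken from the actual XML
schema.

```python
import json

# Hypothetical field metadata mirroring the sample above. Reading the two
# boolean values as "indexed" and "stored" is an assumption.
FIELDS = [
    {"name": "AddressLine1", "type": "string", "analyzer": "Standard",
     "indexed": True, "stored": True},
    {"name": "Building_Name", "type": "string", "analyzer": "Standard",
     "indexed": True, "stored": True},
    {"name": "GNAF_PID", "type": "string", "analyzer": "Keyword",
     "indexed": True, "stored": True},
]

# Assumed translation from the analyzer labels in the metadata to the names
# of Elasticsearch built-in analyzers. Extend it for other analyzers.
ANALYZER_MAP = {"Standard": "standard", "Keyword": "keyword"}


def build_mapping(fields, doc_type="address"):
    """Build an Elasticsearch 1.x style mapping body from field metadata."""
    properties = {}
    for field in fields:
        prop = {"type": field["type"], "store": field["stored"]}
        if field["indexed"]:
            prop["analyzer"] = ANALYZER_MAP[field["analyzer"]]
        else:
            # Fields that are stored but not searchable.
            prop["index"] = "no"
        properties[field["name"]] = prop
    return {"mappings": {doc_type: {"properties": properties}}}


if __name__ == "__main__":
    print(json.dumps(build_mapping(FIELDS), indent=2, sort_keys=True))
```

The resulting JSON could be used as the body of a create-index request. It
only describes the fields and does not bring any of the existing Lucene
data into Elasticsearch.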

Thanks

On Thu, Nov 13, 2014 at 11:59 PM, joergprante@gmail.com <
joergprante@gmail.com> wrote:

It is almost impossible to use just a binary-only Lucene index for
migration, because Elasticsearch needs additional info which is not
available in Lucene. The only method is to reindex the data over the
Elasticsearch API.

There is a bumpy road, but I don't know if anyone has ever tried it:

  • a custom written tool could traverse the segments and extract field
    information and build a rudimentary mapping (without analyzer, without info
    about _all and _source and all Elasticsearch add-ons)

  • another tool could try to reconstruct docs (like the tool Luke) and
    write them to a file in bulk format. Not having the source of the docs
    means it must be possible to retrieve the original input from the Lucene
    index (which is almost never the case)

  • the result could be re-indexed using the Elasticsearch API (assuming all
    analyzers and tokenizers are in place) but a lot of work would have to be
    done
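
The second and third steps above could be sketched as follows, assuming the
documents have already been reconstructed from the Lucene index as
(id, fields) pairs. The function names, the index name "addresses" and the
type name "address" are hypothetical. The bulk format itself is one action
line followed by one source line per document, each terminated by a
newline.

```python
import json


def to_bulk_lines(docs, index, doc_type):
    """Yield bulk-format lines for an iterable of (doc_id, source) pairs."""
    for doc_id, source in docs:
        # Action line: tells Elasticsearch where to index the document.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        yield json.dumps(action)
        # Source line: the reconstructed document itself.
        yield json.dumps(source)


def write_bulk_file(docs, path, index="addresses", doc_type="address"):
    """Write the documents to a file; every line ends with a newline."""
    with open(path, "w", encoding="utf-8") as handle:
        for line in to_bulk_lines(docs, index, doc_type):
            handle.write(line + "\n")


if __name__ == "__main__":
    sample = [("1", {"AddressLine1": "1 Example Street", "GNAF_PID": "X1"})]
    for line in to_bulk_lines(sample, "addresses", "address"):
        print(line)
```

A file written this way could then be sent to the _bulk endpoint, for
example with curl and --data-binary. Fields that were indexed but not
stored in Lucene cannot be recovered this way, which is the limitation
noted in the second point.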

The preferred way is to rewrite the code that uses the Lucene API to use
the Elasticsearch API and re-run the indexing process.

Jörg

On Thu, Nov 13, 2014 at 7:11 PM, Gaurav gupta gupta.gaurav0125@gmail.com
wrote:

Hi All,

I have an embedded search engine in our product which is based on Lucene
4.8.1, and now I would like to migrate it to the latest Elasticsearch 1.4
for better distributed support (sharding and replication, mainly). Could
you guide me on how to migrate the existing indexes created by Lucene to
ES?

I have referred to the mail thread "migrate lucene index into
elasticsearch":
https://groups.google.com/forum/#!searchin/elasticsearch/migrating/elasticsearch/xCE7124eAL8/ZFluLXqO_IcJ
Based on the discussion in it, it appears to me that it's not an easy job,
or perhaps not even feasible. I am wondering if there is some plugin
(river), tool, or workaround available to migrate the existing indexes
created by Lucene to ES.

I found via Google that an ES plugin is available for Solr to ES migration:
Trifork Blog - Keep updated on the technical solutions Trifork is working on!
Do we have something similar for Lucene to ES migration?

Thanks
Gaurav

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/71c0ed2e-94d7-4b70-b581-2515856fd938%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/71c0ed2e-94d7-4b70-b581-2515856fd938%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.
