How to migrate a Lucene index into Elasticsearch

Hi All,

I have an embedded search engine in our product which is based on Lucene
4.8.1, and I would now like to migrate it to the latest Elasticsearch (1.4)
for better distributed support (mainly sharding and replication). Could you
guide me on how to migrate the existing indexes created by Lucene to
ES?

I have read the mail thread "migrate lucene index into elasticsearch":
https://groups.google.com/forum/#!searchin/elasticsearch/migrating/elasticsearch/xCE7124eAL8/ZFluLXqO_IcJ
Based on that discussion, it appears to me that this is not an easy job,
and may not even be feasible. I am wondering whether there is a plugin
(river), tool, or workaround available to migrate the existing indexes
created by Lucene to ES.

I found an ES plugin for Solr to ES migration:
http://blog.trifork.com/2013/01/29/migrating-apache-solr-to-elasticsearch/
Is there something similar for Lucene to ES migration?

Thanks
Gaurav

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/71c0ed2e-94d7-4b70-b581-2515856fd938%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

It is almost impossible to migrate from a binary Lucene index alone,
because Elasticsearch needs additional information that is not available in
Lucene. The only method is to reindex the data through the Elasticsearch API.

There is a bumpy road, but I don't know whether anyone has ever tried it:

  • a custom written tool could traverse the segments and extract field
    information and build a rudimentary mapping (without analyzer, without info
    about _all and _source and all Elasticsearch add-ons)

  • another tool could try to reconstruct docs (like the tool Luke) and write
    them to a file in bulk format. Not having the source of the docs means it
    must be possible to retrieve the original input from the Lucene index
    (which is almost never the case)

  • the result could be re-indexed using the Elasticsearch API (assuming all
    analyzers and tokenizers are in place) but a lot of work would have to be
    done

The preferred way is to rewrite the code that uses the Lucene API to use
the Elasticsearch API and re-run the indexing process.

Jörg
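
To make the "bulk format" in the second bullet concrete, here is a minimal sketch in Python, assuming the documents have already been reconstructed from the Lucene index as plain dictionaries. The Bulk API expects newline-delimited JSON: one action line followed by one source line per document, each ending in a newline. The function names, the index name "addresses", the type name "address" and the sample values are illustrative assumptions, not something from this thread.

```python
import json

def to_bulk_lines(docs, index, doc_type, id_field=None):
    """Yield Bulk API lines: one action line, then one source line, per document."""
    for doc in docs:
        meta = {"_index": index, "_type": doc_type}
        # Reuse an existing unique field as the document id when one is named.
        if id_field is not None and id_field in doc:
            meta["_id"] = str(doc[id_field])
        yield json.dumps({"index": meta}, sort_keys=True)
        yield json.dumps(doc, sort_keys=True)

def write_bulk_file(path, docs, index, doc_type, id_field=None):
    """Write the bulk lines to a file; every line, including the last, ends with a newline."""
    with open(path, "w", encoding="utf-8") as out:
        for line in to_bulk_lines(docs, index, doc_type, id_field):
            out.write(line + "\n")

if __name__ == "__main__":
    # Placeholder values, made up for illustration only.
    sample = [{"GNAF_PID": "ID-1", "AddressLine1": "1 Example Street"}]
    for line in to_bulk_lines(sample, "addresses", "address", id_field="GNAF_PID"):
        print(line)
```

The resulting file could then be posted to the _bulk endpoint. This only helps if the original field values can actually be recovered from the Lucene index, which is the hard part described above.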

Thanks, Jörg, for the guidance. I am trying suggested approach #1 and have
a further question about it.

As you mentioned - "- a custom written tool could traverse the segments
and extract field information and build a rudimentary mapping (without
analyzer, without info about _all and _source and all Elasticsearch
add-ons)".

We already have the Lucene index metadata (i.e. field names, types, analyzers
etc.) available as XML, so I can create the mapping without traversing
the segments. Should I create the segment file "segments.gen" using the
mapping file and some dummy values, and then add all the other old Lucene
index files (except "segments.gen") from the existing Lucene index
(e.g. segments_2, _0.cfe, _0.cfs, _0.si, _1.cfe, _1.cfs etc.)?

Sample mapping XML file:


true
Standard
AddressLine1
AddressLine1
true
string


true
Standard
Building_Name
Building_Name
true
string


true
Keyword
GNAF_PID
GNAF_PID
true
string

...

Thanks

You didn't say why you can't just reindex the data from the original source,
but that would be the cleanest way and likely the fastest, considering the
human time (and $) you will likely spend if you try a "shortcut".

Otis

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

I cannot tell whether it will work, but it would be great if you could
translate your XML mapping into an Elasticsearch mapping.

The next step would be to create an empty index with that mapping, using 1
shard and no replicas, with _source and _all disabled. Then you could index
one test doc through the ES API. After this, you can look in the data folder
to find where ES created the segment files. If you exchange them for a copy
of your Lucene segment files, they should either get picked up, or you will
get nasty errors because ES uses a custom Lucene index format and cannot
process standard Lucene segments.

Jörg
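
As a rough illustration of the index described above (1 shard, no replicas, _source and _all disabled), the following Python sketch builds a create-index body in the Elasticsearch 1.x mapping style from a hand-transcribed field list. The descriptor keys ("name", "type", "analyzer"), the type name "address" and the function name are assumptions made for this example. It also assumes that the "Standard" and "Keyword" analyzer names in the XML correspond to the built-in "standard" and "keyword" analyzers, which would need checking since analyzer defaults can differ between Lucene and ES. The two boolean flags in the XML are not interpreted, because their meaning is not stated in the thread.

```python
import json

# Hypothetical field descriptors, transcribed by hand from the XML metadata.
FIELDS = [
    {"name": "AddressLine1", "type": "string", "analyzer": "Standard"},
    {"name": "Building_Name", "type": "string", "analyzer": "Standard"},
    {"name": "GNAF_PID", "type": "string", "analyzer": "Keyword"},
]

def build_index_body(fields, doc_type):
    """Build a create-index request body: settings plus a mapping for one type."""
    properties = {}
    for field in fields:
        properties[field["name"]] = {
            "type": field["type"],
            # Assumption: lower-casing the XML name gives the ES analyzer name.
            "analyzer": field["analyzer"].lower(),
        }
    return {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            doc_type: {
                "_source": {"enabled": False},
                "_all": {"enabled": False},
                "properties": properties,
            }
        },
    }

if __name__ == "__main__":
    print(json.dumps(build_index_body(FIELDS, "address"), indent=2))
```

The printed JSON could be sent as the body of the request that creates the index. Whether ES then accepts foreign Lucene segment files dropped into that index is exactly the open question raised above.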

Otis,
I am not sure how many of our customers will agree to re-index all their
data, as they have been using it for a long time, although I am trying to
convince our senior product management to keep both Lucene and ES. Some
existing customers might consider migrating to ES if they need better
real-time performance through distributed ES.

Note: Currently, the main reason to migrate from Lucene to ES is to get
better distributed support (mainly sharding and replication) for faster
real-time search.

Thanks
Gaurav

Thanks, Jörg, but I was not able to migrate the Lucene indexes to ES even
after trying what you suggested. Maybe I need to follow some more
steps.
I am not getting any errors, but the search does not return any docs/records.
While comparing the files, I found that the "segments.gen" files are
identical but the segments_N files (segments_2 in Lucene and segments_3 in
ES) are slightly different.
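
A file-by-file comparison like the one described above can be scripted. The sketch below, in Python, hashes every file in two index folders and reports which names are byte-identical, which differ, and which exist on only one side. The function names and the report keys are made up for this example, and the two folder paths are whatever the Lucene index folder and the ES shard's index folder happen to be on your machine.

```python
import hashlib
import os

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def list_files(folder):
    """Return the names of the regular files directly inside a folder."""
    return {n for n in os.listdir(folder) if os.path.isfile(os.path.join(folder, n))}

def compare_index_dirs(lucene_dir, es_dir):
    """Compare two index folders by file name and content."""
    left, right = list_files(lucene_dir), list_files(es_dir)
    report = {
        "identical": [],
        "different": [],
        "only_lucene": sorted(left - right),
        "only_es": sorted(right - left),
    }
    for name in sorted(left & right):
        same = file_digest(os.path.join(lucene_dir, name)) == file_digest(
            os.path.join(es_dir, name)
        )
        report["identical" if same else "different"].append(name)
    return report
```

Files whose names differ between the two folders, such as segments_2 and segments_3, show up under "only_lucene" and "only_es" rather than under "different", since the comparison is keyed by file name.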

[image: Inline image 1]

Lucene Vs ES :-
[image: Inline image 2]

Thanks
Gaurav
