Migrate lucene index into elasticsearch


(Matthias Kricke) #1

Hi,

I want, for some reason, to migrate my lucene index into an existing
elasticsearch cluster. Is there a way to do so?

Greetings and thanks for your responses,
MK

--


(Rafał Kuć) #2

Hello!

In addition to the Lucene files, ElasticSearch needs additional information of its own. So this won't be simple, if it is possible at all.

--

Regards,

Rafał Kuć

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

--


(Otis Gospodnetić) #3

Hi,

That said, Matthias could write a simple Java app that reads documents from
a Lucene index - assuming all fields were stored and not just indexed - and
indexes them into a newly set up ES cluster.
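
A minimal sketch of the indexing half of that approach, assuming the stored fields have already been read out of the Lucene index into plain dictionaries (that part is not shown here). The function names, the index name and the `id` field are hypothetical. The layout of one action line followed by one source line per document is what the ES bulk API expects; older ES releases also want a document type, which this sketch leaves out.

```python
import json
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def batches(docs: Iterable[Dict[str, object]], size: int) -> Iterator[List[Dict[str, object]]]:
    """Yield lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def to_bulk_body(docs: Iterable[Dict[str, object]], index: str, id_field: str) -> str:
    """Serialize documents as newline-delimited JSON for a bulk request.

    Each document becomes two lines: an action line, then the document itself.
    The body must end with a newline.
    """
    lines = []
    for doc in docs:
        # Hypothetical choice: reuse one stored field as the document id.
        action = {"index": {"_index": index, "_id": str(doc[id_field])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    stored = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}, {"id": 3, "title": "c"}]
    for chunk in batches(stored, 2):
        print(to_bulk_body(chunk, "articles", "id"), end="")
```

Each body would then be POSTed to the cluster's `_bulk` endpoint with whatever HTTP client is at hand.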

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html

On Tuesday, September 4, 2012 9:46:57 AM UTC-4, Rafał Kuć wrote:

Hello!

In addition to the Lucene files, ElasticSearch needs additional information
of its own. So this won't be simple, if it is possible at all.

--


(Matthias Kricke) #4

I need performance. My tests show that inserting into ElasticSearch is six
times slower than inserting into Lucene.
Therefore I am trying an approach where I insert documents into Lucene and
then try to use those indices in ElasticSearch.

The next problem seems to be a blocking operation that hurts the performance
of multi-threaded inserting.

I would be glad to hear performance-oriented answers.

Greetings,
MK

On Wednesday, 5 September 2012 01:46:28 UTC+2, Otis Gospodnetic wrote:

Hi,

That said, Matthias could write a simple Java app that reads documents
from a Lucene index - assuming all fields were stored and not just indexed -
and indexes them into a newly set up ES cluster.

--


(Otis Gospodnetić) #5

Hi Matthias,

ES will always be slower than Lucene because it has more stuff happening on
top of Lucene. Whether it's 6x or 1.1x slower depends on what exactly one
is indexing/searching, how things are tuned, etc.

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html

On Wednesday, September 5, 2012 9:45:45 AM UTC-4, Matthias Kricke wrote:

I need performance. My tests show that inserting into ElasticSearch is
six times slower than inserting into Lucene.
Therefore I am trying an approach where I insert documents into Lucene and
then try to use those indices in ElasticSearch.

The next problem seems to be a blocking operation that hurts the
performance of multi-threaded inserting.

I would be glad to hear performance-oriented answers.

Greetings,
MK

--


(Lukáš Vlček) #6

Hi Matthias,

as Otis said, the setup on the ES side can be optimized for faster indexing
(the default settings work quite well, but they can be changed for a short
period of time to improve indexing speed). For example, you can disable
replicas, disable refresh and use bulk requests. After you are done with
indexing you can increase the number of replicas, re-enable refresh and
possibly optimize the index for best search performance.

For example, check "Bulk Indexing Usage" here for some tips
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
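
As a rough sketch of the settings changes described above, the two request bodies for the update-settings API could be built like this. `index.number_of_replicas` and `index.refresh_interval` are real index settings (a refresh interval of `-1` disables refresh); the function names and the restored values of 1 replica and `1s` are assumptions for illustration and should match whatever the index used before.

```python
import json
from typing import Dict


def bulk_load_settings() -> Dict[str, Dict[str, object]]:
    """Settings to apply before a large indexing run."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def restore_settings(replicas: int = 1, refresh_interval: str = "1s") -> Dict[str, Dict[str, object]]:
    """Settings to apply once indexing is finished (values are assumptions)."""
    return {"index": {"number_of_replicas": replicas, "refresh_interval": refresh_interval}}


if __name__ == "__main__":
    # Each body would be sent with PUT to /<index>/_settings.
    print(json.dumps(bulk_load_settings()))
    print(json.dumps(restore_settings()))
```

The bulk requests themselves go between the two calls; the optional optimize step is not shown.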

Just saying that indexing in ES will always be slower compared to pure
Lucene might be too general (or even incorrect). For example, by default ES
divides an index into 5 shards, which means 5 independent Lucene indices
which it indexes into in parallel. This could be faster compared to indexing
into a single Lucene index, no?
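
To illustrate the sharding point, here is a small sketch of how documents might be spread over 5 shards by hashing the document id. This is an assumption-laden illustration only: `zlib.crc32` stands in for whatever hash function ES actually uses, and `shard_for` and `shard_histogram` are hypothetical names.

```python
import zlib
from collections import Counter
from typing import Iterable


def shard_for(doc_id: str, num_shards: int = 5) -> int:
    """Pick a shard by hashing the id (crc32 is a stand-in, not the ES hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def shard_histogram(doc_ids: Iterable[str], num_shards: int = 5) -> Counter:
    """Count how many documents land on each shard."""
    return Counter(shard_for(doc_id, num_shards) for doc_id in doc_ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(1000)]
    for shard, count in sorted(shard_histogram(ids).items()):
        print(f"shard {shard}: {count} docs")
```

Each shard only has to index its own share of the documents, which is where the parallelism comes from.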

Regards,
Lukas

On Fri, Sep 7, 2012 at 1:59 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

Hi Matthias,

ES will always be slower than Lucene because it has more stuff happening
on top of Lucene. Whether it's 6x or 1.1x slower depends on what exactly
one is indexing/searching, how things are tuned, etc.

--


(Lukáš Vlček) #7

Maybe one more note: it is not only how you tune ES that influences the
indexing speed. External system factors are important as well, for example
how much memory ES and the indexing job have.

On Fri, Sep 7, 2012 at 9:33 AM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

Hi Matthias,

as Otis said, the setup on the ES side can be optimized for faster indexing
(the default settings work quite well, but they can be changed for a short
period of time to improve indexing speed). For example, you can disable
replicas, disable refresh and use bulk requests. After you are done with
indexing you can increase the number of replicas, re-enable refresh and
possibly optimize the index for best search performance.

For example, check "Bulk Indexing Usage" here for some tips
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html

Just saying that indexing in ES will always be slower compared to pure
Lucene might be too general (or even incorrect). For example, by default ES
divides an index into 5 shards, which means 5 independent Lucene indices
which it indexes into in parallel. This could be faster compared to indexing
into a single Lucene index, no?

Regards,
Lukas

--


(David Pilato) #8

Agree.

I think Otis was comparing Lucene with ES on one node with 1 shard. In that case, Lucene will always be faster.

ES brings so many features (horizontal scaling, replicas, sharding, ...) that we could forget the overhead cost!

My 2 cents

:wink:

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Matthias Kricke) #9

Thanks for your answers. I will try different approaches to improve the
write performance.

Regards,
MK

On Friday, 7 September 2012 09:57:28 UTC+2, David Pilato wrote:

Agree.

I think Otis was comparing Lucene with ES on one node with 1 shard. In
that case, Lucene will always be faster.

ES brings so many features (horizontal scaling, replicas, sharding, ...)
that we could forget the overhead cost!

My 2 cents

:wink:

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

--


(Otis Gospodnetić) #10

Correct, that is what I meant, of course.

Otis

Search Analytics - http://sematext.com/search-analytics/index.html
Scalable Performance Monitoring - http://sematext.com/spm/index.html

--

