How can I reindex Elasticsearch quickly?

I have an Elasticsearch index with around 200M documents and a total index
size of 90 GB.

I changed the mapping, so I would like Elasticsearch to reindex all the
documents.

I wrote a script that creates a new index (with the new mapping), then goes
over all the documents in the old index and puts them into the new one.

It seems to work, but the problem is that it is extremely slow.
It started at 300 documents/minute two days ago, and the speed is now down
to 150 documents/minute.

The script runs on a machine on the same network as the Elasticsearch
machines.

At this rate, the reindex will take about a month to finish.

Does anybody know a faster technique for reindexing an Elasticsearch index?

Thanks in advance!!!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It could be worth looking at the bulk operations -- we rebuild an
admittedly much smaller index by using the bulk API & loading 2000
documents in each operation.
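For reference, a bulk request batches many documents into one HTTP call, so you pay the network round trip once per 2000 documents instead of once per document. Below is a minimal sketch of how such a request body is assembled; the index/type names are made up and the HTTP client itself is left out:

```python
import json

def build_bulk_body(docs, index_name, doc_type):
    """Build a newline-delimited JSON body for Elasticsearch's _bulk endpoint.

    Each document contributes two lines: an action line naming the target
    index/type/id, then the document source itself.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps(
            {"index": {"_index": index_name, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"

# Two documents batched into one request body; this would be POSTed
# to http://host:9200/_bulk in a single round trip.
body = build_bulk_body(
    [("1", {"title": "first"}), ("2", {"title": "second"})],
    index_name="new_index", doc_type="my_type")
```

With 2000 documents per body, the per-request overhead is amortized across the whole batch.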

On 9 June 2013 09:03, Dmitry Babitsky dimok21@gmail.com wrote:


https://github.com/karussell/elasticsearch-reindex

Thanks
Vineeth

On Sun, Jun 9, 2013 at 1:43 PM, doug livesey biot023@gmail.com wrote:


Two questions about the reindex plugin:

  1. Is it possible to reindex an existing index into a new one, so it would
    run offline?
  2. I could not understand from the reindex plugin's readme what the right
    way to run it is, so that it reindexes the entire index, without any
    counters...

Thanks,
Dmitry.

On Sunday, June 9, 2013 11:51:30 AM UTC+3, Vineeth Mohan wrote:

https://github.com/karussell/elasticsearch-reindex

Thanks
Vineeth


The idea of bulk indexing sounds very good!
One question - how do you perform the bulk read?

Thanks a lot!!!

On Sunday, June 9, 2013 11:13:27 AM UTC+3, doug livesey wrote:


IIRC, you can query for a bunch of documents, and they'll be returned
(nested in the response) in an array. There must be limit and offset
options for those queries.
Once you have your array of documents, you can feed it to the bulk
index API.
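As a sketch of that read side (no client shown, names assumed), a match_all query with from/size gives you one page of documents per call, which you can then hand to the bulk index API:

```python
def page_query(offset, page_size):
    """Build a match_all search body that reads one page of the old index."""
    return {
        "query": {"match_all": {}},
        "from": offset,     # documents to skip
        "size": page_size,  # documents to return
    }

# Walk the first 10,000 documents in pages of 2000; each page's hits
# would be re-submitted to the new index via the bulk API.
pages = [page_query(offset, 2000) for offset in range(0, 10000, 2000)]
```

Note that deep offsets get expensive, since every shard has to sort and skip all the preceding hits; for a full-index copy, the scan search type discussed later in this thread avoids that cost.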

On 9 June 2013 10:40, Dmitry Babitsky dimok21@gmail.com wrote:


Yes, you can reindex an existing index.
You just need to create the new index and give the command to reindex into it.
As for why it is fast: it works from within Elasticsearch, which removes the
network latency you pay when the same thing is done from outside.
It also uses scan to bulk-read and bulk inserts to copy, so all the
high-speed options are used here.

Thanks
Vineeth

On Sun, Jun 9, 2013 at 2:33 PM, Dmitry Babitsky dimok21@gmail.com wrote:


Don't you get long delays with high offsets?
For example, an offset of 800,000 on my index gives a delay of about 30
seconds from the time I send the search command until I start receiving the
documents.

On Sunday, June 9, 2013 1:22:58 PM UTC+3, doug livesey wrote:


Hi Doug,

I tried your approach, but did not get any time improvement.
After some debugging I found out that the bulk=True flag in my index
command has no effect.

The code that I used is:

    search_obj = pyes.query.Search(query=pyes.query.MatchAllQuery(),
                                   start=resume_from)

    old_index_iterator = self.esconn.search(search_obj, self.index_name)
    counter = 0
    BULK_SIZE = 2000

    for doc in old_index_iterator:
        self.esconn.index(doc=doc, doc_type=DOC_TYPE, index=new_index_name,
                          id=doc.get_id(), bulk=True)
        counter += 1

        if counter % BULK_SIZE == 0:
            self.logger.debug("Refreshing...")
            self.esconn.refresh()
            self.logger.debug("Refresh done.")

    self.esconn.refresh()

Could you please let me know if you use any other pyes API for bulk inserts?

Thanks!!!

On Sunday, June 9, 2013 11:13:27 AM UTC+3, doug livesey wrote:


Hi Dmitry,

You should use the scan search type; see the search-type section of the
Elasticsearch reference guide.

In pyes I believe the option is scan=True. Here is a snippet I wrote a
while ago, perhaps with an older version than the one you use:

    result_set = es_client.search(q, indices="index", scan=True,
                                  size=batch_size)
    # PATCH pyes for a scanning bug
    result_set._max_item = None

result_set is now an iterable you can read documents from. pyes will
make more calls to Elasticsearch when needed.

Cheers,
Boaz

On Tuesday, June 11, 2013 8:43:46 AM UTC+2, Dmitry Babitsky wrote:


Hi Boaz,

Thanks a lot for your answer.
According to my measurements, however, the bottleneck is the indexing, which
ignores the bulk=True flag, not the search...

Dmitry.

On Tuesday, June 11, 2013 11:46:10 AM UTC+3, Boaz Leskes wrote:


Hi Dmitry,

I'll have to dive into the pyes code to see why it goes wrong, but for speed
you really need to use the bulk API for indexing and the scan search type
together. If pyes is in your way, you can easily construct the request
yourself using json.dumps (watch out for unicode data and encoding). More
info is in the bulk API section of the Elasticsearch reference guide.
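To make that concrete, here is one possible sketch of grouping scan hits into hand-built bulk bodies with json.dumps. The hit shape mirrors what Elasticsearch returns; the function name and batch size are made up, and the HTTP POST itself is left out:

```python
import json

def bulk_batches(hits, index_name, doc_type, batch_size=2000):
    """Yield _bulk request bodies, batch_size documents per body.

    hits are dicts shaped like Elasticsearch hits:
    {"_id": ..., "_source": {...}}. Bodies are encoded to UTF-8
    explicitly, since unicode data is the usual trap when building
    bulk requests by hand.
    """
    lines = []
    for hit in hits:
        lines.append(json.dumps(
            {"index": {"_index": index_name, "_type": doc_type,
                       "_id": hit["_id"]}}))
        lines.append(json.dumps(hit["_source"], ensure_ascii=False))
        if len(lines) == 2 * batch_size:
            yield ("\n".join(lines) + "\n").encode("utf-8")
            lines = []
    if lines:  # flush the final partial batch
        yield ("\n".join(lines) + "\n").encode("utf-8")

# Each yielded body is one POST to http://host:9200/_bulk.
```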

Cheers,
Boaz

On Tuesday, June 11, 2013 11:10:06 AM UTC+2, Dmitry Babitsky wrote:


Hi Vineeth,

I've installed the _reindex plugin, and it works very fast indeed.
I only have one small problem - I trigger the plugin with a curl -XPUT
command from the server Elasticsearch runs on (localhost), and each time
I run it, it hangs up after several hours.
The error message that I see coming back from curl is: curl: (52) Empty
reply from server

ubuntu@elasticsearch-test:~$ date; time curl -XPUT
'http://localhost:9200/my_index_2013_06_19_reindexed/my_type/_reindex?searchIndex=my_index&searchType=my_type&hitsPerPage=2000'
; date
Wed Jun 19 14:22:20 UTC 2013
curl: (52) Empty reply from server

real 257m28.136s
user 0m0.216s
sys 0m0.460s
Wed Jun 19 18:39:48 UTC 2013

The example above reindexed 4M out of the 6M documents that I have.

Any ideas why it goes wrong here?

Thanks,
Dmitry.

On Sunday, June 9, 2013 11:51:30 AM UTC+3, Vineeth Mohan wrote:

https://github.com/karussell/elasticsearch-reindex

Thanks
Vineeth


One more thing that would probably help is disabling refresh on your new
index (set refresh_interval to -1 in the index settings:
http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html),
and then changing refresh_interval back to whatever suits you once the load
is done (1s is the default - auto-refreshing each second).

This also implies that you don't need to do any refresh from your pyes app.
Btw, in pyes, bulks are automatically sent every bulk_size documents (you
specify bulk_size when creating the connection object). If you need to flush
the remaining bulk, there's something like conn.flush() (I don't remember the
name exactly), which does it. You probably want to call that when your script
is done, although in theory it should flush the bulk on exit.
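Whatever client you use, the pattern being described is the same: accumulate documents into fixed-size batches, and don't forget the final short batch before exiting. A client-agnostic sketch:

```python
def batches(docs, size=2000):
    """Yield successive batches of `size` documents; the final (possibly
    short) batch is the 'flush' a reindexing script must not forget
    before it exits."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:              # leftover documents: flush them too
        yield batch
```

Each yielded batch would then be sent as one bulk request; in pyes this corresponds to its automatic bulk_size behaviour plus an explicit flush at the end.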

On Thu, Jun 20, 2013 at 9:47 AM, Dmitry Babitsky dimok21@gmail.com wrote:

Hi Vineeth,

I've installed the _reindex plug-in, and it works very fast indeed.
I only have one small problem - I trigger the plugin with a curl -XPUT
command from the server Elasticsearch runs on (localhost), and each time
I run it, it hangs after several hours.
The error that I see coming back from curl is: curl: (52) Empty reply
from server

ubuntu@elasticsearch-test:~$ date; time curl -XPUT 'http://localhost:9200/my_index_2013_06_19_reindexed/my_type/_reindex?searchIndex=my_index&searchType=my_type&hitsPerPage=2000'; date
Wed Jun 19 14:22:20 UTC 2013
curl: (52) Empty reply from server

real 257m28.136s
user 0m0.216s
sys 0m0.460s
Wed Jun 19 18:39:48 UTC 2013

The example above re-indexed 4M out of 6M of documents that I have.

Any ideas why it goes wrong here?

Thanks,
Dmitry.

On Sunday, June 9, 2013 11:51:30 AM UTC+3, Vineeth Mohan wrote:

https://github.com/karussell/elasticsearch-reindex

Thanks
Vineeth

On Sun, Jun 9, 2013 at 1:43 PM, doug livesey bio...@gmail.com wrote:

It could be worth looking at the bulk operations -- we rebuild an
admittedly much smaller index by using the bulk API & loading 2000
documents in each operation.
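The bulk API takes a newline-delimited JSON body: one action line per document, followed by the document source. A sketch that builds such a body - the index and type names are placeholders:

```python
import json


def bulk_body(docs, index="my_index_2", doc_type="my_type"):
    """Build the newline-delimited body for a POST to /_bulk:
    an action line per document, then the document itself."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps(
            {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"   # the body must end with a newline
```

One such body per couple of thousand documents, POSTed to the cluster's /_bulk endpoint, replaces thousands of individual index calls.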

On 9 June 2013 09:03, Dmitry Babitsky dim...@gmail.com wrote:

I have an Elasticsearch index with around 200M documents, total index
size of 90Gb.

I changed mapping, so I would like Elasticsearch to re-index all the
documents.

I wrote a script that creates a new index (with the new mapping), then
goes over all the documents in the old index and puts them into the new one.

It seems to work, but the problem is that it works extremely slowly.
It started with 300 documents/minute two days ago, and now the speed

The script runs on a machine within the same network as the Elasticsearch
machines.

With such speed it will require a month for the re-index to finish.

Does anybody know about some faster technique to re-index an Elasticsearch
index?

Thanks in advance!!!


--
http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.