Hi Dmitry,
You should use the scan search type.
In pyes I believe the option is scan=True. Here is a snippet I wrote a
while ago, perhaps with an older version of pyes than the one you use:
result_set = es_client.search(q, indices="index", scan=True, size=batch_size)
# PATCH pyes for a scanning bug
result_set._max_item = None
result_set is now an iterable that you can read documents from. pyes will
make more calls to Elasticsearch when needed.
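To illustrate what happens behind that iterable, here is a minimal sketch of the scan loop. It is not pyes code: fetch_page is a hypothetical callable standing in for whatever issues the scroll request, and it is assumed to return a (hits, next_scroll_id) pair, with an empty hits list marking the end of the results. As far as I know, the real scan search type returns only a scroll id and no hits on the initial search request, so fetch_page is assumed to hide that first round trip. make_fake_fetcher is an in-memory stand-in so the loop can be run without a cluster.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: takes the previous scroll id (None on the first call)
# and returns (hits, next_scroll_id).
FetchPage = Callable[[Optional[str]], Tuple[List[dict], Optional[str]]]


def scan(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every document, requesting a new page only when the current one is used up."""
    scroll_id: Optional[str] = None
    while True:
        hits, scroll_id = fetch_page(scroll_id)
        if not hits:  # assumption: an empty page means the scroll is exhausted
            return
        for hit in hits:
            yield hit


def make_fake_fetcher(docs: List[dict], page_size: int) -> FetchPage:
    """In-memory stand-in for a cluster; the scroll id is just the next offset."""
    def fetch_page(scroll_id: Optional[str]) -> Tuple[List[dict], Optional[str]]:
        start = int(scroll_id) if scroll_id is not None else 0
        page = docs[start:start + page_size]
        return page, str(start + len(page))
    return fetch_page
```

The caller only sees a plain iterator, which is why reading from result_set looks like a normal for loop even though several requests are made underneath.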
Cheers,
Boaz
On Tuesday, June 11, 2013 8:43:46 AM UTC+2, Dmitry Babitsky wrote:
Hi Doug,
I tried your approach, but did not get any time improvement.
After some debugging I found out that the bulk=True flag in my index
command has no effect.
The code that I used is:
search_obj = pyes.query.Search(query=pyes.query.MatchAllQuery(), start=resume_from)
old_index_iterator = self.esconn.search(search_obj, self.index_name)
counter = 0
BULK_SIZE = 2000
for doc in old_index_iterator:
    self.esconn.index(doc=doc, doc_type=DOC_TYPE, index=new_index_name,
                      id=doc.get_id(), bulk=True)
    counter += 1
    if counter % BULK_SIZE == 0:
        self.logger.debug("Refreshing...")
        self.esconn.refresh()
Could you please let me know if you use any other pyes API for bulk
inserts?
Thanks!!!
On Sunday, June 9, 2013 11:13:27 AM UTC+3, doug livesey wrote:
It could be worth looking at the bulk operations -- we rebuild an
admittedly much smaller index by using the bulk API & loading 2000
documents in each operation.
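As a rough sketch of that batching idea (not our actual code, and independent of any client library): in_batches groups an iterable into lists of up to 2000 documents, and send_bulk is a hypothetical callable that is assumed to perform one bulk request per batch.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], batch_size: int = 2000) -> Iterator[List[T]]:
    """Group any iterable into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def reindex(docs: Iterable[dict],
            send_bulk: Callable[[List[dict]], None],
            batch_size: int = 2000) -> int:
    """Send docs batch by batch and return how many documents were sent.

    send_bulk is a hypothetical callable that performs one bulk request.
    """
    total = 0
    for batch in in_batches(docs, batch_size):
        send_bulk(batch)  # one request per batch instead of one per document
        total += len(batch)
    return total
```

Because in_batches consumes the source lazily, it can be fed directly from a scan iterator without loading the whole index into memory.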
On 9 June 2013 09:03, Dmitry Babitsky dim...@gmail.com wrote:
I have an Elasticsearch index with around 200M documents and a total index
size of 90 GB.
I changed the mapping, so I would like Elasticsearch to re-index all the
documents.
I wrote a script that creates a new index (with the new mapping), then
goes over all the documents in the old index and puts them into the new one.
It seems to work, but the problem is that it is extremely slow.
It started with 300 documents / minute two days ago, and now the speed
is 150 documents/minute.
The script runs on a machine on the same network as the Elasticsearch
machines.
At this speed the re-index will take a month to finish.
Does anybody know of a faster technique for re-indexing an Elasticsearch
index?
Thanks in advance!!!
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearc...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.