Pyes bulk insert problem

First, you need to pass bulk_size when creating the connection.
Refresh is not required while indexing; do it once at the end of the indexing process.
Before bulk indexing, set refresh_interval to -1 in the index settings, and restore it at the end.
After that, bulk_index should work.
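The refresh_interval advice can be sketched against the Elasticsearch REST endpoints directly (PUT /<index>/_settings and POST /<index>/_refresh), since the exact pyes call for updating settings is not covered in this thread. This is a minimal sketch under assumptions: the node URL, the hypothetical `send` callable and the restore value of "1s" (the Elasticsearch default) are all illustrative, and if your index used a different interval you should restore that instead.

```python
import json
import urllib.request
from contextlib import contextmanager

ES_URL = "http://localhost:9200"  # assumption: local node on the default port


def settings_request(index, refresh_interval):
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = json.dumps({"index": {"refresh_interval": refresh_interval}}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_settings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def refresh_request(index):
    """Build (but do not send) a POST /<index>/_refresh request."""
    return urllib.request.Request(f"{ES_URL}/{index}/_refresh", data=b"", method="POST")


@contextmanager
def refresh_disabled(index, send, restore_to="1s"):
    """Disable periodic refresh for the duration of the block.

    `send` is a hypothetical callable that performs one HTTP request.
    On exit the interval is restored and a single refresh is issued.
    """
    send(settings_request(index, "-1"))
    try:
        yield
    finally:
        send(settings_request(index, restore_to))
        send(refresh_request(index))
```

To run it against a real node, pass something like `lambda req: urllib.request.urlopen(req).close()` as `send` and put the whole bulk indexing loop inside the `with refresh_disabled("new_index", send):` block; the try/finally restores the setting even if indexing fails halfway.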

Sent from iPhone

On 11 Jun 2013, at 08:28, Dmitry Babitsky dimok21@gmail.com wrote:

Hi,

I have an index of 200M documents that I would like to reindex.
I wrote the following script, which goes over the documents in the old index and puts them into the new index with bulk inserts.
The size of each bulk is 2000 documents.

import pyes

search_obj = pyes.query.Search(query=pyes.query.MatchAllQuery(), start=resume_from)

# Iterate over every document in the old index
old_index_iterator = self.esconn.search(search_obj, self.index_name)
counter = 0
BULK_SIZE = 2000

for doc in old_index_iterator:
    # bulk=True is expected to queue the document for a bulk request
    self.esconn.index(doc=doc, doc_type=DOC_TYPE, index=new_index_name, id=doc.get_id(), bulk=True)
    counter += 1

    # Refresh after every BULK_SIZE documents
    if counter % BULK_SIZE == 0:
        self.logger.debug("Refreshing...")
        self.esconn.refresh()
        self.logger.debug("Refresh done.")

# Final refresh once all documents have been indexed
self.esconn.refresh()
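For reference, when batching works, each group of BULK_SIZE documents should travel as a single POST to the /_bulk endpoint, whose body is newline-delimited JSON: one action/metadata line followed by one source line per document, with a trailing newline. The `bulk_body` helper below is a hypothetical illustration of that format, not a pyes function; it assumes documents arrive as (id, source) pairs, and its `_type` field matches the mapping-type era of this thread (newer Elasticsearch versions drop it).

```python
import json


def bulk_body(docs, index, doc_type):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    `docs` is assumed to be an iterable of (doc_id, source_dict) pairs.
    """
    lines = []
    for doc_id, source in docs:
        # Action/metadata line, then the document source on its own line
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"
```

For a batch of 2000 documents this produces 4000 lines in one request, which is the whole point of bulk indexing: one round trip instead of 2000.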

Observations:

  1. The indexing speed is very slow: around 150 documents per minute.
  2. The refresh operation takes essentially no time.
  3. If I remove the index call (and just read from the DB), it runs about 10 times faster.
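To put the observed rate in perspective, here is the arithmetic for the full 200M-document index, using only the numbers from the post (150 documents per minute while indexing, about 10 times that when only reading):

```python
TOTAL_DOCS = 200_000_000  # index size from the post
DOCS_PER_MINUTE = 150     # observed indexing rate

minutes = TOTAL_DOCS / DOCS_PER_MINUTE
days = minutes / (60 * 24)
print(f"{days:.0f} days at {DOCS_PER_MINUTE} docs/min")

# Read-only rate from observation 3 (about 10x faster)
read_only_days = days / 10
print(f"{read_only_days:.0f} days at {DOCS_PER_MINUTE * 10} docs/min")
```

This prints 926 days for the indexing rate and 93 days for the read-only rate, so even the read side alone would be slow at this scale.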

Conclusion:
The index call ignores the bulk=True flag and pushes every single document to the ES server.

Can anyone help me figure out why bulk=True has no effect?
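One way to test this conclusion is to count the requests that actually reach the server. The sketch below models the behaviour that bulk=True combined with a bulk_size is expected to give: documents are queued on the client and sent as one request per bulk_size documents. `BulkBuffer` and its `send` callable are hypothetical and do not reflect pyes internals; note that the final partial batch is only sent by an explicit `flush()`.

```python
class BulkBuffer:
    """Hypothetical client-side buffer that sends one request per `bulk_size` docs."""

    def __init__(self, send, bulk_size=2000):
        self.send = send            # callable that performs one bulk HTTP request
        self.bulk_size = bulk_size
        self.pending = []

    def add(self, doc):
        # Queue the document; send a request only when the batch is full
        self.pending.append(doc)
        if len(self.pending) >= self.bulk_size:
            self.flush()

    def flush(self):
        # Send whatever is queued (including a final partial batch)
        if self.pending:
            self.send(self.pending)
            self.pending = []
```

With 5000 documents and bulk_size=2000, `send` is called three times (2000, 2000 and 1000 documents, the last one from the final `flush()`). If a real client logs 5000 requests for the same input, batching is not happening.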

You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
