Reindex parent-child documents using elasticsearch.helpers parallel_bulk

(Moshe Sucaz) #1

I wrote my own script to reindex our documents from Elasticsearch 1.5.2 to 2.3.4 using elasticsearch.helpers.
My question is: will parallel_bulk reindex child documents with "parent="? When debugging it, I didn't see "parent" in the returned doc.
My script is:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan, parallel_bulk

def parallel_reindex(index_name, doc_type, chunk_size=500, scroll='10m', scan_kwargs={}, bulk_kwargs={}):
    target_client = Elasticsearch(hosts=['target_host:9200'], retry_on_timeout=True, max_retries=10, timeout=1000)
    source_client = Elasticsearch(hosts=['source_host:9200'], retry_on_timeout=True, max_retries=10, timeout=1000)
    query = {"query": {"match_all": {}}}
    docs = scan(source_client,
        query=query,
        index=index_name,
        doc_type=doc_type,  # assumed: restrict the scan to the type being reindexed
        scroll=scroll,
        **scan_kwargs
    )

    def change_doc_params_to_elastic_2(hits):
        for h in hits:
            # change field with "." to "_"
            if 'x.y.z' in h['_source']:
                h['_source']['x_y_z'] = h['_source']['x.y.z']
                del h['_source']['x.y.z']
            # removing _analyzer
            if '_analyzer' in h['_source']:
                del h['_source']['_analyzer']
            if 'fields' in h:
                # assumed body (missing in the post), as in elasticsearch.helpers.reindex
                h.update(h.pop('fields'))
            yield h

    # not used by the parallel_bulk call below
    kwargs = {
        'stats_only': True,
    }
    for response in parallel_bulk(target_client, change_doc_params_to_elastic_2(docs), thread_count=8, chunk_size=chunk_size):
        print("response: ", response)

# ThreadPool is a helper class with add_task(); its definition is not part of the post
pool = ThreadPool(3)
for doc_type in ["a", "b", "c"]:  # b and c are children of a
    pool.add_task(parallel_reindex, index_name, doc_type)
print("waiting for all child docs to finish\n")

(Mark Harwood) #2

In 1.x the _parent field isn't included by default in responses.
You need to ask for it explicitly in your requests.
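
As a rough illustration of what that could look like, here is a minimal sketch. It assumes the elasticsearch-py 1.x/2.x client, where scan forwards extra keyword arguments such as fields to the search call, and where the bulk helpers read _parent from the top level of each action. The function names lift_meta_fields and scan_with_parent and the sample hit are made up for the example; they are not part of the original script.

```python
def lift_meta_fields(hits):
    """Move values returned under 'fields' (e.g. _parent, _routing)
    to the top level of each hit, where the bulk helpers look for them."""
    for hit in hits:
        for name, value in hit.pop('fields', {}).items():
            # Unwrap single-element lists in case the value arrives as an array.
            if isinstance(value, list) and len(value) == 1:
                value = value[0]
            hit[name] = value
        yield hit


def scan_with_parent(client, index_name, doc_type, scroll='10m'):
    """Scan one type of a 1.x index, asking for _parent explicitly."""
    # Imported here so lift_meta_fields can be tried without the client installed.
    from elasticsearch.helpers import scan
    return scan(
        client,
        query={"query": {"match_all": {}}},
        index=index_name,
        doc_type=doc_type,
        scroll=scroll,
        # Request the metadata fields next to _source; without this,
        # a 1.x cluster returns hits that carry no _parent.
        fields=('_source', '_parent', '_routing'),
    )


# A hit shaped like a 1.x response when _parent was requested (made-up values).
sample = {'_index': 'idx', '_type': 'b', '_id': '7',
          '_source': {'x_y_z': 1}, 'fields': {'_parent': '42'}}
action = next(lift_meta_fields([sample]))
```

In the script above, the generator passed to parallel_bulk would then wrap lift_meta_fields(scan_with_parent(source_client, index_name, doc_type)), so each child document is indexed with its parent ID.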

(Moshe Sucaz) #3

Thanks Mark!
I will do that...
