Does the reindex API work in parallel by shard?

ebuildy · December 6, 2016, 9:02am

I use the Hadoop plugin a lot. It's quite fast because it takes advantage of shards to work in parallel.

Does the reindex API do the same (or can it)?

danielmitterdorfer · December 6, 2016, 1:13pm

I am not sure whether I've misunderstood you. The reindex API basically uses sliced scrolls to retrieve documents from the source index and uses the bulk API to put the documents into the destination index. The number of shards is determined by the index settings and not really related to the reindex API.

Daniel
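
The scroll-to-bulk flow described above can be sketched as follows. This is only an illustration of the data shape, not the actual reindex implementation: the helper name `hits_to_bulk_body`, the sample hits, and the index name `dest-index` are hypothetical, and the action line assumes a recent Elasticsearch version without mapping types.

```python
import json

def hits_to_bulk_body(hits, dest_index):
    """Turn one page of scroll hits into an NDJSON body for the _bulk API."""
    lines = []
    for hit in hits:
        # Action line: index the document under its original ID in the destination.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the document itself, unchanged.
        lines.append(json.dumps(hit["_source"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

# Hypothetical page of hits, shaped like the "hits.hits" array of a scroll response.
page = [
    {"_id": "1", "_source": {"user": "a"}},
    {"_id": "2", "_source": {"user": "b"}},
]
body = hits_to_bulk_body(page, "dest-index")
print(body)
```

Each scroll page would be converted this way and sent to `_bulk` until the scroll returns no more hits.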

ebuildy · December 6, 2016, 1:30pm

I am wondering if the reindex API works like es4Hadoop does:

scrolling in parallel from each shard ====> _bulk

Let's say you have 3 data nodes and you want to reindex an index with 3 shards. That means each data node could copy 1 shard, you see?

danielmitterdorfer · December 6, 2016, 1:44pm

Hi @ebuildy,

OK, got you! The reindex action is coordinated by one node in the cluster. Processing of the sliced scrolls is done in parallel, however.

Daniel

nik9000 · December 6, 2016, 6:13pm

Only in 5.1, where sliced scroll for reindex is implemented at all. Before that, reindex was always a single process. That isn't to say it was a single thread, just that it wasn't forked. The search and bulk stages ran on as many threads as they usually do in Elasticsearch.
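
For readers on 5.1 or later, a reindex request can carry a `slice` object in its `source` (manual slicing), or the request can use the `slices` URL parameter to have Elasticsearch split the work itself. The sketch below only builds the request bodies for manual slicing and does not send them. The helper name `sliced_reindex_bodies` and the index names are hypothetical, and using one slice per shard (3, as in the example earlier in the thread) is an assumption about a sensible setting, not a requirement.

```python
import json

def sliced_reindex_bodies(source_index, dest_index, num_slices):
    """Build one _reindex request body per slice (manual slicing)."""
    return [
        {
            # Each request reads only its own slice of the source index.
            "source": {"index": source_index, "slice": {"id": i, "max": num_slices}},
            "dest": {"index": dest_index},
        }
        for i in range(num_slices)
    ]

# Assumption: one slice per shard of a 3-shard source index.
bodies = sliced_reindex_bodies("source-index", "dest-index", 3)
for b in bodies:
    print(json.dumps(b))
```

Each body would be sent as a separate `POST _reindex` request, so the slices can run concurrently. The automatic alternative is a single request such as `POST _reindex?slices=3` with an ordinary `source`/`dest` body.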

danielmitterdorfer · December 7, 2016, 7:02am

Thanks for the additional detail on my answer @nik9000. I was looking at the implementation on the master branch indeed.

Daniel
