Using multiple indices with the Elasticsearch Spark plugin logs a curious warning


I am using the ES Spark plugin and reading multiple indices:

import org.elasticsearch.spark._ // adds esRDD/esJsonRDD to SparkContext
sparkContext.esJsonRDD("index1,index2")

which logs the following warning:

WARN RestRepository: Read resource [index1,index2/] includes multiple indices or/and aliases; to avoid duplicate results (caused by shard overlapping), parallelism is reduced from 2 to 1

The application runs without any problems, but I am wondering what exactly the warning means.
Could anyone explain it?

(Costin Leau) #2

It is only a warning. The reason behind it has to do with how the indices' shards are spread across the various nodes and how ES can select data. Currently ES does not allow a query to refer to multiple indices and yet select only one shard from a given index (rather than all of them). That is why ES-Hadoop tries the various shard combinations behind the scenes and, if none of them avoids the overlap, falls back to reduced parallelism.
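To make the overlap concrete, here is a deliberately simplified, hypothetical model in Scala (the language of the snippet in the question). It is not ES-Hadoop's actual algorithm; the real code also considers which nodes host each shard. The model assumes that a partition's query can only narrow results by shard number and that this narrowing applies to every index in the request, so "shard 0" reads shard 0 of index1 and shard 0 of index2 at the same time. The names `ShardOverlapModel`, `naiveParallelism`, `safeParallelism`, `shardsReadBy` and `duplicateReads` are made up for illustration.

```scala
// Hypothetical, simplified model of shard overlap; NOT the real ES-Hadoop code.
object ShardOverlapModel {
  // Number of shards per index, e.g. Map("index1" -> 1, "index2" -> 1).
  type Layout = Map[String, Int]

  // Naive plan: one partition per shard of every index (as for a single index).
  def naiveParallelism(layout: Layout): Int = layout.values.sum

  // Assumption: a partition can only restrict its query by shard number, and the
  // restriction is applied to every index in the request. So "shard n" reads
  // shard n of each index that has at least n + 1 shards.
  def shardsReadBy(shardNumber: Int, layout: Layout): Set[(String, Int)] =
    layout.toSeq.collect {
      case (index, count) if shardNumber < count => (index, shardNumber)
    }.toSet

  // Under that assumption, count how many shard reads the naive plan repeats.
  // Every repeated read would show up as duplicate documents in the RDD.
  def duplicateReads(layout: Layout): Int = {
    val reads = for {
      count <- layout.values.toSeq
      n     <- 0 until count
      shard <- shardsReadBy(n, layout)
    } yield shard
    reads.size - reads.distinct.size
  }

  // Overlap-free plan: one partition per distinct shard number, each reading
  // that shard number from every index. Together they cover each shard once.
  def safeParallelism(layout: Layout): Int =
    if (layout.isEmpty) 0 else layout.values.max

  def main(args: Array[String]): Unit = {
    val layout: Layout = Map("index1" -> 1, "index2" -> 1)
    val naive = naiveParallelism(layout)
    val safe  = safeParallelism(layout)
    println(s"naive plan repeats ${duplicateReads(layout)} shard reads")
    if (naive != safe)
      println(s"parallelism is reduced from $naive to $safe") // prints: parallelism is reduced from 2 to 1
  }
}
```

With two single-shard indices, the naive plan has two partitions that each read both shards, so every document would be returned twice; the overlap-free plan keeps a single partition, which lines up with the "reduced from 2 to 1" in the warning. With `index1` at 3 shards and `index2` at 1, this model would go from 4 partitions to 3.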
