We're using the es-hadoop connector to load data from an Elasticsearch index into Spark (PySpark, via newAPIHadoopRDD). The problem is that the returned RDD is badly skewed across partitions: it always comes back with 9 partitions, and only 3 of them contain any data.
Is there any way to get newAPIHadoopRDD to return more evenly distributed partitions from the ES data? I realize I can repartition after the fact, but the initial partitioning is causing performance problems.
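
For reference, here is a minimal sketch of how the RDD is loaded and how the per-partition record counts can be checked. The host, index name, and helper function names are placeholders for illustration, not our actual values; the input format and key/value classes are the ones es-hadoop documents for PySpark.

```python
# Sketch only: the helper names and the host/index values are hypothetical.

def build_es_conf(nodes, resource):
    """Build the configuration dict passed to newAPIHadoopRDD."""
    return {
        "es.nodes": nodes,        # e.g. "localhost:9200" (placeholder)
        "es.resource": resource,  # e.g. "my_index/my_type" (placeholder)
    }


def read_es_rdd(sc, conf):
    """Load the index as an RDD of (doc_id, document) pairs via es-hadoop."""
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


def partition_sizes(rdd):
    """Count records in each partition without collecting the records."""
    return rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()


def summarize_skew(sizes):
    """Summarize how records are spread across partitions."""
    populated = [n for n in sizes if n > 0]
    return {
        "partitions": len(sizes),
        "populated": len(populated),
        "empty": len(sizes) - len(populated),
        "records": sum(sizes),
    }


# Intended usage, given an existing SparkContext `sc`:
#   conf = build_es_conf("localhost:9200", "my_index/my_type")
#   rdd = read_es_rdd(sc, conf)
#   print(summarize_skew(partition_sizes(rdd)))
```

Running summarize_skew(partition_sizes(rdd)) on the loaded RDD is how the skew described above can be confirmed. Calling rdd.repartition(n) afterwards does even things out, but it costs a full shuffle, which is what I'd like to avoid.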