Indexing can be both CPU and disk I/O intensive. I do not know what monitoring you have access to, but it would be good to try to identify whether CPU or disk I/O is limiting performance. If you are using gp2 EBS, it gets IOPS proportional to volume size, and since large scroll queries result in a lot of disk I/O, that is probably what I would start with.
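To get a rough feel for how volume size maps to IOPS on gp2, here is a minimal sketch. It assumes the figures AWS has published for gp2 as I understand them (3 IOPS per GiB, a floor of 100 and a cap of 16,000), so please check the current EBS documentation before relying on it. The function name is just something I made up for illustration, and it ignores burst credits, which can temporarily lift smaller volumes above their baseline.

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Approximate baseline IOPS for a gp2 volume.

    Assumed figures (verify against current AWS docs):
    3 IOPS per GiB, minimum 100, maximum 16,000.
    """
    return max(100, min(16_000, 3 * size_gib))


# Print the estimated baseline for a few example volume sizes.
for size in (20, 100, 500, 1000, 6000):
    print(f"{size} GiB -> {gp2_baseline_iops(size)} baseline IOPS")
```

If the number for your volume is low compared to the read and write rates you see while the scroll and bulk requests are running, disk I/O is a likely bottleneck.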
You may also look at making indexing more efficient, e.g. by increasing the refresh interval of the index if you have not already done so. You might even disable refresh entirely (by setting the interval to -1) during the bulk load and re-enable it afterwards.
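As a sketch of what changing the refresh interval could look like, the snippet below builds a request against the index settings API (`PUT /<index>/_settings` with `index.refresh_interval`) using only the Python standard library. The host `http://localhost:9200`, the index name `my-index` and the helper names are assumptions for illustration. It also leaves out authentication and TLS, which your cluster may require.

```python
import json
import urllib.request


def refresh_settings_request(base_url: str, index: str, interval: str) -> urllib.request.Request:
    """Build (but do not send) a PUT request that sets index.refresh_interval."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def set_refresh_interval(base_url: str, index: str, interval: str) -> dict:
    """Send the settings update and return the parsed JSON response."""
    req = refresh_settings_request(base_url, index, interval)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Hypothetical usage against a local cluster (not executed here):
#   set_refresh_interval("http://localhost:9200", "my-index", "-1")   # disable refresh
#   ... run the bulk load ...
#   set_refresh_interval("http://localhost:9200", "my-index", "1s")   # restore afterwards
```

Remember to restore the interval once the load has finished, otherwise newly indexed documents will not become visible to searches.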