It's possible that the number of columns here is dragging your write throughput down. Each record takes up more space in the request than a smaller one would. When you introduce more shards, each bulk request has to be split into more shard-level bulk requests, each containing fewer documents than before. Even though the per-request document count is still low and should generally be processed quickly, the total time to write those documents can be higher because of their size. You might just be running up against the overhead of transmitting and writing the larger documents. Do you know which of the connector's batch size limits is triggering flushes more often? If it's the document-count limit, perhaps increase that number a bit.
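To illustrate the interplay between the two flush triggers (this isn't your connector's API, which you haven't named, but the official Python client's bulk helper exposes the same count-based and size-based limits; the index name, document shape, and cluster URL below are placeholders):

```python
# Minimal sketch: count vs. byte-size flush limits with wide documents.
from elasticsearch import Elasticsearch, helpers

client = Elasticsearch("http://localhost:9200")

def generate_actions(docs):
    # Wide documents (many columns/fields) inflate each action's byte size,
    # so the byte-size limit can fire long before the count limit does.
    for doc in docs:
        yield {"_index": "my-wide-index", "_source": doc}

# Hypothetical wide records: 200 fields each.
docs = ({"field_%d" % i: "value" for i in range(200)} for _ in range(10_000))

# chunk_size is the per-request document count; max_chunk_bytes caps the
# request payload. With large documents, raising chunk_size alone may have
# no effect, because max_chunk_bytes flushes the batch first.
success, errors = helpers.bulk(
    client,
    generate_actions(docs),
    chunk_size=2000,                    # raised document-count limit
    max_chunk_bytes=100 * 1024 * 1024,  # 100 MB byte-size limit
)
print(f"indexed={success} errors={errors}")
```

If raising the count limit doesn't change the observed batch sizes, that's a sign the size-based limit is the one doing the flushing.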