I have a deployment with 2 nodes and 70 GB of storage, and I need to index 20 million records. Just uploading these documents is taking a little over 4 hours. Is there any way to speed up the process?
I do not have any complicated mappings, but text-type fields are also mapped as "keyword" sub-fields. During the indexing process, no other reads or writes to other indices happen on this deployment.
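For reference, the mapping is essentially a text field with a keyword sub-field, along these lines (a minimal sketch using the 8.x Python client; the index and field names are illustrative, not my real schema):

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint; adjust for the real deployment.
es = Elasticsearch("http://localhost:9200")

# A text field that is also indexed as a keyword sub-field,
# matching the mapping described above (names are illustrative).
es.indices.create(
    index="records",
    mappings={
        "properties": {
            "title": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    },
)
```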
I have tried changing the batch size, but that doesn't have much impact. Replica shards have also been removed, as they are not really needed; a sketch of the kind of loading loop I mean is below.
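For illustration, here is a minimal bulk-loading sketch with the Python client, not my actual connector code; the endpoint, index name, and document shape are placeholders, and disabling `refresh_interval` during the load is something I'm considering rather than something I've confirmed helps:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
INDEX = "records"  # illustrative index name

# Turn off refresh (and replicas, as mentioned above) for the load.
es.indices.put_settings(
    index=INDEX,
    settings={"refresh_interval": "-1", "number_of_replicas": 0},
)

def actions():
    # Stand-in generator for the real 20M records.
    for i in range(20_000_000):
        yield {"_index": INDEX, "_source": {"title": f"record {i}"}}

# parallel_bulk streams chunks from several threads at once;
# chunk_size here corresponds to the batch size I've been tuning.
for ok, item in helpers.parallel_bulk(es, actions(), thread_count=4, chunk_size=5000):
    if not ok:
        print(item)

# Re-enable refresh once the load finishes.
es.indices.put_settings(index=INDEX, settings={"refresh_interval": "1s"})
```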
You need to provide more information, like how the data is actually being written, which code is being used, the configuration, etc. I'm not familiar with this connector, but maybe someone else can provide more insight.
While the disk type can impact performance, 20 million records and 70 GB is not that much; it should not take 4 hours.