I have a deployment with 2 nodes and 70 GB of storage, and I need to index 20 million records. Just uploading these documents is taking a little over 4 hours. Is there any way to speed up the process?
I do not have any complicated mappings, but text-type fields are also mapped as "keyword" sub-fields. During the indexing process, no other reads or writes to other indices happen on this deployment.
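For reference, the mapping is essentially a text field with a keyword sub-field, along these lines (a minimal sketch using the 8.x Python client; the index and field names are illustrative, not my real schema):

```python
from elasticsearch import Elasticsearch

# Placeholder endpoint; adjust for the real deployment.
es = Elasticsearch("http://localhost:9200")

# A text field that is also indexed as a keyword sub-field,
# matching the mapping described above (names are illustrative).
es.indices.create(
    index="records",
    mappings={
        "properties": {
            "title": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword"}},
            }
        }
    },
)
```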
I have tried changing the batch size, but that doesn't have much impact. Replica shards have also been removed, as they are not really needed; a sketch of the kind of loading loop I mean is below.
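For illustration, here is a minimal bulk-loading sketch with the Python client, not my actual connector code; the endpoint, index name, and document shape are placeholders, and disabling `refresh_interval` during the load is something I'm considering rather than something I've confirmed helps:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
INDEX = "records"  # illustrative index name

# Turn off refresh (and replicas, as mentioned above) for the load.
es.indices.put_settings(
    index=INDEX,
    settings={"refresh_interval": "-1", "number_of_replicas": 0},
)

def actions():
    # Stand-in generator for the real 20M records.
    for i in range(20_000_000):
        yield {"_index": INDEX, "_source": {"title": f"record {i}"}}

# parallel_bulk streams chunks from several threads at once;
# chunk_size here corresponds to the batch size I've been tuning.
for ok, item in helpers.parallel_bulk(es, actions(), thread_count=4, chunk_size=5000):
    if not ok:
        print(item)

# Re-enable refresh once the load finishes.
es.indices.put_settings(index=INDEX, settings={"refresh_interval": "1s"})
```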
You need to provide more information, like how the data is actually being written, which code is being used, the configuration, etc. I'm not familiar with this connector, but maybe someone else can provide more insight.
While the disk type can impact performance, 20 million records and 70 GB is not that much; it should not take 4 hours.