Updating 100,000 records at once takes 30 minutes to complete
Cluster and index configuration:
- 2 Elasticsearch VMs, each with 16 vCPU, 60 GB RAM, and a 1 TB SSD
- 10 Gb connection between nodes
- JVM heap: Xms and Xmx set to 24 GB
- "refresh_interval": "30s"
- "number_of_shards": "5"
- "number_of_replicas": "1"
- indices.memory.index_buffer_size: 30%
- bootstrap.memory_lock: true

Index size:
- 1.8M records
- 1,030 field mappings per record
- 80 KB average record size
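For context, this is roughly how we toggle the dynamic settings for the duration of the batch run, per the standard "tune for indexing speed" guidance: drop refresh and replication before the update and restore them afterwards (at the cost of redundancy while it runs). A minimal sketch assuming the elasticsearch-py 8.x client; the endpoint and the index name `records` are placeholders:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
INDEX = "records"                            # placeholder index name

# Turn off refresh and replication while the batch update runs;
# both are dynamic settings, so no restart is needed.
es.indices.put_settings(
    index=INDEX,
    settings={"refresh_interval": "-1", "number_of_replicas": 0},
)

try:
    pass  # ... run the 100,000-record bulk update here ...
finally:
    # Restore the settings listed above, then force a refresh
    # so the updated documents become searchable.
    es.indices.put_settings(
        index=INDEX,
        settings={"refresh_interval": "30s", "number_of_replicas": 1},
    )
    es.indices.refresh(index=INDEX)
```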
Things we've tried that don't work:
- Using the Bulk API for all 100,000 records in a single request
- Using the Bulk API in batches of 500, 1,000, and 10,000 (see the sketch after this list)
- Adjusting the refresh interval from 1s to 1 minute (5s, 15s, 30s)
- Using update_by_query, both with and without slices (this is slower than the Bulk API)
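For reference, a minimal sketch of what the batched partial-document updates might look like through the Python client's bulk helpers. The endpoint, index name, and `updates` iterable are placeholders, and the `thread_count`/`chunk_size` values are illustrative; `parallel_bulk` adds client-side threading, which is one lever the attempts above don't cover:

```python
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint
INDEX = "records"                            # placeholder index name

def update_actions(updates):
    """Yield partial-document update actions for the bulk helper.

    `updates` is any iterable of (doc_id, changed_fields) pairs.
    """
    for doc_id, changed_fields in updates:
        yield {
            "_op_type": "update",
            "_index": INDEX,
            "_id": doc_id,
            "doc": changed_fields,
        }

updates = [("1", {"status": "processed"})]  # stand-in data

# parallel_bulk streams batches from several client threads and
# yields an (ok, item) tuple per action; iterating the generator
# is what actually sends the requests.
for ok, item in helpers.parallel_bulk(
    es, update_actions(updates), thread_count=4, chunk_size=500
):
    if not ok:
        print("failed:", item)
```

With ~80 KB documents, a chunk of 500 is roughly 40 MB per request, comfortably under the default 100 MB http.max_content_length, so chunk size and thread count are the two knobs to balance against heap pressure on the data nodes.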
Is there any amount of tuning that can fix these performance issues, or do we have to split the index into separate read-only and dynamically updated mappings?

A couple of notes:
- We keep the _source field enabled in order to allow dynamic updates; some people have suggested turning it off, but that's not an option for us.
- We have gone through the Elasticsearch documentation and followed all of the tuning steps.