I have to store 60+ million documents in Elasticsearch. I am using a BulkProcessor with the Java Transport Client in Elasticsearch 5.3.1. With a fixed set of fields (around 200) I was able to achieve an indexing speed of 10k documents/second.
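For reference, my BulkProcessor wiring looks roughly like the sketch below (the cluster name, host, index/type names, and the tuning values are placeholders, not my exact configuration; the listener bodies are trimmed):

```java
import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Transport client pointed at one node; cluster name and host are placeholders.
        TransportClient client = new PreBuiltTransportClient(
                Settings.builder().put("cluster.name", "my-cluster").build())
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300));

        BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override public void beforeBulk(long id, BulkRequest request) { }
            @Override public void afterBulk(long id, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    System.err.println(response.buildFailureMessage());
                }
            }
            @Override public void afterBulk(long id, BulkRequest request, Throwable failure) {
                failure.printStackTrace();
            }
        })
        .setBulkActions(1000)                               // flush after 1000 docs...
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) // ...or 5 MB of payload...
        .setFlushInterval(TimeValue.timeValueSeconds(5))    // ...or every 5 seconds
        .setConcurrentRequests(4)                           // 4 concurrent bulk requests
        .build();

        // Documents are added like this (index/type names are placeholders).
        bulkProcessor.add(new IndexRequest("my_index", "my_type")
                .source("{\"field\":\"value\"}", XContentType.JSON));

        bulkProcessor.awaitClose(1, TimeUnit.MINUTES);
        client.close();
    }
}
```

This requires a running 5.3.1 cluster to execute; it is only meant to show where the bulk-size and concurrency knobs from the list below plug in.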
The challenge I am facing is that my data contains many unique dynamic fields (around 300,000-400,000). I have done the following so far:
- Used different bulk sizes (from 100 to 1000 documents).
- Used varying numbers of threads (2 to 15).
- Increased "indices.memory.index_buffer_size" from 10% to 25% of the JVM heap.
- Increased "thread_pool.bulk.queue_size" to 5000.
- Set "refresh_interval" to 30s.
- Set the JVM heap for each node to 18 GB out of 32 GB of RAM.
- Kept the number of replicas at '1'. The reason is that if the JVM heap overshoots and a node goes down, the cluster can be brought back up without trouble because the data is replicated on other nodes.
- Cleared the cache every 10 minutes.
- Tried setting "indices.store.throttle.type" to "none" so that merge throttling would not slow down indexing, but I still do not see any increase in indexing speed.
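For completeness, this is roughly how I apply the settings listed above. "indices.memory.index_buffer_size" and "thread_pool.bulk.queue_size" are node-level settings in elasticsearch.yml, while "refresh_interval" and "number_of_replicas" are dynamic index settings (the index name "my_index" is a placeholder):

```shell
# Node-level settings in elasticsearch.yml (node restart required):
#   indices.memory.index_buffer_size: 25%
#   thread_pool.bulk.queue_size: 5000

# Index-level dynamic settings, applied over the REST API:
curl -XPUT 'localhost:9200/my_index/_settings' \
  -H 'Content-Type: application/json' -d '
{
  "index": {
    "refresh_interval": "30s",
    "number_of_replicas": 1
  }
}'
```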
Despite all of this, I am only seeing an indexing speed of about 100 documents/second. I need someone to guide me further on this.