Dear forum members,
For the last two months I have been testing Elasticsearch indexing performance.
My minimum requirement is to index about 50K JSON documents per second.
Each document is about 2-4 KB and has several nested fields defined.
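To put that requirement in numbers, here is a rough back-of-the-envelope calculation. It assumes 1 KB = 1,000 bytes and counts only the raw JSON, not replicas or index overhead, so treat it as a lower bound rather than a measurement:

```python
def raw_ingest_rate_mb_per_sec(docs_per_sec, doc_size_bytes):
    """Raw JSON volume per second in MB (1 MB = 1,000,000 bytes)."""
    return docs_per_sec * doc_size_bytes / 1_000_000


# 50K documents per second at the smallest and largest document sizes.
low = raw_ingest_rate_mb_per_sec(50_000, 2_000)
high = raw_ingest_rate_mb_per_sec(50_000, 4_000)
print(f"{low:.0f}-{high:.0f} MB/s of raw JSON")  # prints "100-200 MB/s of raw JSON"
```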
I have the following configuration:
- Elasticsearch v5.0
- 3 virtual machines with 32 GB RAM (16 GB as heap)
- 8 cores
- Red Hat 7.2, 64-bit
- Basic configuration options following best practices, with memlock defined
I have tried the following approaches:
- Indexing with the bulk API using the Python Elasticsearch library
- Indexing without replicas and then turning them on afterwards
- Setting the number of shards to the number of cores in the cluster
- Putting the data on a pool of fast SSD disks
- Indexing in parallel with 5 separate threads sending to different nodes
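For anyone who wants to reproduce this, the approaches above can be sketched roughly as follows. This is a minimal sketch and not my exact script: it assumes the official `elasticsearch` Python client in a 5.x version (newer clients changed the `body` and `_type` handling), and the index name, type name, chunk size and the `make_actions`, `create_settings` and `load` helpers are placeholders made up for illustration.

```python
def make_actions(docs, index_name="my-index", doc_type="doc"):
    """Wrap each JSON document in the action format the bulk helpers expect."""
    for doc in docs:
        # Elasticsearch 5.x requires a mapping type on every document.
        yield {"_index": index_name, "_type": doc_type, "_source": doc}


def create_settings(shards):
    """Index settings for the initial load: no replicas while indexing."""
    return {"settings": {"number_of_shards": shards, "number_of_replicas": 0}}


def load(hosts, docs, shards, index_name="my-index", threads=5, chunk_size=1000):
    """Create the index, bulk-index docs in parallel, then enable one replica."""
    # Imported here so the helpers above work without the client installed.
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import parallel_bulk

    # Passing several hosts lets the client spread requests across the nodes.
    es = Elasticsearch(hosts)
    es.indices.create(index=index_name, body=create_settings(shards))

    indexed = 0
    actions = make_actions(docs, index_name)
    # parallel_bulk yields one (ok, info) tuple per document.
    for ok, _ in parallel_bulk(es, actions, thread_count=threads,
                               chunk_size=chunk_size):
        indexed += ok

    # Turn replication back on once the load is finished.
    es.indices.put_settings(index=index_name,
                            body={"index": {"number_of_replicas": 1}})
    return indexed
```

It would be called with the list of node addresses, an iterable of document dicts, and the shard count, for example `load(["node1:9200", "node2:9200", "node3:9200"], docs, shards=8)`, where the host names and shard count are again placeholders.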
Even with all of this I could not get above 10K documents/sec.
I also tried physical machines, but without any noticeable improvement.
I assumed that adding nodes might solve the issue by scaling out.
I added 3 more nodes and then another 3, yet there was no real impact on indexing speed.
My question for forum members:
In your experience, does scaling out work for indexing?
Please share your examples with sizing of nodes and hardware specs.
Thanks in advance,