The configuration is 6 Elasticsearch nodes, each with 16 GB of dedicated memory.
Each node is an 8-processor Intel Linux server.
There are 6 clients, one running locally on each node (localhost); each uses
elasticsearch-py helpers.bulk and in turn spawns 8 client processes (48
processes total). A rough sketch of what each client does is below.
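Roughly what each client looks like, as a simplified sketch; the generate_docs()
generator, index name, document type, and chunk size are placeholders rather
than the real code:

```python
# Simplified sketch of one client: 8 processes, each bulk-indexing
# against the local node. generate_docs(), "myindex", "record" and
# chunk_size are placeholders.
from multiprocessing import Process

from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk


def generate_docs(worker_id):
    """Placeholder document source; the real generator goes here."""
    for i in range(1000000):
        yield {
            "_index": "myindex",
            "_type": "record",
            "_source": {"worker": worker_id, "seq": i},
        }


def worker(worker_id):
    # Each process talks to the node on localhost, as described above.
    es = Elasticsearch(["localhost:9200"])
    bulk(es, generate_docs(worker_id), chunk_size=5000)


if __name__ == "__main__":
    # 8 client processes per node, 6 nodes -> 48 processes in total.
    procs = [Process(target=worker, args=(i,)) for i in range(8)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```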
Relevant settings (applied roughly as sketched below):
index.store.type: memory
index.refresh_interval: 120s
threadpool.bulk.queue_size: 200
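For completeness, a sketch of how the index-level settings could be applied
with elasticsearch-py; the index name is a placeholder, and
threadpool.bulk.queue_size is a node-level setting that lives in
elasticsearch.yml rather than in the index settings API:

```python
# Sketch only: applying the index-level settings above via elasticsearch-py.
# "myindex" is a placeholder; threadpool.bulk.queue_size is node-level and
# is set in elasticsearch.yml, not here.
from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# index.store.type is a static setting, so it has to be set at creation time.
es.indices.create(
    index="myindex",
    body={"settings": {"index.store.type": "memory"}},
    ignore=400,  # ignore "index already exists"
)

# refresh_interval can be changed on a live index.
es.indices.put_settings(
    index="myindex",
    body={"index": {"refresh_interval": "120s"}},
)
```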
Marvel reports an index rate of up to 80,000 records per second, but in
practice the net rate (total records divided by the roughly 40-minute run
time) is more like 30,000 records/s.
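As a cross-check on Marvel's numbers, something like this could measure the
net rate straight from the indices stats API (the host and the 60-second
sampling window are arbitrary):

```python
# Sketch: sample index_total twice and divide by elapsed time to get the
# net indexing rate, independent of Marvel. Host and window are arbitrary.
import time

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])


def index_total():
    stats = es.indices.stats(metric="indexing")
    return stats["_all"]["primaries"]["indexing"]["index_total"]


start_count, start_time = index_total(), time.time()
time.sleep(60)
end_count, end_time = index_total(), time.time()
rate = (end_count - start_count) / (end_time - start_time)
print("net indexing rate: %.0f docs/s" % rate)
```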
Given the hardware, my question is: is this good, or should I expect faster?
And what can be done to increase throughput?
Throwing more clients at the cluster does seem to drive up performance,
but how do I measure what the bottleneck is?
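One thing I can think of is watching the bulk thread pool for queued or
rejected requests as a sign of back-pressure; a rough sketch (host and
polling interval are placeholders):

```python
# Sketch: poll the bulk thread-pool stats on every node; a growing queue
# or a non-zero rejected count points at indexing back-pressure.
import time

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

while True:
    stats = es.nodes.stats(metric="thread_pool")
    for node_id, node in stats["nodes"].items():
        bulk_pool = node["thread_pool"]["bulk"]
        print(node.get("name", node_id),
              "active:", bulk_pool["active"],
              "queue:", bulk_pool["queue"],
              "rejected:", bulk_pool["rejected"])
    time.sleep(10)
```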
Should I be concerned about the IOps reported by Marvel on the cluster
summary? Per node they are (a sketch for pulling the nodes' own filesystem
stats for comparison follows the list):
Node 1: 344
Node 2: 466
Node 3: 246
Node 4: 261
Node 5: 162
Node 6: 93
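If it helps, the nodes' own filesystem stats could be dumped and compared
with what Marvel shows; this sketch just prints the raw fs section per node
rather than assuming particular field names:

```python
# Sketch: dump each node's raw filesystem stats so the Marvel IOps numbers
# can be cross-checked against what the nodes themselves report.
import json

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

fs_stats = es.nodes.stats(metric="fs")
for node_id, node in fs_stats["nodes"].items():
    print(node.get("name", node_id))
    print(json.dumps(node["fs"], indent=2))
```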