Why are you using such extreme bulk size? How many rally workers are you using? What is the specification of the host rally is running on (or are you running it on the same host as Elasticsearch)? What type of network do you have in place? What indexing throughput are you seeing?
I am using 40 client threads.
Yes, Rally and Elasticsearch are running on the same server.
Index throughput (for index-append-1000-elasticlogs_q_write):
- Min Throughput: 967.98 docs/s
- Median Throughput: 180778.04 docs/s
- Max Throughput: 209867.07 docs/s
Can you please suggest an ideal bulk size?
You typically see indexing throughput increase with the bulk size, at least up to a certain level. It then usually flattens out before potentially even starting to decrease. I set the default to 1000 as I saw little gain beyond that, but you could set it a bit higher if you want to. It is generally recommended to keep the size of each bulk request below a few MB, which probably means around 10000 events or so. Also try an even higher number of clients until you see no further improvement in throughput.
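To get a feel for how bulk size maps to request payload size, here is a minimal sketch. The average event size of 300 bytes is a hypothetical figure for illustration; measure your own documents to get a real number:

```python
# Rough estimate of a bulk request's payload size.
# AVG_EVENT_BYTES is an assumed average JSON document size (hypothetical);
# the estimate also ignores the bulk action metadata line that precedes
# each document, which adds a little extra overhead per event.
AVG_EVENT_BYTES = 300

def bulk_request_mb(bulk_size: int, avg_event_bytes: int = AVG_EVENT_BYTES) -> float:
    """Approximate size of one bulk request in MB."""
    return bulk_size * avg_event_bytes / (1024 * 1024)

for bulk_size in (1000, 5000, 10000, 20000):
    print(f"bulk_size={bulk_size:>6}: ~{bulk_request_mb(bulk_size):.2f} MB")
```

With a 300-byte average event, a bulk size of 10000 lands just under 3 MB, which is consistent with the "few MB" guideline above.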
Actually, we are using a one-node cluster.
With the above configuration I can see that 10 indices are created, with 10 shards each.
The first index fills with indexing data, then the second index gets created, and so on.
I observed that the 10 indices are written in sequence, but the 10 shards within each index are written in parallel. Does that mean only 10 write threads are active at a time?
How can I make the indices work in parallel, or is it supposed to follow this pattern?