If you set `id_seq_low_id_bias` to `true`, Rally uses a different distribution for picking the document ids to update, one that heavily favors lower ids (by default the distribution is uniform). As the index grows, updating the oldest ids creates a heavy workload.
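To illustrate the idea (not Rally's actual implementation, whose exact distribution I don't know), here is a minimal sketch of a low-id-biased id picker: raising a uniform draw to a power greater than 1 pushes most picks toward the lowest, i.e. oldest, ids.

```python
import random

def pick_update_id(max_id, low_id_bias=False, exponent=4):
    """Pick a document id in [0, max_id).

    With low_id_bias=False the choice is uniform. With low_id_bias=True
    the draw is skewed heavily toward the lowest (oldest) ids; the
    power-law skew used here is a hypothetical stand-in for whatever
    distribution the track really uses.
    """
    u = random.random()
    if low_id_bias:
        u = u ** exponent  # pushes most draws toward 0
    return int(u * max_id)

# Rough demonstration: biased draws cluster at the low end of the range.
random.seed(42)
biased = [pick_update_id(1_000_000, low_id_bias=True) for _ in range(10_000)]
uniform = [pick_update_id(1_000_000) for _ in range(10_000)]
frac_biased_low = sum(i < 100_000 for i in biased) / len(biased)
frac_uniform_low = sum(i < 100_000 for i in uniform) / len(uniform)
```

With `exponent=4`, roughly half of the biased picks land in the lowest 10% of the id range, versus about 10% for the uniform picker, which is why the update load concentrates on the oldest documents.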
After reaching 58% of execution I got an exception:

    java.lang.IllegalArgumentException: number of documents in the index cannot exceed 2147483519

This is actually a Lucene index limitation.
I ran a single node with a single index. My question: what are the options to overcome this limit? I run a number of benchmarks and I need the final result of the Rally execution, but when the Elasticsearch process fails, Rally fails too and produces no output for the run.
Also, a couple of questions about specific Rally parameters:
I configured:

- `bulk_size: 10000`
- `bulk_indexing_clients: 64`
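For reference, track parameters like these are typically passed on the Rally command line via `--track-params`; the track and challenge names below are placeholders for whatever I am actually running, not the real ones:

```shell
# Hypothetical invocation -- substitute your real track/challenge names.
esrally race --track=my-track --challenge=my-challenge \
  --track-params="bulk_size:10000,bulk_indexing_clients:64"
```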
Looking at the `top` command, I saw at most 15 Rally processes running concurrently. How can I drive a higher load? Is it process/core bound, i.e. in order to have 64 Rally client processes running, do I need a machine with at least 64 CPU cores?
If I increase `bulk_size` from 10000 to, say, 50000, will that increase the overall load on write I/O? And is there a way to generate larger documents? (In this challenge the JSON documents are Nginx log lines, which are not very big.)
What is the logic behind this challenge: how often do updates happen? Does it only write during the first phase and then, after some threshold, start updating documents?
What is the best way to create or extend Rally challenges and plugins?