If you set `id_seq_low_id_bias` to `true`, Rally uses a different distribution for picking the document ids to update, one that heavily favors lower ids (by default the distribution is uniform). As the index grows, updating the oldest ids creates a heavy workload.
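To illustrate the idea (not Rally's actual implementation, whose exact distribution I don't know), here is a minimal sketch of a low-id-biased id picker: raising a uniform draw to a power greater than 1 pushes most picks toward the lowest, i.e. oldest, ids.

```python
import random

def pick_update_id(max_id, low_id_bias=False, exponent=4):
    """Pick a document id in [0, max_id).

    With low_id_bias=False the choice is uniform. With low_id_bias=True
    the draw is skewed heavily toward the lowest (oldest) ids; the
    power-law skew used here is a hypothetical stand-in for whatever
    distribution the track really uses.
    """
    u = random.random()
    if low_id_bias:
        u = u ** exponent  # pushes most draws toward 0
    return int(u * max_id)

# Rough demonstration: biased draws cluster at the low end of the range.
random.seed(42)
biased = [pick_update_id(1_000_000, low_id_bias=True) for _ in range(10_000)]
uniform = [pick_update_id(1_000_000) for _ in range(10_000)]
frac_biased_low = sum(i < 100_000 for i in biased) / len(biased)
frac_uniform_low = sum(i < 100_000 for i in uniform) / len(uniform)
```

With `exponent=4`, roughly half of the biased picks land in the lowest 10% of the id range, versus about 10% for the uniform picker, which is why the update load concentrates on the oldest documents.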
After reaching 58% of execution I got an exception:

    java.lang.IllegalArgumentException: number of documents in the index cannot exceed 2147483519

This is actually a Lucene index limitation.
I ran a single node with a single index. My question: what are the options to overcome this limit? I run a number of benchmarks and I need the final result of the Rally execution, but when the Elasticsearch process fails, Rally fails too and produces no output for the run.
Also, a couple of questions about specific Rally parameters:
I configured:

- `bulk_size: 10000`
- `bulk_indexing_clients: 64`
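For reference, track parameters like these are typically passed on the Rally command line via `--track-params`; the track and challenge names below are placeholders for whatever I am actually running, not the real ones:

```shell
# Hypothetical invocation -- substitute your real track/challenge names.
esrally race --track=my-track --challenge=my-challenge \
  --track-params="bulk_size:10000,bulk_indexing_clients:64"
```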
Looking at the `top` command, I saw at most 15 Rally processes running concurrently. How can I drive a higher load? Is it process/core bound, i.e. in order to have 64 Rally client processes running, do I need a machine with at least 64 CPU cores?
If I increase `bulk_size` from 10000 to, say, 50000, will that increase the overall load on write I/O? And is there a way to generate larger documents? (In this challenge the JSON documents are Nginx log lines, which are not very big.)
What is the logic behind this challenge: how often do updates happen? Does it only write during the first phase and then, after some threshold, start updating documents?
What is the best way to create or extend Rally challenges and plugins?