IO / Disk "tear down" for Elasticsearch

Hello,
My task is to give Elasticsearch's disk I/O a "hard life" :-). To avoid reinventing the wheel: is there a tool for simulating high disk load on Elasticsearch?
What type of query is the most disk I/O intensive? I have seen many articles and best practices on how to tune performance, but my task really is to kill disk I/O for Elasticsearch, and I didn't find any useful public information on the web.

I am testing Rally; the highest load I've got so far is 12% iowait and 1.2k writes per second.
I used a default dataset to run load against a single-node cluster.
Here is the command:

esrally --track=pmc --track-params="bulk_size:2000,bulk_indexing_clients:16" --target-hosts=<remote_ip>:9200 --pipeline=benchmark-only

What is the way to cause the disk saturation using Rally?

Thank you in advance.

Hey,

Thanks for your interest.

To create high disk load with Rally, you could try expensive queries (plus parallel indexing), but another angle is to rely on ingest with a large percentage of document updates.

You could take a look at a blog article I co-authored where we used Rally to simulate high IO (to assist with a kernel bug investigation): https://www.elastic.co/blog/canonical-elastic-and-google-team-up-to-prevent-data-corruption-in-linux

In the reproduction scripts linked there you'll see we are relying on a bulk-update challenge; if you additionally set id_seq_low_id_bias to true and tweak the probability too, you will end up with a very disk-IO-heavy workload.
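To illustrate why biasing updates towards low (older) IDs is hard on the disk, here is a minimal Python sketch. It is not the eventdata track's implementation: the function name, the update_probability argument and the squared-uniform bias are all assumptions made for illustration. The only idea it shows is that a share of the bulk items re-use an already indexed ID, and that those IDs are skewed towards older documents.

```python
import random


def next_doc_id(max_id, update_probability, low_id_bias, rng):
    """Return (doc_id, is_update) for the next bulk item.

    Hypothetical helper, not taken from the eventdata track.
    max_id is the highest ID indexed so far (-1 if nothing is indexed yet).
    """
    if max_id >= 0 and rng.random() < update_probability:
        u = rng.random()
        if low_id_bias:
            # Squaring a uniform sample skews the pick towards 0 (old docs).
            u = u * u
        return int(u * (max_id + 1)), True
    # Otherwise append a brand-new document with the next free ID.
    return max_id + 1, False


# Small demo: half of the operations are updates, biased towards old IDs.
rng = random.Random(42)
max_id = -1
updates = 0
for _ in range(10_000):
    doc_id, is_update = next_doc_id(max_id, 0.5, True, rng)
    if is_update:
        updates += 1
    else:
        max_id = doc_id
print(f"{updates} updates, {max_id + 1} distinct documents")
```

The longer such a workload runs, the more documents exist and the wider the range of old IDs the updates touch, which is in line with the expectation that the load becomes heavier on IO over time.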

You can also take a look at the daily-log-volume-index-and-query eventdata challenge (see the docs/parameters here: https://github.com/elastic/rally-eventdata-track#index-and-query-logs-fixed-daily-volume) and bump the # of search clients and other parameters.

Dimitris


Hi @dliappis. Really informative, thank you.

I am trying to follow the recommendations.
I checked out the repositories:
drwxrwxr-x 16 user user 4096 Sep 9 15:25 rally-tracks
drwxrwxr-x 5 user user 4096 Sep 9 15:28 rally-eventdata-track

and ran the command:
esrally --track-path=/add_disk/code/elasticsearch/rally-tracks/eventdata --target-hosts=

The process gets stuck on
[INFO] Preparing for race ... ; the logs are not updated and CPU usage is almost 0, so it looks like nothing is happening.
I am very new to Rally and need some guidance on how to troubleshoot.

Thanks in advance.

FYI, for the eventdata track, if you don't intend to make changes of your own, you can simply use it via --track-repository, with more detailed docs here. It is as simple as editing your ~/.rally/rally.ini and adding the following to the [tracks] section:


[tracks]
default.url = https://github.com/elastic/rally-tracks
eventdata.url = https://github.com/elastic/rally-eventdata-track
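If you would rather script that edit than do it by hand, a minimal Python sketch using only the standard library could look like this. The helper name is made up for illustration, and the sketch works on the file's text rather than on ~/.rally/rally.ini directly, so nothing is overwritten by accident. Note that configparser drops comments and lowercases option names when it writes the file back.

```python
import configparser
import io


def add_track_repository(ini_text, name, url):
    """Return ini_text with '<name>.url = <url>' set in the [tracks] section.

    Hypothetical helper; comments in the original text are not preserved.
    """
    # interpolation=None keeps '%' characters in existing values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    if not parser.has_section("tracks"):
        parser.add_section("tracks")
    parser.set("tracks", f"{name}.url", url)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


existing = "[tracks]\ndefault.url = https://github.com/elastic/rally-tracks\n"
print(add_track_repository(
    existing, "eventdata", "https://github.com/elastic/rally-eventdata-track"))
```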

Your esrally command seems incomplete. Which challenge do you intend to use? Also, which Rally version are you using (esrally --version)?

The version is 1.4.1.

I also tried running it after editing rally.ini:

[tracks]
#default.url = https://github.com/elastic/rally-tracks
default.url = https://github.com/elastic/rally-tracks
eventdata.url = https://github.com/elastic/rally-eventdata-track

using the command:
esrally --track=eventdata --track-repository=eventdata --target-hosts=my_ip:9200

Based on your recommendation I need to set id_seq_low_id_bias to true and tweak the probability, so I will need to make changes, right?

What is the fastest/easiest way to run the disk-heavy load explained in your post above?

Thanks

You were right, the command was incomplete; I added --pipeline=benchmark-only and now the job runs.

I am running the command:
esrally --track=eventdata --track-repository=eventdata --track-params=bulk_size:10000,bulk_indexing_clients:16 --target-hosts=my_ip:9200 --pipeline=benchmark-only

Running deleteindex_elasticlogs_q-* [100% done]
Running delete-index-template [100% done]
Running create-index-template [100% done]
Running create-index [100% done]
Running index-append-1000-elasticlogs_q_write,rollover_elasticlogs_q_write_100M[ 4% done]

the load is very low:
avg-cpu: %user %nice %system %iowait %steal %idle
78.96 0.00 3.58 0.69 0.00 16.77

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sdb 0.00 1001.00 0.00 79.86 0.00 1937.00 0.00 65.93 0.00 1.00 0.60 0.00 81.69 0.26 26.40
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Appending does not put much strain on the disk :slight_smile:

How can I configure and run updates or other operations that saturate the disk?

Thanks in advance.

You didn't specify the challenge; the eventdata track has several challenges, and since none was specified it picked the default, elasticlogs-1bn-load.

You should specify the bulk-update challenge if you want to take the blog article path (see how it's invoked here) or the index-and-query-logs-fixed-daily-volume if you want to use the other option I talked about earlier. In any case, please take a moment to go through the available challenges docs of the eventdata track.

You can also list all challenges in the track using e.g. esrally info --track-repo=eventdata --track=eventdata.


For tweaks, it's best to reference the track with --track-path=<path_to_your_modified_checkout_of_eventdata_repo> as you mentioned earlier instead of using --track-repo; but I suggest you start with the challenge as is first and if it works, then you can proceed with tweaks.

Hello, I've made changes to the execution command.

esrally --track-params=./track-params.json --challenge=bulk-update --track=eventdata --track-repository=eventdata --pipeline=benchmark-only --target-hosts=<my_ip>:9200

track-params.json
{
"bulk_indexing_clients": 8,
"bulk_indexing_iterations": 2200000,
"target_throughput": 16,
"bulk_size": 10000
}
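When trying several challenges and parameter sets, it can help to generate the invocation instead of retyping it. The following is a minimal sketch: the helper name is made up, the host 127.0.0.1:9200 is a placeholder, and the only flags used are the ones that already appear in this thread. It prints the command and the JSON rather than running anything.

```python
import json
import shlex


def build_esrally_command(challenge, target_hosts, params_file):
    """Assemble an esrally argument list for the eventdata track.

    Hypothetical helper; it only combines flags already used in this thread.
    """
    return [
        "esrally",
        "--pipeline=benchmark-only",
        "--track-repository=eventdata",
        "--track=eventdata",
        f"--challenge={challenge}",
        f"--track-params={params_file}",
        f"--target-hosts={target_hosts}",
    ]


# Parameter values are examples; save the JSON as ./track-params.json.
track_params = {"bulk_indexing_clients": 8, "bulk_size": 10000}
params_json = json.dumps(track_params, indent=2)

cmd = build_esrally_command("bulk-update", "127.0.0.1:9200", "./track-params.json")
print(params_json)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list makes it easy to pass to subprocess.run later and to change one parameter at a time between runs.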

I tried changing bulk_size from 1000 to 10000.

In both cases I got around 300 writes per second and 1.6% iowait.

Here is the output of iostat -xm 1 /dev/sdb /dev/sda:

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 0.00 3.00 0.00 0.03 0.00 1.00 0.00 25.00 0.00 0.00 0.00 0.00 9.33 0.00 0.00
sdb 2.00 221.00 0.01 38.62 0.00 252.00 0.00 53.28 2.00 3.76 0.75 4.00 178.93 0.23 5.20

I also read the details in the list of challenges.
I am definitely missing something; can you please point out what I am doing wrong?

Thanks in advance.

How much memory does the host have? How large a portion of the index would be cached in the OS page cache? To properly stress disk I/O you want to make sure as little as possible is cached.
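As a back-of-the-envelope check, the share of the index that could sit in the page cache can be estimated from the host RAM, the JVM heap and the index size. This is a rough sketch under simple assumptions: everything not used by the heap or by a small allowance for other processes (other_gb, an arbitrary guess) is available as page cache, and the numbers below are illustrative only.

```python
def cacheable_fraction(host_ram_gb, jvm_heap_gb, index_size_gb, other_gb=2.0):
    """Rough upper bound on the share of the index the page cache can hold.

    Hypothetical helper; other_gb is an arbitrary allowance for the OS
    and other processes.
    """
    page_cache_gb = max(host_ram_gb - jvm_heap_gb - other_gb, 0.0)
    if index_size_gb <= 0:
        return 1.0
    return min(page_cache_gb / index_size_gb, 1.0)


# A 20 GB index on a 64 GB host with a 32 GB heap fits entirely in the cache.
print(cacheable_fraction(64, 32, 20))
# A 300 GB index on the same host can be cached to about 10% at most.
print(cacheable_fraction(64, 32, 300))
```

If the first case applies, reads are mostly served from memory, so the index has to grow well beyond the available RAM before reads reach the disk.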

Hello @Christian_Dahlqvist.
Yes, it is a good point.

The ES machine is:
Machine type: c2-standard-16 (16 vCPUs, 64 GB memory)
Reservation: Automatically choose
CPU platform: Intel Cascade Lake

I configured:
thread_pool:
  write:
    size: 16
    queue_size: 50000

JVM options
-Xms1g
-Xmx32g

What other changes should I make? How can I minimize index caching?

Thanks in advance.

The specific challenge will get slower over time as it needs to build up a reasonable index size and then the more updates go to older docs the heavier on IO it becomes; for the reproduction mentioned on the blog article it took >24hrs. I am not sure how long you ran it.

If you are looking for something quicker, perhaps you should consider the other challenge I mentioned earlier, daily-log-volume-index-and-query.


A few more things I forgot to mention:

  • Check the refresh interval for the index; by making it more frequent the disk IO penalty will be higher (see here).

  • I noticed you are using different values for Xms and Xmx; this might make things slower during the execution of the benchmark, and I thought I'd point you to this topic on why, in production at least, you should strive to keep them identical (plus use -XX:+AlwaysPreTouch).
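The heap check from the second point is easy to automate before each run. Below is a minimal sketch with made-up helper names; it assumes the options are given as a list of strings, only understands the k/m/g suffixes, and assumes that a later option overrides an earlier one.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def heap_bytes(jvm_options, flag):
    """Return the size in bytes given for -Xms or -Xmx, or None if absent."""
    value = None
    for opt in jvm_options:
        match = re.fullmatch(rf"-{flag}(\d+)([kKmMgG]?)", opt.strip())
        if match:
            # Assumption: the last occurrence of the flag wins.
            value = int(match.group(1)) * _UNITS.get(match.group(2).lower(), 1)
    return value


def heap_is_fixed(jvm_options):
    """True if both -Xms and -Xmx are set and have the same size."""
    xms = heap_bytes(jvm_options, "Xms")
    xmx = heap_bytes(jvm_options, "Xmx")
    return xms is not None and xms == xmx


print(heap_is_fixed(["-Xms1g", "-Xmx32g"]))  # the settings posted above
print(heap_is_fixed(["-Xms8g", "-Xmx8g", "-XX:+AlwaysPreTouch"]))
```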


Thank you for such a detailed explanation. I disabled the OS page cache (I am not sure whether that is a realistic scenario) and fixed the JVM heap size (Xms and Xmx are now the same).
I also reduced the ES heap from 32 GB to 8 GB. I am trying to prevent the dataset from being cached or held in memory. Is there a way to disable caching for write operations, or to disable any kind of cache for ES?

I ran many options and Rally datasets; the maximum I can squeeze out is 3500 writes per second, with iowait only rarely reaching ~12%. Write throughput is 40-45 MB/s (very strange, by the way: 16 Rally processes are writing and the data rate is only 40 MB/s).

I didn't run it for 24 hours as you recommended (maybe later). I want to try running multiple client instances and indexing in parallel. Is that a good idea, and can you recommend the best/fastest way to set up multiple nodes running Rally?

Thanks in advance.

Hello,
I am trying to run daily-log-volume-index-and-query:

esrally --track-params=./track-params.json --challenge=daily-log-volume-index-and-query --track=eventdata --track-repository=eventdata --pipeline=benchmark-only --target-hosts=MyIp:9200

[ERROR] Cannot race. Error in race control (('Unknown challenge [daily-log-volume-index-and-query] for track [eventdata]', None))

rally.ini
[tracks]
#default.url = https://github.com/elastic/rally-tracks
eventdata.url = https://github.com/elastic/rally-eventdata-track
default.url = https://github.com/elastic/rally-tracks

What could be the problem?

This is documented in https://esrally.readthedocs.io/en/stable/recipes.html#distributing-the-load-test-driver; you should check if your load driver is bottlenecked before considering clustering it though.

The name of the challenge is index-and-query-logs-fixed-daily-volume.

OK, thank you. I will rerun Rally with the proper challenge name.

There is one thing I really can't understand. I started testing with one ES machine and one Rally machine, and measured disk I/O with the iostat utility.

I've got the load:
avg-cpu: %user %nice %system %iowait %steal %idle
9.23 0.00 0.99 9.29 0.00 80.48

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3845.00 0.00 42.13 0.00 5596.00 0.00 59.27 0.00 1.70 5.72 0.00 11.22 0.22 86.00

iowait - ~10%
w/s - 3500 (after disabling the page cache and the rest of the changes)
writes - ~40 MB/s

Today I added a second machine with Rally and ran the test again. I was expecting a higher load on ES, in particular on disk I/O.

But the results are almost the same:
iowait - ~10%
w/s - 3500 (after disabling the page cache and the rest of the changes)
writes - ~40 MB/s

Question: how can this be? I am following along in the Elasticsearch UI. I am adding millions of documents, but the load on the disk stays the same?

Thanks in advance.

Are you checking for errors during indexing? (This can be seen in the metrics store, or you can ask Rally to fail immediately when it hits an error with --on-error=abort.)

I suggest you take a step back and think this through a bit; watching the "7 deadly sins of benchmarking" talk is strongly recommended for avoiding common pitfalls. As mentioned in that talk, the USE method is a good methodology for identifying bottlenecks on both the load driver and the Elasticsearch side.

Once you've made sure there are no bottlenecks, you can start tweaking parameters, one at a time.

For the record I tried smoke testing the bulk-update challenge with similar settings to the ones you used earlier (esrally --pipeline=benchmark-only --track=eventdata --track-repository=eventdata --challenge=bulk-update --track-params=bulk_size:10000,bulk_indexing_clients:16 --target-hosts=localhost:9200 --client-options="timeout:240" --kill-running-processes) on a powerful machine (with Rally running on the same host, which ofc is a BIG anti-pattern for performance analysis) and after a few minutes I could easily get IO throughput like 688MB/s:

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
md0              0.00      0.00     0.00   0.00    0.00     0.00 3131.00 705372.00     0.00   0.00    0.00   225.29    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00 1981.00 275372.00     0.00   0.00    0.90   139.01    0.00      0.00     0.00   0.00    0.00     0.00  570.00    0.63    2.15  97.30
nvme1n1          0.00      0.00     0.00   0.00    0.00     0.00 1942.00 274404.00     0.00   0.00    1.02   141.30    0.00      0.00     0.00   0.00    0.00     0.00  570.00    0.65    2.34  97.30

however, the system seems to be actually mostly CPU bound:

$ mpstat 1
Linux 5.8.6-1-MANJARO (ryzen) 	09/15/2020 	_x86_64_	(32 CPU)

09:30:21 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
09:30:22 PM  all   83.47    0.00    2.15    0.22    0.28    0.06    0.00    0.00    0.00   13.81
09:30:23 PM  all   81.20    0.00    3.47    0.19    0.53    0.09    0.00    0.00    0.00   14.52
09:30:24 PM  all   84.39    0.00    2.16    0.06    0.41    0.00    0.00    0.00    0.00   12.98
09:30:25 PM  all   85.72    0.00    2.20    0.16    0.16    0.09    0.00    0.00    0.00   11.68
09:30:26 PM  all   85.77    0.00    2.09    0.03    0.66    0.16    0.00    0.00    0.00   11.30
09:30:27 PM  all   85.85    0.00    1.90    0.00    0.22    0.03    0.00    0.00    0.00   11.99
09:30:28 PM  all   85.99    0.00    1.85    0.03    0.28    0.13    0.00    0.00    0.00   11.71

which tells me that for the specific system going so high with bulk_size / bulk indexing clients makes ES become CPU bound rather than IO bound.

A combined workload including search like the one in index-and-query-logs-fixed-daily-volume would probably be more IO intensive sooner, but please make sure you have a sound methodology and can identify all bottlenecks.
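One quick sanity check when comparing iostat samples like the ones in this thread is the average write size, i.e. write throughput divided by writes per second. The sketch below uses a made-up helper name and plugs in the sdb figures from the Google Cloud host (3845 w/s at 42.13 MB/s) and the nvme0n1 figures from the smoke test above (1981 w/s at 275372 kB/s).

```python
def avg_write_kb(writes_per_s, write_kb_per_s):
    """Average size of one write request in kB (0.0 when the device is idle)."""
    if writes_per_s == 0:
        return 0.0
    return write_kb_per_s / writes_per_s


# iostat -m reports wMB/s, so convert it to kB/s first.
gce_sdb = avg_write_kb(3845.00, 42.13 * 1024)
nvme0n1 = avg_write_kb(1981.00, 275372.00)

print(f"sdb:     {gce_sdb:.1f} kB per write")   # 11.2
print(f"nvme0n1: {nvme0n1:.1f} kB per write")   # 139.0
```

Both results agree with the wareq-sz columns in the respective outputs. A device that is busy with many small writes can show a high %util while moving relatively little data, so w/s and MB/s should be read together rather than in isolation.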


Yes, I watched the YouTube video about the 7 sins of benchmarking (that is why I was surprised that disabling the OS page cache would be a good idea).
I definitely agree with you about the methodology. I just feel I am stuck on something very basic.

I am rerunning the command you sent:
esrally --pipeline=benchmark-only --track=eventdata --track-repository=eventdata --challenge=index-and-query-logs-fixed-daily-volume --track-params=bulk_size:10000,bulk_indexing_clients:16 --target-hosts=my_ip:9200 --client-options="timeout:240" --on-error=abor

I am not running on the same machine (as you mentioned, that is a BIG anti-pattern).
I am running on Google Cloud:

ES machine: 16 vCPUs, 64 GB RAM, 1 TB disk.
Rally client machine: 8 vCPUs, 32 GB RAM, 1 TB disk.

I've got exactly the same statistics:
avg-cpu: %user %nice %system %iowait %steal %idle
8.77 0.00 1.18 5.10 0.00 84.95

Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3803.00 0.00 41.84 0.00 5772.00 0.00 60.28 0.00 0.63 1.55 0.00 11.27 0.19 71.20

iowait - 5.10
MBs - 41.84
w/s - 3803.00

At about 30% of the execution I got an error and the job was aborted:
[ERROR] Cannot race. Error in load generator [2]
No matching data found for field '@timestamp' in pattern 'elasticlogs-*'.

Getting further help:


The log files contain many exceptions:
AssertionError: No matching data found for field '@timestamp' in pattern 'elasticlogs-*'.

2020-09-15 19:17:25,798 ActorAddr-(T|:44165)/PID:12740 esrally.driver.driver ERROR Could not execute schedule
Traceback (most recent call last):

File "/usr/local/lib/python3.6/dist-packages/esrally/driver/driver.py", line 1095, in __call__
total_ops, total_ops_unit, request_meta_data = execute_single(runner, self.es, params, self.abort_on_error)

File "/usr/local/lib/python3.6/dist-packages/esrally/driver/driver.py", line 1133, in execute_single
return_value = runner(es, params)

File "/usr/local/lib/python3.6/dist-packages/esrally/driver/runner.py", line 188, in __call__
return self.delegate(*args)

File "/usr/local/lib/python3.6/dist-packages/esrally/driver/runner.py", line 236, in __call__
return self.delegate(self.client_extractor(args[0]), *args[1:])

File "/home/user/.rally/benchmarks/tracks/eventdata/eventdata/runners/fieldstats_runner.py", line 146, in fieldstats
raise AssertionError("No matching data found for field '{}' in pattern '{}'.".format(field_name, index_pattern))

AssertionError: No matching data found for field '@timestamp' in pattern 'elasticlogs-*'.

Any ideas what is wrong?

Given that you seem to have relatively little CPU usage I would recommend trying to run Rally on the host itself and see what this gives. It will rule out network performance being the bottleneck.