I'm trying to run a few performance benchmarks of the current Elasticsearch 6.5.2 (GA) on a large single-node bare-metal server. I can see that Elasticsearch has some 8 benchmarks, from Geonames to NOAA-based data.
I need some pointers; if anyone out there could direct me, that'd be great.
Out of the 8 benchmarks listed, which one is the most CPU bound, so that I can crunch more data on the available cores/threads?
How much data will be generated by the most CPU-bound of these 8 benchmarks? I could add more NVMe drives of similar capacity to the Elasticsearch node if needed.
Currently I'm running the Elasticsearch benchmarks with their defaults and the specific "challenges" associated with them, but I'm not able to raise the CPU usage at all. If someone out there could give me some tips on additional parameters, such as "cars" or any other specific parameters, that would make the benchmarks more aggressively CPU bound on the Elasticsearch server, that would be great.
Currently I'm using ( jdk1.8.0_191 )
A) 24GB as my Heap Size (-Xms24g -Xmx24g)
B) This is a dual-socket AMD EPYC 7601 (32 cores per socket) with HT switched on, which makes a total of 128 threads in the Elasticsearch server. The server exposes 8 NUMA nodes, each of which can hold up to 32GB of memory (256GB / 8 NUMA nodes).
C) To make sure Java handles the NUMA nodes correctly, I've also set the -XX:+UseNUMA flag in the jvm.options file.
I totally agree with @Christian_Dahlqvist, it will serve you best if you clarify what your use case is. You mentioned that you want to see the performance of your processors, but I guess there's something Elasticsearch-specific you want to do here, otherwise you'd use one of the standard Linux CPU benchmarking utilities. Some additional in-line answers below:
This depends on the type of operation you want to stress your CPUs with (i.e. bulk indexing or search) and the number of Rally clients.
For the bulk use case, of the standard tracks nyc_taxis has the largest corpus (~74GiB); for each track you can see the amount of uncompressed data in the corresponding section of track.json in the rally-tracks repo. Additionally, http_logs has a fair bit of data, although the per-document size is fairly small.
You can then specify the number of indexing clients that Rally will use via the bulk_indexing_clients track parameter (again, please refer to each track's README file for more details).
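As a rough illustration of passing that track parameter, here is a minimal sketch that only assembles and prints an esrally command line. The helper name and the client count of 32 are hypothetical, and depending on your Rally version the `race` subcommand may need to be given explicitly, so please check `esrally --help` and the track README before relying on it.

```python
import shlex

def rally_command(track, bulk_indexing_clients):
    """Build an esrally argument list (hypothetical helper, illustrative values)."""
    # Track parameters are passed as comma-separated key:value pairs.
    track_params = f"bulk_indexing_clients:{bulk_indexing_clients}"
    return [
        "esrally",
        f"--track={track}",
        f"--track-params={track_params}",
    ]

# 32 clients is an arbitrary example value, not a recommendation.
cmd = rally_command("nyc_taxis", 32)

# Print the command instead of running it, so it can be reviewed first.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Raising the client count step by step while watching CPU usage on both the load driver and the Elasticsearch node is probably more informative than jumping straight to a large value.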
If, on the other hand, you want to stress things on the search side, for example with the nyc_taxis track, you'll need to tweak the number of clients, target-throughput and iterations (see here), again via track parameters.
See the above answer on how to check this based on the uncompressed-bytes property in the track.json file of each track.
Note that 1) depending on the number of replicas you configure for your Elasticsearch indices, the actual bytes can be a multiple of this; 2) Lucene will compress data, so the actual bytes used by ES won't be the same as the uncompressed size of the track.
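To make the two points above concrete, here is a small sketch that pulls the uncompressed-bytes values out of the text of a track.json and multiplies them by the number of shard copies. The helper names and the sample snippet are made up for illustration. A regular expression is used rather than a JSON parser on the assumption that the file may contain template syntax. The result is the raw document volume only, not the on-disk size, since Lucene compression and index structures change it.

```python
import re

def uncompressed_bytes(track_json_text):
    """Return every "uncompressed-bytes" value found in the given text."""
    pattern = r'"uncompressed-bytes"\s*:\s*(\d+)'
    return [int(value) for value in re.findall(pattern, track_json_text)]

def raw_bytes_with_replicas(corpus_bytes, replicas):
    """Raw document bytes across all shard copies (primary plus replicas)."""
    return corpus_bytes * (1 + replicas)

# Hypothetical snippet; real values come from the track.json of each track.
sample = '{"source-file": "documents.json.bz2", "uncompressed-bytes": 1000000}'

corpus = sum(uncompressed_bytes(sample))
print(corpus)
print(raw_bytes_with_replicas(corpus, replicas=1))
```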
See the above answers regarding tuning the bulk indexing clients and/or clients/target-throughput/iterations for queries. Don't forget to check the load driver server itself for CPU/disk/network saturation. Finally, if you are using Rally in daemonized mode and launch ES via Rally, then depending on your amount of RAM you can specify a different car (see https://github.com/elastic/rally-teams/tree/master/cars/v1), e.g. 16gheap, or set heap_size directly by passing --car-params="heap_size:'16g'".
Hi @Christian_Dahlqvist and @dliappis, thanks for your responses. The goal/use case is platform-centric benchmarking of a dual-socket AMD EPYC 7601 system, to find its capabilities beyond what the available Linux CPU benchmarking utilities show. We chose Elasticsearch as our search platform and run the benchmarks on your tracks/races with Rally. Since these new EPYC processors have more cores and threads combined, we want to measure CPU performance in terms of:
% of processor time while the search platform performs the various "challenges".
System Calls / Second (if there are any measurements available through your report)
From your MD report:
The MD output of a track/race shows "Median CPU usage" as a percentage. For example, for
"Geopoint/append-no-conflicts-index-only" I got 1193.53%. I have a total of 128 threads/cores, so how does that translate into 1193.53%, given that the same report shows a "Total indexing time" of 25.9801 minutes? The description of cpu_utilization_1s says: "CPU usage in percent of the Elasticsearch process based on a one second sample period. The maximum value is N * 100% where N is the number of CPU cores available." Can you please clarify how it is calculated here? This would help, as it is the only metric in your MD report that points directly to CPU usage.
I'm also looking into your other valuable suggestions on nyc_taxis and multiple replicas. The caveat is that I have only this one big server (where Elasticsearch runs) and no second server for replicas. Can I use the same server to hold multiple replicas? Please advise.
Currently my heapsize is set to 24GB at the jvm.options file itself.
The load server (where Rally runs) is a single-socket AMD EPYC 7601 with 32 cores, so with HT it has 64 threads, and it has 256 GB of DRAM as well. When I measured its CPU/memory/network/disk usage, utilization was very low.
Whenever I run the benchmark I clear the following entries as well on the LoadServer
So if you could let me know how the "Median CPU Usage" % is derived, that would help me reason about these 128 cores/threads. Any other parameters/events that measure the CPU would be awesome as well.
Return a float representing the current system-wide CPU utilization as a percentage. When interval is > 0.0 compares system CPU times elapsed before and after the interval (blocking).
So Rally stores the 1s samples (either in its in-memory store or an Elasticsearch metric store, depending on what you configured) per Elasticsearch PID that it launched and at the end calculates the Median cpu usage based on this metric.
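As a rough sketch of what such a median means for the machine discussed in this thread: the sample values below are made up (only 1193.53 comes from the report quoted earlier), the helper names are hypothetical, and treating N as the 128 logical CPUs is an assumption.

```python
import statistics

def median_cpu_usage(samples_percent):
    """Median of per-second CPU usage samples for one process, in percent."""
    return statistics.median(samples_percent)

def as_busy_cpus(cpu_percent):
    """100% corresponds to one fully busy logical CPU."""
    return cpu_percent / 100.0

def as_share_of_machine(cpu_percent, logical_cpus):
    """Fraction of the maximum value N * 100%, with N assumed to be logical CPUs."""
    return cpu_percent / (logical_cpus * 100.0)

# Hypothetical 1s samples; only the middle value is taken from the report.
samples = [1100.0, 1193.53, 1250.0]
median = median_cpu_usage(samples)

print(round(as_busy_cpus(median), 2))                    # 11.94
print(round(as_share_of_machine(median, 128) * 100, 2))  # 9.32
```

Under those assumptions, 1193.53% corresponds to roughly 12 of the 128 logical CPUs being busy, i.e. a bit over 9% of the machine.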
My previous answer hopefully covered this. Let me suggest that, instead of relying on Rally's Median CPU reporting alone, you install something like metricbeat on your target node, collect detailed analytics using its system module, and benefit from the wealth of information it brings.