The error rate is 100%

luolinsun · September 28, 2017, 12:45pm

I am using the geonames track with the append-no-conflicts-index-only challenge. The command is:
./esrally race --offline --track=geonames --pipeline=benchmark-only --target-hosts=10.202.7.169:9200,10.202.7.170:9200,10.202.7.171:9200 --challenge=append-no-conflicts-index-only
However, the results for index-append are:
| All | Min Throughput | index-append | 927.67 | docs/s |
| All | Median Throughput | index-append | 14590.5 | docs/s |
| All | Max Throughput | index-append | 14890.2 | docs/s |
| All | 50th percentile latency | index-append | 51.2863 | ms |
| All | 90th percentile latency | index-append | 65.4431 | ms |
| All | 99th percentile latency | index-append | 78.9863 | ms |
| All | 99.9th percentile latency | index-append | 115.625 | ms |
| All | 100th percentile latency | index-append | 452.019 | ms |
| All | 50th percentile service time | index-append | 51.2863 | ms |
| All | 90th percentile service time | index-append | 65.4431 | ms |
| All | 99th percentile service time | index-append | 78.9863 | ms |
| All | 99.9th percentile service time | index-append | 115.625 | ms |
| All | 100th percentile service time | index-append | 452.019 | ms |
| All | error rate | index-append | 100 | % |
I don't know why the error rate is 100%.

danielmitterdorfer · September 28, 2017, 1:29pm

Hi @luolinsun,

This problem is very likely environment-specific. Can you please share the corresponding log file? Rally prints the path to the log file at the beginning: "Writing logs to ~/.rally/logs/rally_out_SOME_TIMESTAMP_HERE.log".
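
If the log file is long, one way to narrow it down is to keep only the lines logged above INFO. The following is a minimal sketch, not part of Rally: the helper name find_problem_lines is hypothetical, the pattern is modelled on the log lines quoted in this thread, and the level names WARNING, ERROR and CRITICAL are assumed to be the standard Python logging names.

```python
import re

# Pattern for log lines shaped like the ones quoted in this thread:
# "<date> <time>,<millis> PID:<pid> <logger> <LEVEL> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) "
    r"PID:(?P<pid>\d+) (?P<logger>\S+) (?P<level>[A-Z]+)(?: (?P<msg>.*))?$"
)


def find_problem_lines(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Return (logger, level, message) for every line logged at one of `levels`.

    Lines that do not match the pattern (tables, stack traces) are skipped.
    """
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            hits.append(
                (match.group("logger"), match.group("level"), match.group("msg") or "")
            )
    return hits
```

To use it, open the log file whose path Rally printed and pass the file object to find_problem_lines; an empty result means nothing above INFO matched the assumed format.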

Can you please also try to run a small subset of just 1000 documents by specifying --test-mode in addition to the other command line parameters? Does it still report an error rate of 100%?

Another test would be to run it against your local machine (for test purposes) with:

./esrally --offline --track=geonames --distribution-version=5.6.0 --challenge=append-no-conflicts-index-only

This requires that Java 8 is installed on your local machine (as Elasticsearch is started by Rally on the local machine in the background).

Daniel

luolinsun · September 29, 2017, 2:32am

Thanks. I tried a small subset and also ran against a local machine with ES 5.4; the error rate is still 100%. I can't find the reason in the log. Here are some of the log lines:
2017-09-29 01:50:29,956 PID:3849 rally.driver INFO Main driver has notified all load generators of termination.
2017-09-29 01:50:29,979 PID:3838 rally.racecontrol INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:36732)] (ignoring).
2017-09-29 01:50:29,980 PID:3839 rally.telemetry INFO Gathering indices stats.
2017-09-29 01:50:30,7 PID:3839 rally.metrics INFO Compression changed size of metric store from [264] bytes to [1264] bytes
2017-09-29 01:50:30,8 PID:3838 rally.racecontrol INFO Bulk adding system metrics to metrics store.
2017-09-29 01:50:30,8 PID:3838 rally.metrics INFO Restoring in-memory representation of metrics store.
2017-09-29 01:50:30,8 PID:3838 rally.racecontrol INFO Flushing metrics data...
2017-09-29 01:50:30,8 PID:3838 rally.racecontrol INFO Flushing done
2017-09-29 01:50:30,8 PID:3838 rally.racecontrol INFO Finished lap [1/1]
2017-09-29 01:50:30,9 PID:3838 rally.racecontrol INFO Asking mechanic to stop the engine.
2017-09-29 01:50:30,9 PID:3839 rally.actor INFO Transitioning from [benchmark_stopped] to [cluster_stopping].
2017-09-29 01:50:30,10 PID:3844 rally.mechanic INFO Stopping nodes [<esrally.mechanic.cluster.Node object at 0x7f71f17d86a0>, <esrally.mechanic.cluster.Node object at 0x7f71f17d8668>, <esrally.mechanic.cluster.Node object at 0x7f71f17d8048>].
2017-09-29 01:50:30,10 PID:3844 rally.metrics INFO Compression changed size of metric store from [64] bytes to [47] bytes
2017-09-29 01:50:30,10 PID:3839 rally.metrics INFO Restoring in-memory representation of metrics store.
2017-09-29 01:50:30,10 PID:3839 rally.actor INFO [1] of [1] child actors have responded for transition from [cluster_stopping] to [cluster_stopped].
2017-09-29 01:50:30,10 PID:3839 rally.actor INFO All [1] child actors have responded. Transitioning now from [cluster_stopping] to [cluster_stopped].
2017-09-29 01:50:30,11 PID:3839 rally.metrics INFO Compression changed size of metric store from [64] bytes to [47] bytes
2017-09-29 01:50:30,11 PID:3838 rally.racecontrol INFO Mechanic has stopped engine successfully.
2017-09-29 01:50:30,11 PID:3838 rally.racecontrol INFO Bulk adding system metrics to metrics store.
2017-09-29 01:50:30,11 PID:3838 rally.metrics INFO Restoring in-memory representation of metrics store.
2017-09-29 01:50:30,23 PID:3838 rally.reporting INFO Summarizing results.
2017-09-29 01:50:30,23 PID:3838 rally.reporting INFO
2017-09-29 01:50:30,23 PID:3838 rally.reporting INFO [ASCII-art report banner]
2017-09-29 01:50:30,23 PID:3838 rally.reporting INFO
2017-09-29 01:50:30,26 PID:3838 rally.reporting INFO | Lap | Metric | Operation | Value | Unit |
|------:|-------------------------------:|-------------:|-----------:|-------:|
| All | Total Young Gen GC | | 0.272 | s |
| All | Total Old Gen GC | | 0 | s |
| All | Heap used for segments | | 0.420527 | MB |
| All | Heap used for doc values | | 0.0634766 | MB |
| All | Heap used for terms | | 0.330464 | MB |
| All | Heap used for norms | | 0.00128174 | MB |
| All | Heap used for points | | 0.00135517 | MB |
| All | Heap used for stored fields | | 0.0239487 | MB |
| All | Segment count | | 80 | |
| All | Min Throughput | index-append | 1234.81 | docs/s |
| All | Median Throughput | index-append | 11730.3 | docs/s |
| All | Max Throughput | index-append | 13368.1 | docs/s |
| All | 50th percentile latency | index-append | 51.3351 | ms |
| All | 90th percentile latency | index-append | 67.9726 | ms |
| All | 99th percentile latency | index-append | 98.6408 | ms |
| All | 99.9th percentile latency | index-append | 687.572 | ms |
| All | 100th percentile latency | index-append | 748.216 | ms |
| All | 50th percentile service time | index-append | 51.3351 | ms |
| All | 90th percentile service time | index-append | 67.9726 | ms |
| All | 99th percentile service time | index-append | 98.6408 | ms |
| All | 99.9th percentile service time | index-append | 687.572 | ms |
| All | 100th percentile service time | index-append | 748.216 | ms |
| All | error rate | index-append | 100 | % |
| All | Min Throughput | force-merge | 114.74 | ops/s |
| All | Median Throughput | force-merge | 114.74 | ops/s |
| All | Max Throughput | force-merge | 114.74 | ops/s |
| All | 100th percentile latency | force-merge | 8.69539 | ms |
| All | 100th percentile service time | force-merge | 8.69539 | ms |
| All | error rate | force-merge | 0 | % |

2017-09-29 01:50:30,28 PID:3838 rally.metrics INFO Closing metrics store.
2017-09-29 01:50:30,30 PID:3812 rally.racecontrol INFO Benchmark has finished successfully.
2017-09-29 01:50:30,30 PID:3812 rally.racecontrol INFO Telling benchmark actor to exit.
2017-09-29 01:50:30,30 PID:3838 rally.racecontrol INFO BenchmarkActor received unknown message [ChildActorExited:ActorAddr-(T|:39664)] (ignoring).
2017-09-29 01:50:30,31 PID:3812 rally.main INFO Attempting to shutdown internal actor system.
2017-09-29 01:50:30,37 PID:3812 rally.main INFO Actor system is still running. Waiting...
2017-09-29 01:50:30,37 PID:3836 root INFO ---- Actor System shutdown
2017-09-29 01:50:31,39 PID:3812 rally.main INFO Shutdown completed.
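
To compare runs quickly, the error rate rows can be pulled out of a summary table like the one above. This is a minimal sketch; the function name error_rates is hypothetical, and it assumes the five-column layout shown in the report (Lap, Metric, Operation, Value, Unit).

```python
def error_rates(report_lines):
    """Return {operation: error rate in percent} from summary table rows."""
    rates = {}
    for line in report_lines:
        # Drop the outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Assumed cell order: Lap, Metric, Operation, Value, Unit
        if len(cells) == 5 and cells[1] == "error rate":
            rates[cells[2]] = float(cells[3])
    return rates


# Rows copied from the report above.
rows = [
    "| All | Max Throughput | index-append | 13368.1 | docs/s |",
    "| All | error rate | index-append | 100 | % |",
    "| All | error rate | force-merge | 0 | % |",
]
print(error_rates(rows))  # {'index-append': 100.0, 'force-merge': 0.0}
```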

danielmitterdorfer · September 29, 2017, 8:38am

Hi @luolinsun,

That leaves two possibilities in my view: either (1) your data set is corrupted or (2) the machine does not have enough resources.

Regarding (1):

Can you please do ls -l ~/.rally/benchmarks/data/geonames and check whether you see the same file sizes? Rally should verify the file sizes but I just want to double-check:

total 7445944
-rw-r--r--  1 daniel  staff  3547614383 Sep 29 10:38 documents-2.json
-rw-r--r--  1 daniel  staff   264698741 Sep 29 10:37 documents-2.json.bz2
-rw-r--r--  1 daniel  staff        4250 Sep 29 10:39 documents-2.json.offset

If not, then please delete all files in that folder and rerun Rally. It should download and uncompress the files again (but not if you specify --offline).
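
The size comparison can also be scripted. The following is a minimal sketch, not something Rally ships: the function name check_sizes is hypothetical, and the expected sizes are simply copied from the listing above, so a mismatch is a hint to re-download rather than proof of corruption.

```python
import os

# Sizes in bytes, copied from the `ls -l` listing quoted above.
EXPECTED_SIZES = {
    "documents-2.json": 3547614383,
    "documents-2.json.bz2": 264698741,
    "documents-2.json.offset": 4250,
}


def check_sizes(data_dir, expected=EXPECTED_SIZES):
    """Map each file name to 'ok', 'missing' or 'size mismatch: <actual bytes>'."""
    report = {}
    for name, want in expected.items():
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            report[name] = "missing"
            continue
        actual = os.path.getsize(path)
        report[name] = "ok" if actual == want else f"size mismatch: {actual}"
    return report


if __name__ == "__main__":
    # Data directory as given in this thread.
    data_dir = os.path.expanduser("~/.rally/benchmarks/data/geonames")
    for name, status in check_sizes(data_dir).items():
        print(f"{name}: {status}")
```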

Regarding (2):

Are you on a machine with a spinning disk instead of an SSD?
Can you try to rerun the benchmark with an increased heap size, e.g. --car=4gheap?

You could also check whether Elasticsearch logs in ~/.rally/benchmarks/races/YOUR_RACE_TIMESTAMP/rally-node-0/logs/server reveal anything interesting.

Daniel

sanketshinde · October 16, 2017, 3:02pm

I am getting the same error, as described in my post: Premature end of Benchmark run

Can't seem to figure this out.

system · November 13, 2017, 3:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
