Esrally stuck Running large_terms

Hi,
I was running esrally to test the performance of an Elasticsearch cluster, version 5.6.10. The esrally version is 1.0.0. It gets stuck at 3% while running large_terms and stays there for a very long time (more than 4 hours). Here are the logs:

[INFO] Racing on track [geonames], challenge [append-no-conflicts] and car ['external'] with version [5.6.10].

[WARNING] merges_total_time is 15064528 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] merges_total_throttled_time is 114305 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] indexing_total_time is 3101013 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] refresh_total_time is 3659868 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
[WARNING] flush_total_time is 15527 ms indicating that the cluster is not in a defined clean state. Recorded index time metrics may be misleading.
Running delete-index [100% done]
Running create-index [100% done]
Running check-cluster-health [100% done]
Running index-append [100% done]
Running refresh-after-index [100% done]
Running force-merge [100% done]
Running refresh-after-force-merge [100% done]
Running index-stats [100% done]
Running node-stats [100% done]
Running default [100% done]
Running term [100% done]
Running phrase [100% done]
Running country_agg_uncached [100% done]
Running country_agg_cached [100% done]
Running scroll [100% done]
Running expression [100% done]
Running painless_static [100% done]
Running painless_dynamic [100% done]
Running large_terms [ 3% done]

Any idea about this?

Thanks

Hi,

is this reproducible? Did you check the Elasticsearch logs of the cluster that you are targeting with your benchmark? Did you check the Rally logs in ~/.rally/logs/rally.log?
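
If it helps, here is a minimal sketch for skimming the Rally log for suspicious lines. The keyword list is my own assumption, not something Rally defines, so adjust it to whatever you are looking for; the path is the default log location mentioned above.

```python
from pathlib import Path

# Default Rally log location mentioned above.
RALLY_LOG = Path.home() / ".rally" / "logs" / "rally.log"


def matching_lines(log_path, keywords=("WARNING", "ERROR", "max_clause_count")):
    """Return (line_number, line) pairs for lines containing any keyword.

    The default keywords are an assumption chosen for this thread.
    """
    hits = []
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for number, line in enumerate(f, start=1):
            if any(keyword in line for keyword in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Calling `matching_lines(RALLY_LOG)` returns the matching lines together with their line numbers, which makes it easier to jump to the right place in a long log.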

Daniel

Maybe it's because I'm using the default JVM options. I've increased Xms and Xmx and will try again.

I think I know what's going on (or rather not going on...). The large terms task intentionally sends a very large number of terms to Elasticsearch and you are benchmarking a 5.x cluster with the benchmark-only pipeline. The default limit for indices.query.bool.max_clause_count is 1024 but the benchmark needs a limit of 50000. As this is a static cluster setting you need to add it yourself to elasticsearch.yml, i.e.:

indices.query.bool.max_clause_count: 50000

If you have Rally set up the cluster for you, it applies this setting itself, but there is no way for us to set it for you with the benchmark-only pipeline. I think we emit a warning in the logs, but we should probably be more vocal about it and show a warning that you need to set it yourself.

Can you please change elasticsearch.yml on each node accordingly, restart and retry the benchmark?
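
After the restart, one way to confirm that every node picked up the new limit is to look at the node settings. The following is only a sketch: it assumes the nodes info API (`GET /_nodes/settings?flat_settings=true`) reports the value from elasticsearch.yml under the flat key shown, and `base_url` is a placeholder for one of your nodes.

```python
import json
from urllib.request import urlopen

SETTING = "indices.query.bool.max_clause_count"
DEFAULT_LIMIT = 1024   # default mentioned above
REQUIRED = 50000       # limit the large_terms task needs


def nodes_below_limit(nodes_info, required=REQUIRED):
    """Return {node_name: effective_limit} for nodes whose limit is too low.

    `nodes_info` is the parsed JSON body of the nodes info request.
    A node that does not set the value explicitly is treated as using
    the default limit.
    """
    too_low = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        settings = node.get("settings", {})
        value = int(settings.get(SETTING, DEFAULT_LIMIT))
        if value < required:
            too_low[node.get("name", node_id)] = value
    return too_low


def fetch_nodes_info(base_url="http://localhost:9200"):
    # base_url is an assumption; point it at any node of the target cluster.
    with urlopen(base_url + "/_nodes/settings?flat_settings=true") as resp:
        return json.load(resp)
```

`nodes_below_limit(fetch_nodes_info())` should return an empty dict once all nodes have been changed and restarted; any node listed there still runs with a limit that is too low.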

I've added this parameter and tested again, and esrally completed the test successfully this time!
Thank you very much :slight_smile:

I missed the comment, thanks for the reminder. I'll check the logs more carefully next time.

Hi,

thanks for your feedback and glad it is resolved now. This behavior is really trappy and we will improve it. I raised https://github.com/elastic/rally/issues/541 so in the future Rally will show a clear and actionable warning message to avoid such problems.

Daniel
