[ESRally Benchmarks] Search throughput changes every time ES restarts

I am using Rally to benchmark ES with the geonames track. Every time ES starts, I run several match_all tests to warm up and then record the best throughput. The throughput varies between 45 and 200 across test runs. Here is the environment configuration.

Rally

  • version 1.0.0

operations

{
  "name": "default",
  "operation-type": "search",
  "body": {
    "query": {
      "match_all": {}
    }
  }
},

challenges

      "schedule": [
        {
          "operation": "default",
          "clients": 4,
          "warmup-iterations": 500,
          "iterations": 1000,
          "target-throughput": 210
        }
      ]
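As a rough sanity check on the schedule above: in a closed-loop benchmark, the achievable throughput is bounded by the number of clients divided by the average service time per request. A minimal sketch (the service times below are hypothetical, not measurements from this cluster):

```python
# Back-of-the-envelope bound: with a closed-loop load generator,
# `clients` concurrent clients can sustain at most
# clients / avg_service_time operations per second.
# The service-time figures here are made up for illustration.

def max_throughput(clients, avg_service_time_s):
    """Upper bound on ops/s that `clients` closed-loop clients can achieve."""
    return clients / avg_service_time_s

# If each match_all takes ~20 ms on average, 4 clients top out near 200 ops/s,
# below the 210 ops/s target in the schedule above:
print(max_throughput(4, 0.020))   # 200.0

# If service time degrades to 100 ms, the ceiling drops to 40 ops/s:
print(max_throughput(4, 0.100))   # 40.0
```

So a target-throughput of 210 with 4 clients is only reachable if the average service time stays below roughly 19 ms; any slowdown in the cluster directly lowers the measured throughput.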

Hardware & OS

  • Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz

  • 100 GB of memory per NUMA node

  • SSD

  • Linux 3.10.0

  • ES is bound to 4 cores using numactl

JVM

  • -Xms8g
  • -Xmx8g
  • -XX:NewRatio=2
  • -XX:+UseConcMarkSweepGC

ES

  • version 6.2.3
$ curl -XGET 'http://localhost:9200/_cluster/stats?pretty'
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "timestamp" : 1553929322953,
  "status" : "green",
  "indices" : {
    "count" : 1,
    "shards" : {
      "total" : 5,
      "primaries" : 5,
      "replication" : 0.0,
      "index" : {
        "shards" : {
          "min" : 5,
          "max" : 5,
          "avg" : 5.0
        },
        "primaries" : {
          "min" : 5,
          "max" : 5,
          "avg" : 5.0
        },
        "replication" : {
          "min" : 0.0,
          "max" : 0.0,
          "avg" : 0.0
        }
      }
    },
    "docs" : {
      "count" : 10320000,
      "deleted" : 0
    },
    "store" : {
      "size_in_bytes" : 2795283680
    },
...

Try changing the bulk size.

How would bulk size matter in this test? The challenge only runs search operations.

Hello,

I am not sure what the question is here. If it is why the query throughput varies between 45 and 200 and is not stable, my initial thought is that the benchmark simply over-stresses (some aspect of) the cluster, making it unstable.

When performing throughput benchmarks, it is highly recommended to check not only whether the target throughput is achieved (if it is never achieved, that is the first red flag making the benchmark invalid, and lower throughput rates need to be used) but, equally importantly, the service time and latency. If latency starts growing, it indicates that Elasticsearch is servicing your requests (service_time) more slowly than would be required to satisfy your target throughput.
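To make that relationship concrete, here is a minimal, self-contained sketch (hypothetical numbers, single client, not Rally's actual implementation) of how latency diverges from service time once the target throughput exceeds what the server can sustain:

```python
# Toy model: requests are *scheduled* at a fixed rate (the target
# throughput), but each one takes service_time_s to complete, and a
# client must wait for its previous request to finish before issuing
# the next. Latency = wait-in-schedule + service time.

def simulate(target_throughput, service_time_s, n_requests=1000):
    """Average latency for n_requests scheduled at target_throughput ops/s."""
    interval = 1.0 / target_throughput
    finish = 0.0
    total_latency = 0.0
    for i in range(n_requests):
        scheduled = i * interval           # when the request should start
        start = max(scheduled, finish)     # it may have to wait for the previous one
        finish = start + service_time_s
        total_latency += finish - scheduled
    return total_latency / n_requests

# With a 10 ms service time, a 50 ops/s target is sustainable:
print(simulate(50, 0.010))    # latency ~= service time (10 ms)

# A 200 ops/s target is not: a backlog builds up and latency keeps growing:
print(simulate(200, 0.010))   # average latency far above 10 ms
```

When the target is sustainable, latency equals service time; once the target exceeds capacity, latency grows without bound while service time stays flat, which is exactly the signal to look for in the Rally report.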

Rgs,
Dimitris

When throughput is 40, raising the target throughput to 60 does not increase the achieved throughput.
Here are flame graphs for the different throughputs.

Does anybody have an idea about this problem?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.