Running benchmarking on existing index

Alp1 · January 5, 2018, 6:01pm

Hi

We have a few indices that already exist (with data) in Elasticsearch, and I want to measure search performance on them.
As I understand from the docs, we can set "auto-managed": false to stop Rally from creating a new index.

I have a few questions:

Can we run search operations on existing indices?
Do we always need to provide mapping.json in the 'indices' section, even for existing indices?
What information should be included in or excluded from track.json for this scenario?

Any help is appreciated!

danielmitterdorfer · January 8, 2018, 7:06am

Hi @Alp1,

if you set auto-managed to false, there is no need for you to provide a mapping file. In fact, you can just remove the entire indices section and just define an index and a type explicitly in the search operation. Here is a minimal example from the docs:

{
  "challenge": {
    "name": "just-search",
    "schedule": [
      {
        "operation": {
          "operation-type": "search",
          "index": "_all",
          "body": {
            "query": {
              "match_all": {}
            }
          }
        },
        "warmup-iterations": 100,
        "iterations": 100,
        "target-throughput": 10
      }
    ]
  }
}

It will run a match_all query against all indices of the target cluster with one client.

To be clear: This is not just a snippet, it is the entire content of your track. You can save this e.g. as search.json and run it with

esrally --pipeline=benchmark-only --target-hosts=node_1_ip:9200,node_2_ip:9200 --track-path=search.json

(this requires that you are on the latest stable version of Rally though which is 0.8.1 at the moment, check with esrally --version).
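
To restrict the search to a single existing index rather than _all, the same track could be generated with a small script. This is only a sketch: the index name is hypothetical, and passing the type under a "type" key is an assumption based on the description above, so check the Rally docs for your version before relying on it.

```python
import json


def build_search_track(index, doc_type=None, target_throughput=10):
    """Build a search-only track dict modeled on the example above."""
    operation = {
        "operation-type": "search",
        "index": index,
        "body": {"query": {"match_all": {}}},
    }
    if doc_type is not None:
        # Assumption: the type is given under a "type" key of the operation.
        operation["type"] = doc_type
    return {
        "challenge": {
            "name": "just-search",
            "schedule": [
                {
                    "operation": operation,
                    "warmup-iterations": 100,
                    "iterations": 100,
                    "target-throughput": target_throughput,
                }
            ],
        }
    }


# "my-existing-index" is a hypothetical index name; replace it with your own.
track = build_search_track("my-existing-index")
track_json = json.dumps(track, indent=2)
print(track_json)
```

Saving the printed output as search.json lets you run it with the same esrally command as above.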

Alp1 · January 8, 2018, 4:04pm

Thanks, it worked.
Can the throughput be higher than the number specified in the track file? I see a min/max/median throughput of 11 if I specify 10, and 105 if I specify 100.

danielmitterdorfer · January 8, 2018, 4:53pm

If you specify a target throughput, Rally should match it very closely. While the measured throughput can be slightly above the target, in my experience it should not exceed it by more than 1 op/s. Here are example lower and upper bounds from our benchmarking environment, where we executed an operation with a target throughput of 200 operations/s:

min: 200.055
median: 200.092
max: 200.164

But seeing 105 instead of 100 is surprising.
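
To put the numbers from this thread side by side, here is a small sketch that expresses each measured throughput as a relative overshoot over its target. The helper name is made up for illustration, and the values 11 and 105 are the figures reported above, not new measurements.

```python
def overshoot_percent(target, measured):
    """Return how far measured throughput is above target, in percent."""
    return (measured - target) / target * 100.0


# (label, target ops/s, measured ops/s), all taken from this thread.
observations = [
    ("reported by Alp1", 10, 11),
    ("reported by Alp1", 100, 105),
    ("reference environment, max", 200, 200.164),
]

for label, target, measured in observations:
    print("{}: target {} ops/s, measured {} ops/s, overshoot {:.2f}%".format(
        label, target, measured, overshoot_percent(target, measured)))
```

The reported runs are off by several percent, while the reference run stays well under a tenth of a percent.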

Can you tell me the output of the following?

uname -a
python3 -c "import sys ; print(sys.implementation)"

Also:

Are you seeing this all the time?
Do you have a lot of processes running on that machine, so that there is a lot of scheduling pressure? E.g. are you simulating hundreds of clients with Rally?

Alp1 · January 8, 2018, 5:48pm

Hi @danielmitterdorfer

It seems to be an intermittent issue. I tried again just now with target throughputs of 10 and 100 and got the results below. I don't think I had many processes running on the machine at that time.

Earlier results -

Here is the output of the commands:

uname -a

Linux IP 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

python3 -c "import sys ; print(sys.implementation)"

namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-34', hexversion=50594800, name='cpython', version=sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0))

danielmitterdorfer · January 9, 2018, 8:10am

Thanks for the feedback! This might have something to do with the system's timer accuracy. I need to see whether I can reproduce this. One last question: is this a bare-metal machine, or is it running in a VM or in the cloud?
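
For anyone who wants a rough look at timer behaviour on their own load driver machine, a sketch along these lines may help. It uses only the Python standard library, does not reproduce Rally's scheduling, and is not claimed to explain the overshoot; the function name and the sample sizes are arbitrary choices for illustration.

```python
import time


def measure_sleep_overshoot(interval_s=0.01, samples=20):
    """Sleep repeatedly and record how much longer each sleep took than requested."""
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)
        elapsed = time.perf_counter() - start
        overshoots.append(elapsed - interval_s)
    return overshoots


# Clock metadata as reported by the interpreter for the high-resolution timer.
info = time.get_clock_info("perf_counter")
print("perf_counter resolution: {} s ({})".format(info.resolution, info.implementation))

overshoots = measure_sleep_overshoot()
print("mean sleep overshoot: {:.6f} s".format(sum(overshoots) / len(overshoots)))
print("max sleep overshoot:  {:.6f} s".format(max(overshoots)))
```

Comparing these numbers between a bare-metal machine and a cloud instance is one way to see whether the two environments differ noticeably.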

Alp1 · January 11, 2018, 2:14am

No, it's not a bare-metal instance.

danielmitterdorfer · January 11, 2018, 7:07am

So, is it then running in a VM? Or is it running in a cloud environment (if yes: in which one and ideally also the instance type)? This may help me to reproduce this. Thank you!

Alp1 · January 12, 2018, 7:44pm

It is running in a cloud environment: the EC2 instance type is t2.large, running Ubuntu 14.04. I am using this instance solely for Rally. Let me know if you need more information.

danielmitterdorfer · January 15, 2018, 1:59pm

Thanks, that helps! I have raised https://github.com/elastic/rally/issues/393 to track the progress of the respective analysis.

