Running benchmarks on an existing index


#1

Hi

We have a few existing indexes (with data) in Elasticsearch and I want to measure search performance on them.
As I understand from the docs, we can set "auto-managed": false to stop Rally from creating a new index.

I have a few questions:

  1. Can we run search operations on existing indexes?
  2. Do we always need to provide mapping.json in the 'indices' section, even for existing indexes?
  3. What information should be included in or excluded from track.json for such a scenario?

Any help is appreciated!


(Daniel Mitterdorfer) #2

Hi @Alp1,

if you set auto-managed to false, there is no need for you to provide a mapping file. In fact, you can just remove the entire indices section and just define an index and a type explicitly in the search operation. Here is a minimal example from the docs:

{
  "challenge": {
    "name": "just-search",
    "schedule": [
      {
        "operation": {
          "operation-type": "search",
          "index": "_all",
          "body": {
            "query": {
              "match_all": {}
            }
          }
        },
        "warmup-iterations": 100,
        "iterations": 100,
        "target-throughput": 10
      }
    ]
  }
}

It will run a match_all query against all indices of the target cluster with one client.

To be clear: This is not just a snippet, it is the entire content of your track. You can save this e.g. as search.json and run it with

esrally --pipeline=benchmark-only --target-hosts=node_1_ip:9200,node_2_ip:9200 --track-path=search.json

(This requires the latest stable version of Rally, which is 0.8.1 at the moment; check with esrally --version.)
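
If the benchmark should hit one specific existing index rather than _all, the same track can be generated with a small script. This is only a sketch: the index name my-existing-index and the helper build_search_track are hypothetical, and the track structure is simply copied from the example above.

```python
import json


def build_search_track(index, target_throughput=10, iterations=100,
                       warmup_iterations=100):
    """Return a search-only track mirroring the minimal example above."""
    return {
        "challenge": {
            "name": "just-search",
            "schedule": [
                {
                    "operation": {
                        "operation-type": "search",
                        # Hypothetical: name of an index that already exists.
                        "index": index,
                        "body": {"query": {"match_all": {}}},
                    },
                    "warmup-iterations": warmup_iterations,
                    "iterations": iterations,
                    "target-throughput": target_throughput,
                }
            ],
        }
    }


# Print the track as JSON; redirect the output to search.json to use it.
print(json.dumps(build_search_track("my-existing-index"), indent=2))
```

The printed JSON can be saved as search.json and passed to --track-path exactly as in the command above.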


#3

Thanks :slight_smile: It worked.
Can the throughput be higher than the number specified in the track file? I see a min/max/median throughput of 11 if I specify 10, and 105 if I specify 100.


(Daniel Mitterdorfer) #4

If you specify a target throughput, Rally should match it very closely. While it is possible that the measured value is slightly above the target throughput, in my experience it should not exceed it by more than 1 op/s. Here are example lower and upper bounds from our benchmarking environment, where we executed an operation with a target throughput of 200 operations/s:

  • min: 200.055
  • median: 200.092
  • max: 200.164

But seeing 105 instead of 100 is surprising.
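
To put these numbers side by side, the deviation from the target can be computed directly. The sketch below uses only the values quoted in this thread; the helper name overshoot is made up for illustration.

```python
def overshoot(target, observed):
    """Return (absolute, relative in percent) deviation of observed from target."""
    absolute = observed - target
    return absolute, 100.0 * absolute / target


# (target throughput, observed throughput) pairs quoted in this thread
samples = [(10, 11), (100, 105), (200, 200.164)]

for target, observed in samples:
    absolute, relative = overshoot(target, observed)
    print("target={} observed={} deviation={:.3f} ops/s ({:.2f} %)".format(
        target, observed, absolute, relative))
```

The reported cases are roughly 10 % and 5 % above target, while the reference measurement is off by less than 0.1 %.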

Can you tell me the output of the following?

  • uname -a
  • python3 -c "import sys ; print(sys.implementation)"

Also:

  • Are you seeing this all the time?
  • Are a lot of processes running on that machine, creating a lot of scheduling pressure? For example, are you simulating hundreds of clients with Rally?

#5

Hi @danielmitterdorfer

It seems to be an intermittent issue. I just tried again with target throughputs of 10 and 100 and got the results below. I don't think many processes were running on the machine at that time.

| All | Min Throughput | search | 10.04 | ops/s |
| All | Median Throughput | search | 10.06 | ops/s |
| All | Max Throughput | search | 10.09 | ops/s |

| All | Min Throughput | search | 100.26 | ops/s |
| All | Median Throughput | search | 100.26 | ops/s |
| All | Max Throughput | search | 100.26 | ops/s |


Earlier results:

| All | Min Throughput | search | 11.04 | ops/s |
| All | Median Throughput | search | 11.04 | ops/s |
| All | Max Throughput | search | 11.04 | ops/s |

| All | Min Throughput | search | 105.87 | ops/s |
| All | Median Throughput | search | 105.87 | ops/s |
| All | Max Throughput | search | 105.87 | ops/s |

Here is the output of the commands:

  1. uname -a

Linux IP 3.13.0-135-generic #184-Ubuntu SMP Wed Oct 18 11:55:51 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

  2. python3 -c "import sys ; print(sys.implementation)"

namespace(_multiarch='x86_64-linux-gnu', cache_tag='cpython-34', hexversion=50594800, name='cpython', version=sys.version_info(major=3, minor=4, micro=3, releaselevel='final', serial=0))


(Daniel Mitterdorfer) #6

Thanks for the feedback! This might have something to do with the system's timer accuracy. I need to see whether there is a chance I can reproduce this. One last question: Is this a bare-metal machine, running in VM or in the cloud?
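
For anyone who wants to look at timer behaviour on their own machine, a rough probe could look like the sketch below. It is not part of Rally and not how Rally measures time; the function names are made up, and it only reports the advertised resolution of Python's clocks and times the same sleep with two different clocks so the readings can be compared.

```python
import time


def clock_report():
    """Return the advertised resolution (in seconds) of Python's main clocks."""
    return {name: time.get_clock_info(name).resolution
            for name in ("time", "monotonic", "perf_counter")}


def compare_clocks(duration=1.0):
    """Time one sleep with the wall clock and with perf_counter."""
    wall_start = time.time()
    perf_start = time.perf_counter()
    time.sleep(duration)
    wall_elapsed = time.time() - wall_start
    perf_elapsed = time.perf_counter() - perf_start
    return wall_elapsed, perf_elapsed


for name, resolution in sorted(clock_report().items()):
    print("{}: resolution {} s".format(name, resolution))

wall, perf = compare_clocks(0.5)
print("wall clock: {:.6f} s, perf_counter: {:.6f} s".format(wall, perf))
```

A noticeable difference between the two elapsed times, or a coarse resolution, would only be a hint that the clock is worth a closer look, not proof of the cause.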


#7

No, it's not a bare-metal instance.


(Daniel Mitterdorfer) #8

So, is it then running in a VM? Or is it running in a cloud environment (if yes: in which one and ideally also the instance type)? This may help me to reproduce this. Thank you! :slight_smile:


#9

It is running in a cloud environment: an EC2 t2.large instance on Ubuntu 14.04. I am using this instance solely for Rally. Let me know if you need more information :slight_smile:


(Daniel Mitterdorfer) #10

Thanks, that helps! I have raised https://github.com/elastic/rally/issues/393 to track the progress of the respective analysis.

