_msearch capability

I would like to know whether the /_msearch operation will be supported in the near future, or whether it is already available in the latest release.

I have already implemented a custom parameter source using random terms from a file for queries using _search.

We now have a requirement to benchmark a collection of queries sent in the same request, with the query term parameterised.
I would appreciate an example of a custom runner that would be able to parse the type of search request shown below. Any help would be greatly appreciated.

Thanks

POST <index>/_msearch
Content-Type: application/x-ndjson; charset=UTF-8

{"query": {"match_phrase": {"report.meta.type1": {"query": "<random-term-type1>"}}}}
{"query": {"match_phrase": {"report.meta.type2": {"query": "<random-term-type2>"}}}}

Hi,

At the moment we are constantly adding new operations to Rally core. In the near term, however, the focus will be on administrative API calls, because I want to give users more flexibility in how they set up their indices with Rally.

As you have already noted, you need two components to be able to benchmark _msearch:

  • A custom parameter source which provides the necessary data
  • A custom runner, which calls the _msearch API

I think the runner can be rather simple:

def msearch(es, params):
    response = es.msearch(body=params["body"])
    # you can tell Rally that Elasticsearch has now actually executed more 
    # than one operation but that is up to you how you want to measure this.
    # The default is to assume that one operation got executed per call.
    return len(response["responses"]), "ops"

(see also the Python client's msearch API).
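To make Rally aware of such a runner, it has to be registered in the track's track.py. The following is a minimal sketch; the operation-type name "msearch" is only an assumption chosen for illustration, and the track's operation definition would need to reference the same name.

```python
def msearch(es, params):
    # Send all queries in a single _msearch request.
    response = es.msearch(body=params["body"])
    # Report one operation per sub-response (optional, see above).
    return len(response["responses"]), "ops"


def register(registry):
    # "msearch" is an assumed operation-type name for this sketch.
    registry.register_runner("msearch", msearch)
```

Rally calls register() when it loads the track, so nothing else is needed to wire up the runner.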

The parameter source is actually a bit trickier and it depends a bit on your concrete requirements. In the most complex case you may want to read all queries from a large file and have the clients read and execute them. This is very similar to what we do with the bulk API.

If you can get away with having a subset of queries that you can create more or less on the fly, then the parameter source can be much simpler. In the simplest case you could just create an array of queries:

def simple_msearch_param_source(indices, params):
    return {
        "body": [
            {"query": {"match_phrase": {"report.meta.type1": {"query": "<random-term-type1>"}}}},
            {"query": {"match_phrase": {"report.meta.type2": {"query": "<random-term-type2>"}}}},
            {"query": {"match_phrase": {"report.meta.type3": {"query": "<random-term-type3>"}}}}
        ]
    }

However, you will very likely want to customize the queries a bit, so you could read a file with some query parameters at startup and then randomly choose some. You can see an example in our standard tracks that you could adapt to your needs.
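As a rough sketch of that idea, the parameter source below picks a random term per field on every invocation. The TERMS dictionary and its values are hypothetical stand-ins for data that would be loaded from a file at startup. Note that the msearch API expects a header line before each query, so the sketch emits an empty header ({}) ahead of every query.

```python
import random

# Hypothetical term lists; in practice these would be read from a file
# once at startup.
TERMS = {
    "report.meta.type1": ["alpha", "beta", "gamma"],
    "report.meta.type2": ["delta", "epsilon"],
}


def random_msearch_param_source(indices, params):
    body = []
    for field, terms in TERMS.items():
        # Empty header line: use the index/defaults from the request itself.
        body.append({})
        # Query line with a randomly chosen term for this field.
        body.append({"query": {"match_phrase": {field: {"query": random.choice(terms)}}}})
    return {"body": body}
```

This function can be registered with Rally in the same way as any other custom parameter source.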

I hope that helps you get started. I'd also be curious how you want to use it, as this helps us to better understand whether it makes sense to include this operation in some future release.

Daniel

Daniel,

Thanks for the clear response with examples. It pretty much matches what I came up with, so I'm glad to have my thinking confirmed. I will be happy to let you know what the final setup looks like.

The current blocking issue I'm having with _msearch is with the content-type on the request being set as 'application/json' and not 'application/x-ndjson'. Looking at elasticsearch-py this should be set correctly for _msearch. I wasn't able to confirm exactly what version of elasticsearch-py is being used in Rally 0.8.0.

Can you confirm if this works okay for you, and if not how I could set/override the content-type on a request from Rally?

Thanks,

Andrew

At the moment, Rally uses version 5.5.0 of the Elasticsearch Python client. In the most recent released version (6.0.0), the PR #618 has been included which should set the header correctly for you. In my local tests, I could not confirm this behaviour (i.e. the content-type header is still application/json) and I'm in contact with the project team.

Unfortunately, it is also not easily possible to override the content-type header from the outside at the moment. Once I know more, I'll update the thread here.

I did some more testing meanwhile. While the content-type header is not set to application/x-ndjson, Elasticsearch will still handle the msearch request correctly (I tested this against Elasticsearch 6.0.0).

Here is a minimal standalone example:

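import json

import elasticsearch

# Connects to the default endpoint (localhost:9200).
es = elasticsearch.Elasticsearch()

# Each query is preceded by a header line; an empty header ({}) is sufficient.
response = es.msearch(body=[
    {},
    {"query": {"match_all": {}}},
    {},
    {"query": {"match_all": {}}}
])

print(json.dumps(response, indent=2))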

The important part (that was unfortunately also missing in my example earlier) is to include a header line before each query (similarly to the bulk API); see also the Elasticsearch docs of the msearch API.

If you include that you should be able to run msearch queries just fine although the content-type header is application/json. I hope that resolves your blocker.
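To illustrate the header/query pairing, here is a small sketch of how such a list maps to the newline-delimited body that the msearch API expects. The helper to_ndjson is hypothetical and written only for this illustration; it is not the client's actual serializer.

```python
import json


def to_ndjson(lines):
    # One JSON document per line; the body must end with a newline.
    return "".join(json.dumps(line) + "\n" for line in lines)


body = [
    {},                            # header for the first query
    {"query": {"match_all": {}}},  # first query
    {},                            # header for the second query
    {"query": {"match_all": {}}},  # second query
]

print(to_ndjson(body), end="")
```

Every odd line is a header and every even line is the query that belongs to it, just like the action/document pairs in the bulk API.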

Thanks for confirming that msearch via Rally works with either content-type application/json or application/x-ndjson against Elasticsearch 6.0.0.

Our implementation of Elasticsearch places a proxy between the client (in this case Rally) and Elasticsearch 6.0.0. This proxy parses and checks multi-line JSON sent with content-type application/x-ndjson and plain JSON sent with content-type application/json, and fails if it receives any other combination.

As a workaround to allow us to develop our custom Rally tracks, we have relaxed the content-type check on requests, but this introduces a performance hit that will likely skew our results a little. We also don't want to leave this as a viable option for clients when we go live.

Our objective with using Rally was to measure the impact and performance of our proxy solution against native elasticsearch.

Would you be able to estimate a date by which Rally will move to elasticsearch-py 6.0.0, or to a patch on the existing version that sets the content-type correctly for _msearch and _bulk operations? We will then be able to remove our fix from our build!

Thank you for your assistance to date; it's been really helpful!

Regards,

Andrew

Thanks for the background info.

The problem is not so much Rally but that the Elasticsearch Python client does not pass the content type when issuing the request. We just need to upgrade the Python client in Rally once that problem is fixed and then everything should work as intended without any further work in Rally.

I've created a PR upstream that fixes this issue. Once that is released, Rally can upgrade the Elasticsearch Python client and msearch will set the correct content-type header.

Thanks Daniel, will keep an eye out for the release.

Regards,

Andrew

My PR has been merged now and will be included in the next release of the Elasticsearch Python client. When it is out, we will upgrade the client in Rally and it will then be possible to explicitly set the content-type header in custom runners.

That's great, thanks for letting me know. Will keep an eye out for the new release.

Cheers,

Andrew

The 6.1.1 version of the client is out which fixes the header issue. However, with that version the classic SSL support does not work at the moment (see https://github.com/elastic/elasticsearch-py/issues/712). There is a new approach to implement SSL which we will implement in Rally eventually.

As the upcoming 0.9.0 has enough changes already, I'd rather have this change in a separate release. I have raised https://github.com/elastic/rally/issues/395 for the upgrade which is targeted for Rally 0.9.1.

Thanks for the update. I have just seen the Rally 0.9.0 release and was looking through the release notes for the change. I will now just wait for 0.9.1 with the fix I'm after.

I just attempted to upgrade to 6.1.1 but SSL support has some issues in the new client version so I fear we'll postpone the upgrade a bit. Btw, I have also released 0.9.1 today but this contains only a critical fix for 0.9.0. The _msearch fix is targeted for Rally 0.9.2 at the moment but it really depends on when the SSL issues are resolved upstream.
