Performance of REST client vs Transport client

Hi
We have a cluster running Elasticsearch 6.5.3 and we are using the same version of the Java client library. Since we replaced the transport client with the high-level REST client, we have been experiencing a major decrease in performance: response times from the cluster have increased by a factor of 2. We see no increase in search/index times on the Elasticsearch nodes.
Any idea why we're experiencing this performance hit with the REST client?

EDIT:
Screenshot from VisualVM

Regards
Johan

Could it be related to this issue that has been reported recently?

Hi
I don't know if it is related. I doubt string interning would be an issue in our case, because we execute rather simple queries with almost no parameters, @dadoonet.
I edited my post with a VisualVM screenshot.

What if you run the same query using curl or the Kibana dev console?
What is the took value when you use the REST client vs curl vs the Transport client?

The numbers presented here are not statistically significant, just individual test results, but hopefully they will give some insight.

Rest client: took value was 1ms, time to invoke RestHighLevelClient::search was 195ms
Transport client: took value was 1ms, time to invoke TransportClient::search was 43ms
Kibana dev console: took value was 1ms

Is your client always initialized or is this including the time to build the client?
Could you share your code?
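
To rule that part out, here is a minimal sketch of what I mean by keeping the client initialized: build the RestHighLevelClient once at startup and reuse it for every search, so connection setup is not included in the per-request timings. The host and port below are placeholders.

import java.io.IOException;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

// Sketch only: build the client once and reuse it for every search,
// so connection setup cost is excluded from the measured request times.
// The host and port are placeholders.
public class SearchClientHolder {

    private static final RestHighLevelClient CLIENT = new RestHighLevelClient(
        RestClient.builder(new HttpHost("localhost", 9200, "http")));

    public static RestHighLevelClient client() {
        return CLIENT;
    }

    // Close once, on application shutdown.
    public static void shutdown() throws IOException {
        CLIENT.close();
    }
}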

Can you run the same query with:

time curl localhost:9200/_search/... 

Can you share the full query as well?

The results from the time curl call:

{"took":22,"timed_out":false,"_shards":{"total":20,"successful":20,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
real	0m0.046s
user	0m0.007s
sys	    0m0.006s

The code I was testing is a test that reproduces our production settings:

private void testHighLevelRestClient() throws IOException {
    System.setProperty("environment", "stage");
    final ServiceLocator serviceLocator = ServiceLocatorFactory.createServiceLocator("consul.service.consul", 8500, 5, TimeUnit.SECONDS);

    RestHighLevelClient restHighLevelClient = restHighLevelClient(serviceLocator);

    MatchRequest matchRequest = MatchRequest.builder()
        .name("Apple")
        .requestId("12")
        .gtin14("34234234")
        .offerId(OfferId.valueOf(1, "sku"))
        .build();
    SearchRequest request = new SearchRequest(EsIndex.PRODUCT.alias()).types(EsIndex.PRODUCT.type())
        .source(new SearchSourceBuilder().query(buildSimpleQuery(matchRequest)));

    StopWatch stopWatch = new StopWatch();
    stopWatch.start();

    //Do request
    TimeValue took = restHighLevelClient.search(request, RequestOptions.DEFAULT).getTook();

    stopWatch.stop();
    System.out.println("Rest client: " + stopWatch.getTotalTimeMillis());
    System.out.println("Rest client took: " + took.getStringRep());
}

private BoolQueryBuilder buildSimpleQuery(final MatchRequest request) {
    final BoolQueryBuilder query = boolQuery();
    if (request.getIsbn10().isPresent()) {
        query.should(termQuery(ElasticSearchFieldConstants.PRODUCT_INDEX_ISBN_10, request.getIsbn10().get()));
    }
    if (request.getGtin14().isPresent()) {
        query.should(termQuery(ElasticSearchFieldConstants.PRODUCT_INDEX_GTIN_14, request.getGtin14().get()));
    }

    query.mustNot(termQuery(ElasticSearchFieldConstants.PRODUCT_INDEX_NEGATIVE_MATCHES_FIELD_NAME, request.getOfferId().toString()));
    return query;
}

private RestHighLevelClient restHighLevelClient(final ServiceLocator serviceLocator) {

    final List<Service> esServices = serviceLocator.getServices("match-es-http");
    final HttpHost[] hosts = esServices.stream()
        .filter(service -> service.getPort().isPresent())
        .map(service -> new HttpHost(service.getAddress(), service.getPort().get()))
        .toArray(HttpHost[]::new);
    return new RestHighLevelClient(RestClient.builder(hosts));
}

As you can see, my stopwatch only surrounds the actual search request.

The resulting query looks like this:

{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "negativeMatchedOfferIds": {
              "value": "1_sku",
              "boost": 1
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "gtin14": {
              "value": "34234234",
              "boost": 1
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1
    }
  }
}

Is it the same query that you ran with curl?
The one with curl returns 0 results.

Also what if you run this in a loop:

    StopWatch stopWatch = new StopWatch();
    stopWatch.start();

    //Do request
    TimeValue took = restHighLevelClient.search(request, RequestOptions.DEFAULT).getTook();

    stopWatch.stop();
    System.out.println("Rest client: " + stopWatch.getTotalTimeMillis());
    System.out.println("Rest client took: " + took.getStringRep());

Does it look better after some calls?
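
Just to be explicit, this is the kind of loop I mean (a sketch only, reusing the client and the same request object; the iteration count is arbitrary):

// Sketch: repeat the same search to see whether latency settles after
// the first few (warm-up) calls. Only the search call itself is timed.
for (int i = 0; i < 20; i++) {
    StopWatch stopWatch = new StopWatch();
    stopWatch.start();

    TimeValue took = restHighLevelClient.search(request, RequestOptions.DEFAULT).getTook();

    stopWatch.stop();
    System.out.println("Rest client: " + stopWatch.getTotalTimeMillis());
    System.out.println("Rest client took: " + took.getStringRep());
    System.out.println("------------------------------");
}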

Yes, it is the same query that I ran with curl. I ran the code in a loop 20 times and got seemingly haphazard behavior:

Rest client: 312
Rest client took: 139ms
------------------------------
Rest client: 31
Rest client took: 22ms
------------------------------
Rest client: 86
Rest client took: 77ms
------------------------------
Rest client: 54
Rest client took: 43ms
------------------------------
Rest client: 11
Rest client took: 3ms
------------------------------
Rest client: 19
Rest client took: 8ms
------------------------------
Rest client: 14
Rest client took: 7ms
------------------------------
Rest client: 12
Rest client took: 6ms
------------------------------
Rest client: 14
Rest client took: 6ms
------------------------------
Rest client: 12
Rest client took: 5ms
------------------------------
Rest client: 71
Rest client took: 57ms
------------------------------
Rest client: 60
Rest client took: 51ms
------------------------------
Rest client: 26
Rest client took: 8ms
------------------------------
Rest client: 23
Rest client took: 15ms
------------------------------
Rest client: 16
Rest client took: 7ms
------------------------------
Rest client: 156
Rest client took: 146ms
------------------------------
Rest client: 45
Rest client took: 34ms
------------------------------
Rest client: 25
Rest client took: 19ms
------------------------------
Rest client: 16
Rest client took: 9ms
------------------------------
Rest client: 9
Rest client took: 4ms
------------------------------

We noticed the problem when we compared the different clients in our production and stage environments. Chart number 1 is a recording from our stage environment with the REST client, and chart number 2 is a recording from our production environment, which is still running the transport client. Btw, we are running multiple clusters and this behaviour manifests itself on all of them.


I wonder if it would be better if you replaced should with filter in your query, just to see whether you get consistent results (see the sketch below).
Also, are you indexing and querying at the same time?
In that case, I wonder whether this pattern occurs when new segments are created on disk or when a merge is happening...
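
Roughly like this, based on your buildSimpleQuery (only a sketch, with an illustrative method name; note that filter clauses must all match while should clauses are optional, so the semantics change slightly, but it takes scoring out of the picture):

// Sketch of the same bool query with the term clauses moved from should to filter.
// Assumes the same static imports (QueryBuilders.boolQuery / termQuery) as above.
private BoolQueryBuilder buildFilterQuery(final MatchRequest request) {
    final BoolQueryBuilder query = boolQuery();
    if (request.getIsbn10().isPresent()) {
        query.filter(termQuery(ElasticSearchFieldConstants.PRODUCT_INDEX_ISBN_10, request.getIsbn10().get()));
    }
    if (request.getGtin14().isPresent()) {
        query.filter(termQuery(ElasticSearchFieldConstants.PRODUCT_INDEX_GTIN_14, request.getGtin14().get()));
    }
    query.mustNot(termQuery(ElasticSearchFieldConstants.PRODUCT_INDEX_NEGATIVE_MATCHES_FIELD_NAME, request.getOfferId().toString()));
    return query;
}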

I'm running out of ideas TBH.
