Rest High level client performance


(Gzeskas) #1

Hi

I'm researching how to make the best use of an Elasticsearch cluster.
I have written a simple Java application that uses the Rest High Level Client, and set up 2 Elasticsearch nodes with 1 index that has 2 shards and 0 replicas.

The problem I encountered is that performance does not increase with the second node: I got around 7000 index requests per second with 1 node and 1 shard, and I get the same throughput with 2 nodes and 2 shards with 0 replicas.

The code is quite simple:

    Flux.range(1, 1000000000)
        .flatMap(this::createNewIndexRequest)
        .flatMap(this::index)
        .subscribe(next -> {
            completed.addAndGet(batchSize);
        }, error -> {
            logger.error("Received error: ", error);
        }, () -> {
            logger.info("Completed");
        });

The method that performs the send operation:

    private Mono<IndexResponse> index(IndexRequest indexRequest) {
        return Mono.create(monoSink -> {
            client.indexAsync(indexRequest, RequestOptions.DEFAULT, new ActionListener<IndexResponse>() {
                @Override
                public void onResponse(IndexResponse indexResponse) {
                    monoSink.success(indexResponse);
                }

                @Override
                public void onFailure(Exception e) {
                    logger.error("Encounter problem during document indexing:", e);
                    monoSink.error(e);
                }
            });
        });
    }

The Rest High Level Client is built like this:

    public RestClientBuilder restClientBuilder() {
        HttpHost[] hosts = this.properties.getUris().stream().map(HttpHost::create)
                .toArray(HttpHost[]::new);

        RestClientBuilder builder = RestClient.builder(hosts);
        builder.setRequestConfigCallback(b -> b.setContentCompressionEnabled(true));

        // Use a single HTTP client callback: calling setHttpClientConfigCallback
        // a second time replaces the first callback, which would drop the header.
        builder.setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
                .setDefaultHeaders(Collections.singleton(
                        new BasicHeader(HttpHeaders.ACCEPT_ENCODING, "gzip")))
                .setMaxConnPerRoute(1000)
                .setMaxConnTotal(10000));
        return builder;
    }

The client is configured with the addresses of both nodes.

What am I missing? Could someone point me to where I should look for more information?


(David Pilato) #2

First thing you should do is to use the BULK API: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html

Specifically in Java I'd recommend using the Bulk Processor: https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-document-bulk.html#java-rest-high-document-bulk-processor
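The idea behind the Bulk Processor is that the application still hands over documents one at a time, and they are grouped into bulk requests behind the scenes. Below is a minimal, library-free sketch of that grouping. `MicroBatcher` and `flushSize` are hypothetical names for illustration and are not part of the Elasticsearch client. The real Bulk Processor can also flush on a time interval and on request size, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only: this is NOT the Elasticsearch BulkProcessor API.
final class MicroBatcher<T> {
    private final int flushSize;            // assumed threshold: items per batch
    private final Consumer<List<T>> sink;   // stands in for "send one bulk request"
    private final List<T> buffer = new ArrayList<>();

    MicroBatcher(int flushSize, Consumer<List<T>> sink) {
        if (flushSize < 1) {
            throw new IllegalArgumentException("flushSize must be >= 1");
        }
        this.flushSize = flushSize;
        this.sink = sink;
    }

    // Callers still add one item at a time, message by message.
    synchronized void add(T item) {
        buffer.add(item);
        if (buffer.size() >= flushSize) {
            flush();
        }
    }

    // Hands the buffered items to the sink as one batch and empties the buffer.
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        List<List<String>> sent = new ArrayList<>();
        MicroBatcher<String> batcher = new MicroBatcher<>(3, sent::add);
        for (int i = 1; i <= 7; i++) {
            batcher.add("doc-" + i);
        }
        batcher.flush(); // send whatever is left over
        // prints: 3 batches: [[doc-1, doc-2, doc-3], [doc-4, doc-5, doc-6], [doc-7]]
        System.out.println(sent.size() + " batches: " + sent);
    }
}
```

Seven single-document calls end up as three requests here, which is where the per-request overhead is saved.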


(Gzeskas) #3

In my case it's impossible to use the Bulk API: I need to process messages one by one and index each of them.

My previous numbers were incorrect because I was CPU-bound on my local machine. When I ran the same test on a cloud machine with more CPU, 1 node could handle around 16k index requests per second, but after adding a second node and a second shard the throughput only rose to around 21.5k.

So I'm trying to figure out why I'm not getting linear scalability.
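To put these numbers in perspective: going from roughly 16k to 21.5k requests on two nodes is a speedup of about 1.34x, or around 67% of ideal linear scaling. A small sketch of that arithmetic follows; the class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper for the arithmetic only; not part of any Elasticsearch API.
final class ScalingCheck {
    // How many times faster the scaled setup is than the single-node baseline.
    static double speedup(double baselineThroughput, double scaledThroughput) {
        return scaledThroughput / baselineThroughput;
    }

    // Fraction of ideal linear scaling achieved (1.0 would be perfectly linear).
    static double efficiency(double baselineThroughput, double scaledThroughput, int nodes) {
        return speedup(baselineThroughput, scaledThroughput) / nodes;
    }

    public static void main(String[] args) {
        double oneNode = 16_000;   // approximate figure from the post
        double twoNodes = 21_500;  // approximate figure from the post
        // prints: speedup=1.34 efficiency=0.67
        System.out.println(String.format(Locale.ROOT, "speedup=%.2f efficiency=%.2f",
                speedup(oneNode, twoNodes), efficiency(oneNode, twoNodes, 2)));
    }
}
```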


(David Pilato) #4

Why is that? What is the difference compared to an indexAsync operation?