Term query performance issue with respect to size parameter in search query


(Sekhar) #1

Hi,

The size parameter is significantly reducing my application's search performance.

No. of nodes: 1
Operating system: CentOS 7
No. of documents in ES: 1 million
No. of fields in each document: 15. Each field uses three analyzers: phonetic, substring, and exact.
Total document size: 1 GB
Elasticsearch server version: 6.0.0

Elasticsearch heap memory: 16 GB
Elasticsearch client: Java low-level REST client

Benchmark readings:
Total no. of requests: 100000

1. Query: {"size":0,"_source":false,"query":{"term":{"7":"Sekhar__1000009"}}}

It processes 5555 requests/sec.

2. Query: {"size":1,"_source":false,"query":{"term":{"7":"Sekhar__1000009"}}}

It processes 4166 requests/sec.

3. Query: {"size":10000,"_source":false,"query":{"term":{"7":"Sekhar__1000009"}}}

It processes 2033 requests/sec.

Notes:
1. Could you please explain why throughput changes with the value of the size parameter?
2. Could you please share benchmark readings for the different match types (exact, word, phonetic, substring, fuzzy)? That would be very useful for me.
3. Please guide me on how to improve search performance.

Source Code :
package ElasticSearch.ESRestClientSample;

public class RestClientMain {

public static void main(String[] args) {
int count =100000;

String searchJson = "{\"size\":0,\"_source\":false,\"query\":{\"term\":{\"7\":\"Subex__1000009\"}}}";
ElasticSearchRestClient elasticSearchRestClient = new ElasticSearchRestClient();
elasticSearchRestClient.search("14_1024", searchJson, count);
elasticSearchRestClient.closeRestClient();
}
}

package ElasticSearch.ESRestClientSample;

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.ParseException;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.log4j.Logger;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class ElasticSearchRestClient {
private static RestClient restClient;
Map<String, String> emptyMap = Collections.emptyMap();
String httpOrHttps = null;

private static final Logger logger = Logger.getLogger(ElasticSearchRestClient.class);

public ElasticSearchRestClient() {
RestClientBuilderListener restClientBuilderListener = new RestClientBuilderListener();
ElasticSearchRestClient.restClient = RestClient.builder(new HttpHost("ip", port, "http")).setRequestConfigCallback(restClientBuilderListener)
	.setHttpClientConfigCallback(restClientBuilderListener).setMaxRetryTimeoutMillis(10000).build();
}

public void closeRestClient() {
logger.info("Closing ElasticSearchRestClient ");
try {
    restClient.close();
} catch (IOException e) {
    logger.error(
	    "While Closing ElasticSearchRestClient getting Exception Line :", e);
}
}

public void search(String precheckListkey, String searchJson,int count) {
long start= 0, end = 0;
int i = 0;
long total  = 0;

for (; i < count; i++) {
    String currJson = "{\"size\"" + ":0,\"_source\""+":false,\"query\""+":{\"term\""+":{\"7.exactMatch\""+":\"Subex__1"+i+"\""+"}}}";
  
    start = System.currentTimeMillis();
    processRequest("POST",
		"/" + precheckListkey + "/" + precheckListkey + "/_search", currJson); // URL paths always use "/", not File.separatorChar
    end = System.currentTimeMillis();
    total += end-start;
}
// Floating-point math avoids truncation and a divide-by-zero when the run takes under a second.
logger.info("Total Request :"+ i +" Time Diff :" + total + " Speed : "+ (i * 1000.0 / total));
}

private void processRequest(String methodType, String url, String json) {
try {
    HttpEntity entity = new NStringEntity(json, ContentType.APPLICATION_JSON);
    Response response = restClient.performRequest(methodType, url, emptyMap, entity);
    //int statusCode = response.getStatusLine().getStatusCode();
   // String message = EntityUtils.toString(response.getEntity());
    //logger.info("Status :"+ statusCode + " Response :"+ message);
} catch (ParseException | IOException e) {
    logger.error("",e);
}
}

}

package ElasticSearch.ESRestClientSample;

import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.client.config.RequestConfig.Builder;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.apache.log4j.Logger;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestClientBuilder.HttpClientConfigCallback;
import org.elasticsearch.client.RestClientBuilder.RequestConfigCallback;

import org.apache.http.impl.nio.reactor.IOReactorConfig;

public class RestClientBuilderListener implements RequestConfigCallback, HttpClientConfigCallback {
private Integer connectTimeOut = RestClientBuilder.DEFAULT_CONNECT_TIMEOUT_MILLIS;
private Integer socketTimeOut = RestClientBuilder.DEFAULT_SOCKET_TIMEOUT_MILLIS;
private static final Logger logger = Logger.getLogger(RestClientBuilderListener.class);

@Override
public Builder customizeRequestConfig(Builder requestConfigBuilder) {
logger.info("customizeRequestConfig :" + requestConfigBuilder.toString());
return requestConfigBuilder.setConnectTimeout(connectTimeOut).setSocketTimeout(socketTimeOut);
}

@Override
public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {

httpClientBuilder.setDefaultIOReactorConfig(
                    IOReactorConfig.custom().setIoThreadCount(1).build());
return httpClientBuilder;
}

}


(David Pilato) #2

Please format your code, logs or configuration files using the </> icon as explained in this guide, and not the citation button. It will make your post more readable.

Or use markdown style like:

```
CODE
```

There's a live preview panel for exactly this reason.

Lots of people read these forums, and many of them will simply skip over a post that is difficult to read, because it's just too large an investment of their time to try and follow a wall of badly formatted text.
If your goal is to get an answer to your questions, it's in your interest to make it as easy to read and understand as possible.
Please update your post.

Could you please explain why throughput changes with the value of the size parameter?

Because when size is set to 1000, each shard has to return up to 1000 hits to the coordinating node. Assuming you are using 5 shards, that means up to 5000 hits travel over the network; the coordinating node then merges them and sends 1000 documents back to the client, again over the network.

That is probably why it is slower.
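
To make the arithmetic above concrete, here is a minimal Java sketch of the scatter-gather step. It is a toy model under stated assumptions, not Elasticsearch code: the class and method names are hypothetical, a hit is reduced to a bare score, and the shard count (5), `size` (1000) and matches per shard (2000) are illustrative values only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model (assumption): each shard returns its local top hits,
// and the coordinating node merges them into the global top hits.
final class ShardMergeSketch {

    /** Returns the best `size` scores from one list, highest first. */
    static List<Double> localTop(List<Double> scores, int size) {
        List<Double> sorted = new ArrayList<>(scores);
        sorted.sort(Collections.reverseOrder());
        return new ArrayList<>(sorted.subList(0, Math.min(size, sorted.size())));
    }

    /** Counts the hit entries the coordinating node receives before merging. */
    static int entriesReceived(List<List<Double>> perShard) {
        int entries = 0;
        for (List<Double> hits : perShard) {
            entries += hits.size();
        }
        return entries;
    }

    /** Merges the per-shard results and keeps the global best `size` scores. */
    static List<Double> mergeAtCoordinator(List<List<Double>> perShard, int size) {
        List<Double> all = new ArrayList<>();
        for (List<Double> hits : perShard) {
            all.addAll(hits);
        }
        return localTop(all, size);
    }

    public static void main(String[] args) {
        int shards = 5, size = 1000, matchesPerShard = 2000; // illustrative values
        List<List<Double>> perShard = new ArrayList<>();
        for (int s = 0; s < shards; s++) {
            List<Double> scores = new ArrayList<>();
            for (int d = 0; d < matchesPerShard; d++) {
                scores.add((double) (d * shards + s)); // synthetic, distinct scores
            }
            perShard.add(localTop(scores, size));
        }
        System.out.println("entries received by coordinator: " + entriesReceived(perShard));
        System.out.println("hits returned to client: " + mergeAtCoordinator(perShard, size).size());
    }
}
```

With these values the coordinator receives 5000 entries and returns 1000. In the same model, a query that matches a single document sends only one entry regardless of `size`, so the model alone does not account for a slowdown in that case.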

Please guide me how to improve the search performance.

Decrease the number of shards, keep size at 10... Increase the number of nodes and replicas... There are many options, but why does 5555 requests/sec look like a bad result to you?


(Sekhar) #3

Source Code :

package ElasticSearch.ESRestClientSample;

import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import java.util.Random;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.ParseException;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.entity.ContentType;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

public class Test1 {
    private static RestClient restClient;
    Map<String, String> emptyMap = Collections.emptyMap();
    String httpOrHttps = null;

    public Test1() {
	
	Test1.restClient = RestClient.builder(new HttpHost("10.113.56.205", 9222, "http")).setRequestConfigCallback(new RestClientBuilder.RequestConfigCallback() {
	    @Override
	    public RequestConfig.Builder customizeRequestConfig(RequestConfig.Builder requestConfigBuilder) {
		
		requestConfigBuilder.setConnectTimeout(1000);
		return requestConfigBuilder.setSocketTimeout(30000); 
	    }
	}).setHttpClientConfigCallback(new RestClientBuilder.HttpClientConfigCallback() {
	    @Override
	    public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
		return httpClientBuilder.setDefaultIOReactorConfig(
                        IOReactorConfig.custom().setIoThreadCount(1).build()); 
	    }
	}).build();
    }

    public void closeRestClient() {
	System.out.println("Closing ElasticSearchRestClient ");
	try {
	    restClient.close();
	} catch (IOException e) {
	    e.printStackTrace();
	    System.err.println("While Closing ElasticSearchRestClient getting Exception Line :");
	}
    }

    public void search(String precheckListkey, int count) {
	long start = 0, end = 0;
	int i = 0;
	long total = 0;
	Random rand = new Random();

	for (; i < count; i++) {
	    String currJson = "{\"size\"" + ":0,\"_source\"" + ":false,\"query\"" + ":{\"term\"" + ":{\"7.exactMatch\""
		    + ":\"Subex__1" + rand.nextInt((100000 - 0) + 1) + "\"" + "}}}";

	    start = System.currentTimeMillis();
	    processRequest("POST",
		    "/" + precheckListkey + "/" + precheckListkey + "/_search", currJson); // URL paths always use "/", not File.separatorChar
	    end = System.currentTimeMillis();
	    total += end - start;
	}

	// Floating-point math avoids truncation and a divide-by-zero when the run takes under a second.
	System.out.println("Total Request :" + i + " Time Diff :" + total + " Speed : " + (i * 1000.0 / total));

    }

    private void processRequest(String methodType, String url, String json) {
	try {
	    HttpEntity entity = new NStringEntity(json, ContentType.APPLICATION_JSON);
	    Response response = restClient.performRequest(methodType, url, emptyMap, entity);
	    // int statusCode = response.getStatusLine().getStatusCode();
	    // String message = EntityUtils.toString(response.getEntity());
	    // logger.info("Status :"+ statusCode + " Response :"+ message);
	} catch (ParseException | IOException e) {
	    e.printStackTrace();
	}
    }
    
    public static void main(String[] args) {
	Test1 test = new Test1();
	test.search("14_1024",10000);
	test.closeRestClient();
    }
}

I need to know the maximum number of requests a single node can process. Please guide me as soon as possible.


(Sekhar) #4

Main points:
1. Both the ES client and the ES server are on the same machine. I need to know how many requests can be processed per second for the different match types (exact, word, fuzzy, phonetic and substring).
2. I tried creating the index with 1, 2, 3 and 5 shards and with 0 and 1 replicas, but performance did not improve.
3. I need to know the maximum number of requests per second a single node can process.


(David Pilato) #5

This is not needed. Read this and specifically the "Also be patient" part.

I need to know single node how many maximum requests can I process.

It depends... I'd encourage you to look at the Rally project to run your benchmarks.

If you have only one node, then changing the number of replicas won't help. Note that running a benchmark on one node when you have 10 nodes in production does not make sense to me. You can't extract any valid number from your test.
You should test against a platform similar to the one you will have in production.
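
One related point on the client side: a loop that sends one request at a time mostly measures round-trip latency, so it cannot show how many requests a node can sustain. Below is a minimal Java sketch of driving load from several threads. Everything in it is an assumption for illustration: the class name, the `run` helper, the thread counts, and the placeholder `fakeRequest`, which stands in for one real search call. It is not a substitute for a benchmarking tool such as Rally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical load driver: runs a fixed number of requests across N worker threads.
final class ConcurrentLoadSketch {

    /** Runs `total` requests on `threads` workers and returns requests per second. */
    static double run(int threads, int total, Runnable request) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong remaining = new AtomicLong(total);
        List<Future<?>> workers = new ArrayList<>();
        long start = System.nanoTime();
        try {
            for (int t = 0; t < threads; t++) {
                workers.add(pool.submit(() -> {
                    // Each worker takes from the shared budget until it is used up.
                    while (remaining.getAndDecrement() > 0) {
                        request.run();
                    }
                }));
            }
            for (Future<?> worker : workers) {
                worker.get(); // wait for completion and surface worker failures
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run failed", e);
        } finally {
            pool.shutdown();
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);
        return total * 1_000_000_000.0 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Placeholder work (assumption); replace with one real search call.
        Runnable fakeRequest = () -> Math.sqrt(System.nanoTime());
        for (int threads : new int[] {1, 2, 4, 8}) {
            double reqPerSec = run(threads, 100_000, fakeRequest);
            System.out.printf(Locale.ROOT, "%d thread(s): %.0f req/s%n", threads, reqPerSec);
        }
    }
}
```

Comparing the figures across thread counts shows whether throughput is limited by per-request latency or by the server itself.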


(Sekhar) #6

Thanks for the help so far!

I have created an index with 1 shard and 1 replica. Below are the readings observed:

(Note: my current intention is to understand the maximum speed a single client request thread can achieve with a simple term query on a single node/shard/replica, and then to add more complex matches, vary the nodes/shards, etc., and see the impact.)

Total no. of requests: 100000

1. Query: {"size":0,"_source":false,"query":{"term":{"7":"Sekhar__1000009"}}}

It processes 5555 requests/sec.

2. Query: {"size":1,"_source":false,"query":{"term":{"7":"Sekhar__1000009"}}}

It processes 4166 requests/sec.

3. Query: {"size":10000,"_source":false,"query":{"term":{"7":"Sekhar__1000009"}}}

It processes 2033 requests/sec.

Note that my query matches only one document, so even with size 10000 the ES server only has one document to work on.

**My straight question is:**
1. With size 0: 5555 req/s
2. With size 1: 4166 req/s
3. With size 10k: 2033 req/s

From size 1 to size 10k, throughput drops to less than half (4166 to 2033 req/s), even though my query returns only one document.
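
Because the test client sends requests one at a time, each throughput figure can be converted into a mean time per request. A small Java sketch of that arithmetic follows. The class and method names are hypothetical, and it assumes strictly sequential requests, as in the posted loop.

```java
import java.util.Locale;

// Converts single-threaded throughput into mean per-request latency.
final class LatencyFromThroughput {

    /** Mean milliseconds per request for a client that sends requests sequentially. */
    static double meanLatencyMillis(double requestsPerSecond) {
        return 1000.0 / requestsPerSecond;
    }

    public static void main(String[] args) {
        String[] labels = {"size 0", "size 1", "size 10000"};
        double[] reqPerSec = {5555, 4166, 2033}; // readings from the post
        for (int i = 0; i < labels.length; i++) {
            System.out.printf(Locale.ROOT, "%-10s %.2f ms per request%n",
                    labels[i], meanLatencyMillis(reqPerSec[i]));
        }
    }
}
```

This gives about 0.18 ms, 0.24 ms and 0.49 ms per request, so in these measurements going from size 1 to size 10000 adds roughly 0.25 ms to each request.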

One of the Elasticsearch blog posts (below) reports a term query reaching 7500 to 10000 records/sec on a single node with a single shard, using size 0.


(David Pilato) #7

It depends on the machine, the data, the mapping I guess....


(Sekhar) #8

Thanks for the help so far!

The blog specifies a heap of 1 GB and an index size of 124 GB, whereas my heap is 16 GB and my index size is 1 GB.

Based on these parameters, my setup should perform better, right?


(David Pilato) #9

No. It also depends on the hardware: number of cores, CPU frequency, SSD drive settings, ... So many factors are involved.
Unless you are using the exact same hardware, with the exact same version in the exact same conditions with the exact same scenario/tools, I don't think you can compare.

But anyway. What is the problem you are trying to solve? Is 5555 req/s not good for your use case?

