Elasticsearch-java queries are slow

We have a single-node Elasticsearch cluster running version 8.5.3, and a Java program that queries it. When we run the query with curl directly on the host where the Java program runs, the response comes back within 10ms, but the same query from the Java program averages about 45ms. The Java client is co.elastic.clients.elasticsearch.ElasticsearchClient from elasticsearch-java-8.5.3.jar, and the connection pool uses Apache commons-pool2. The problem does not occur on Red Hat 7, but it does appear when the Java program is deployed on Red Hat 6.

When you say "the average response time is about 45ms", do you mean the took field in the SearchResponse object?
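The distinction matters because `took` reports only the time Elasticsearch spent executing the search on the server; it excludes network transfer and client-side serialization. A minimal sketch of measuring the wall-clock side for comparison follows. `LatencyProbe`, `averageMillis`, and the placeholder workload are hypothetical names introduced for illustration; the real search call would replace the placeholder.

```java
public class LatencyProbe {

    // Runs the call `warmup` times untimed (so JIT compilation and connection
    // setup do not skew the result), then returns the mean wall-clock time in
    // milliseconds over `iterations` timed runs.
    static double averageMillis(Runnable call, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            call.run();
        }
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            call.run();
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption): replace with the real search call.
        Runnable fakeSearch = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.printf("mean wall-clock latency: %.2f ms%n",
                averageMillis(fakeSearch, 10, 100));
    }
}
```

If the wall-clock mean is far above `took`, the extra time is being spent outside the search itself; if the two are close, the server-side execution is the place to look.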


You should upgrade, as the client is moving fast. Use version 8.8.2 now, and 8.9.0 once it is available on Maven Central.

Are you using also the exact same JVM version on both systems? Is there any hardware difference?

Yes, it is the value of took, which averages 45ms, whereas my curl is very fast. I also have this problem with 8.8.2. Another question: after I created 20 ElasticsearchClient instances, why are there 20*n I/O Dispatcher threads in the JVM, where n is the number of CPU cores?

You should not do that. Instead have only one client instance for the whole JVM (singleton).
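One way to confirm the effect of sharing a single client is to count the dispatcher threads in the running JVM before and after the change. The sketch below uses only the standard library. `DispatcherThreadCount` and `countThreads` are hypothetical names, and the thread-name prefix "I/O dispatcher" is an assumption based on the name reported in this thread; the match is case-insensitive for that reason.

```java
import java.util.Locale;

public class DispatcherThreadCount {

    // Counts live threads whose name starts with the given prefix,
    // ignoring case.
    static long countThreads(String namePrefix) {
        String prefix = namePrefix.toLowerCase(Locale.ROOT);
        return Thread.getAllStackTraces().keySet().stream()
                .filter(Thread::isAlive)
                .filter(t -> t.getName().toLowerCase(Locale.ROOT).startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Assumed prefix; adjust it to whatever a thread dump of your JVM shows.
        long dispatchers = countThreads("I/O dispatcher");
        System.out.println("cores=" + cores + ", I/O dispatcher threads=" + dispatchers);
    }
}
```

Called from inside the application, a count near 20*n would point to one thread pool per client, while a count near n would be consistent with a single shared client.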

I found this problem and changed the code: I now create only one RestClient and wrap it in the different ElasticsearchClient instances. The number of I/O Dispatcher threads returned to normal, but the query is still very slow. Index A in the slow cluster holds only 30 million documents, yet the average query takes 45ms. Index A in the fast cluster holds 700 million documents, and the average query takes only 10ms. There are only two differences between the two clusters. One is the CPU: the slow cluster has 32 cores at 2.1GHz, and the fast cluster has 24 cores at 2.4GHz. The other is that index A has 3 primary shards in the slow cluster and only 1 primary shard in the fast cluster. I am going to change the index in the fast cluster to three primary shards for testing. Do you have any other suggestions? Many thanks.

It is important that you compare apples to apples, so having the same number of primary shards is vital.

Do the two indices contain the same type of data? What is the size of the two indices?

Do the two clusters have the same specification apart from the CPU type (RAM, heap size, type and size of storage, node count etc)?

Do the two clusters hold the same amount of data in total apart from this index that is different? If not, what is the difference?

Are the two clusters under the same amount of load across all indices held?

I think I can report the result of a like-for-like comparison. I created a new index in the slower cluster by reindexing. The only difference from the old index is that the new index has a single primary shard. The query time dropped from about 45ms to 10ms. Why? In addition, one of our indices receives a very large number of queries: it holds about 1 billion documents and serves about 2 billion queries a day. Are there any suggestions or reference material I could consult? Many thanks.
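For readers who want to repeat this experiment, the two REST calls involved are creating the target index with `number_of_shards` set to 1 and then running `_reindex` into it. The sketch below only builds the requests with the JDK's `java.net.http` API and does not send them. The class name, the index names `a` and `a_single_shard`, and the base URL are assumptions; a default 8.x installation would also need HTTPS and credentials, which are omitted here.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SingleShardReindex {

    // JSON body for creating an index with the given number of primary shards.
    static String createIndexBody(int primaryShards) {
        return "{\"settings\":{\"index\":{\"number_of_shards\":" + primaryShards + "}}}";
    }

    // JSON body for copying all documents from one index to another.
    static String reindexBody(String source, String dest) {
        return "{\"source\":{\"index\":\"" + source + "\"},"
                + "\"dest\":{\"index\":\"" + dest + "\"}}";
    }

    // PUT /<index> with the shard setting.
    static HttpRequest createIndex(String baseUrl, String index, int primaryShards) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(createIndexBody(primaryShards)))
                .build();
    }

    // POST /_reindex from source to dest.
    static HttpRequest reindex(String baseUrl, String source, String dest) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_reindex"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(reindexBody(source, dest)))
                .build();
    }

    public static void main(String[] args) {
        String baseUrl = "http://localhost:9200"; // assumed address
        HttpRequest create = createIndex(baseUrl, "a_single_shard", 1);
        HttpRequest copy = reindex(baseUrl, "a", "a_single_shard");
        System.out.println(create.method() + " " + create.uri());
        System.out.println(copy.method() + " " + copy.uri());
    }
}
```

Sending the requests would go through `java.net.http.HttpClient.send`, with the index creation completing before the reindex starts.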

Did the latency decrease to 10ms with only a single primary shard on the slower environment?

If that is the case there could be a number of reasons.

  • Maybe the query executes fast in a single thread against a single shard, and adding parallelism through additional primary shards adds significant overhead in terms of communication and coordination.
  • Having 3 primary shards could result in more disk reads compared to querying a single shard. If you have slow or saturated storage this could make a difference.
  • If shards are distributed across the cluster, querying multiple shards is likely to hit more nodes. Limitations on network speed or potentially high load on any nodes in the cluster could in this case have an impact.

I'm running a single-node cluster, so there should be no cross-node coordination and communication overhead. By your description, what remains is the storage device. I checked my disk health and the device utilization is not high. Is there any other possibility? By the way, the shard size has exceeded 100G after changing to a single shard.

What type of disk/storage are you using? Given the relatively small difference in latency I would not necessarily expect the disk to need to be saturated to cause such a delay, especially if you are not using a local SSD.

Mechanical hard disks. So do you still think the difference is caused by the storage device?

At this point I cannot see any other reason, as you have a single node. More shards likely result in more reads, which can be slow and add latency on HDDs. I would recommend performing the same test/comparison on a node backed by local SSD.

Adding to this: reading from only one shard does not require gathering the responses from the 2 other shards, then merging them, sorting, and building the response.
That could also explain the overhead.
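The gather-and-merge step can be pictured with a simplified sketch. This is not Elasticsearch's actual implementation, only an illustration of the extra work a multi-shard search implies: each shard returns its own top hits, and the results must be combined and re-sorted before the final top k can be returned. `ShardMergeSketch`, `Hit`, and `mergeTopK` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShardMergeSketch {

    // A minimal stand-in for a search hit: document id plus relevance score.
    static final class Hit {
        final String id;
        final double score;

        Hit(String id, double score) {
            this.id = id;
            this.score = score;
        }
    }

    // Combines the per-shard hit lists, sorts by descending score, and keeps
    // the best k. With one shard this step is trivial; with three shards all
    // three lists must arrive before the final ranking can be produced.
    static List<Hit> mergeTopK(List<List<Hit>> perShardHits, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShardHits) {
            all.addAll(shardHits);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = List.of(
                List.of(new Hit("a", 3.0), new Hit("b", 1.0)),
                List.of(new Hit("c", 2.5)),
                List.of(new Hit("d", 4.0)));
        for (Hit h : mergeTopK(shards, 3)) {
            System.out.println(h.id + " " + h.score);
        }
    }
}
```

The merge itself is cheap for small result sets; the point is that the response cannot be built until the slowest shard has answered, which on spinning disks can dominate the latency.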
