Why is Elasticsearch network performance much slower than GRPC?

Hi all,

We set up an environment to evaluate Elasticsearch's network performance. We found that Elasticsearch's network performance is much slower than GRPC's. Here are the key results:

QPS (Elasticsearch): 5,000
QPS (GRPC): 80,000
50th-percentile latency (Elasticsearch): 23 ms
50th-percentile latency (GRPC): 2 ms

We set up the experiments to minimize other overheads, so that network performance dominates:

  • All posting lists are very short (most of them contain only one posting).
  • All files are loaded into the page cache before querying.
  • Each query contains only one term.
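
For reference, a single-term request of this kind could look roughly like the sketch below. The field name `body` and the `size` value are hypothetical placeholders, not the ones used in the benchmark; the sketch only builds the JSON body of a standard `term` query and sends nothing over the network.

```python
import json


def single_term_query(field: str, term: str, size: int = 10) -> str:
    """Build the JSON body of an Elasticsearch `term` query.

    A `term` query looks up one exact term and is not analyzed,
    so it touches exactly one posting list per shard.
    """
    body = {
        "size": size,
        "query": {"term": {field: {"value": term}}},
    }
    return json.dumps(body)


# Hypothetical field name and term, for illustration only.
request_body = single_term_query("body", "elasticsearch", size=1)
print(request_body)
```

A body like this would be sent to the index's `_search` endpoint.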

To be fair,

  • The GRPC client and server send and receive a similar amount of data as Elasticsearch does.
  • Basically, both Elasticsearch and the GRPC program act as a remote key-value store.

Some other notes:

  • There are two nodes. The client is on one; the server is on another.
  • 16 (logical) cores, 25 server threads, 128 client threads.
  • GRPC is in async + streaming mode.
  • The GRPC test program is implemented in C++.
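
As a rough sanity check on the numbers above, Little's law relates throughput to the number of requests in flight and the time each one takes. The sketch below assumes one blocking request per client thread and treats the median latency as a stand-in for the mean; both are assumptions, not something that was measured.

```python
def closed_loop_qps(requests_in_flight: int, latency_seconds: float) -> float:
    """Little's law: throughput = requests in flight / time per request."""
    if latency_seconds <= 0:
        raise ValueError("latency must be positive")
    return requests_in_flight / latency_seconds


CLIENT_THREADS = 128  # from the notes above

# Assumption: each client thread has exactly one request outstanding,
# and the 50th-percentile latency approximates the mean latency.
elastic_bound = closed_loop_qps(CLIENT_THREADS, 0.023)  # 23 ms
grpc_bound = closed_loop_qps(CLIENT_THREADS, 0.002)     # 2 ms

print(f"Elasticsearch, latency-bound estimate: {elastic_bound:,.0f} QPS")
print(f"GRPC, latency-bound estimate:          {grpc_bound:,.0f} QPS")
```

Under these assumptions the Elasticsearch estimate lands close to the measured 5,000 QPS, which would suggest the throughput is bounded by per-request latency at 128 outstanding requests. The measured GRPC throughput is above its estimate, which is plausible if the async streaming client keeps more than one request in flight per thread.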

My questions are:

  1. Is Elasticsearch's network layer expected to be much slower than GRPC (a state-of-the-art network library)?
  2. Are there any suggestions for improving Elasticsearch's network performance? What performance should we expect?

Elasticsearch is a search engine, not a key-value store. It can be used as such, but is not optimised for that use case. I would therefore suspect the main difference in latency is related to request processing and retrieval of the data rather than network performance.
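
One way to check this, offered as a sketch rather than a definitive method: every search response carries a `took` field, the number of milliseconds between the coordinating node receiving the request and being ready to send the response. Subtracting it from the round-trip time measured on the client gives the share spent outside search execution (request transfer, response serialization, response transfer, client overhead). The helper name and the sample `took` value below are hypothetical.

```python
def split_latency(round_trip_ms: float, response: dict) -> dict:
    """Split a client-measured round trip using the response's `took` field.

    `took` covers queueing and execution on the Elasticsearch side; the
    remainder is transfer, response serialization, and client overhead.
    """
    if round_trip_ms <= 0:
        raise ValueError("round_trip_ms must be positive")
    took_ms = float(response["took"])
    outside_ms = round_trip_ms - took_ms
    return {
        "server_side_ms": took_ms,
        "outside_ms": outside_ms,
        "outside_fraction": outside_ms / round_trip_ms,
    }


# Hypothetical response: the `took` value here is a placeholder, not a
# measurement. 23.0 ms is the median round trip reported in the question.
sample_response = {"took": 5, "timed_out": False}
print(split_latency(23.0, sample_response))
```

If `took` accounts for most of the round trip, the time is going into request processing; if it is small, the remainder is worth profiling.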

What is the rationale behind this benchmark? What is your use-case?

Hi Christian,

Thanks for the reply.

The main difference should not be query processing. First, the posting lists are very short (about one posting in each queried posting list), so scoring and sorting should take almost no time. Second, our GRPC program does the same query processing.

The rationale behind this benchmark is that network performance is critical for applications that use a client-server model, such as Elasticsearch. Short posting lists, which are what the benchmark evaluates, are quite common: 90% of the terms in Wikipedia and Reddit have very short posting lists. For queries on these terms, network overhead dominates.

I was just wondering what the expected performance of Elasticsearch's network component is. Have you done any benchmarking of it specifically?
