Hi all,
We set up an environment to evaluate Elasticsearch's network performance and found that it is much slower than gRPC. Here are the key results:
QPS (Elasticsearch): 5,000
QPS (gRPC): 80,000
50th-percentile latency (Elasticsearch): 23 ms
50th-percentile latency (gRPC): 2 ms
We set up the experiments to minimize other overhead, so that network performance dominates:
- All posting lists are very short (most of them have 1 posting only)
- All files are loaded into the page cache before querying.
- Only one term in each query
To keep the comparison fair:
- The gRPC client and server send and receive a similar amount of data to what Elasticsearch does.
- Essentially, both Elasticsearch and the gRPC program act as a remote key-value store.
Some other notes:
- There are two nodes: the client runs on one and the server on the other.
- 16 (logical) cores, 25 server threads, 128 client threads.
- gRPC runs in async + streaming mode.
- The gRPC test program is implemented in C++.
My questions are:
- Is Elasticsearch's networking layer expected to be much slower than gRPC (the state-of-the-art networking library)?
- Are there any suggestions for improving Elasticsearch's network performance, and what performance should we expect?