We set up an environment to evaluate Elasticsearch's network performance, and found that Elasticsearch's network layer is much slower than gRPC's. Here are the key results:
Elasticsearch is a search engine, not a key-value store. It can be used as such, but is not optimised for that use case. I would therefore suspect the main difference in latency is related to request processing and retrieval of the data rather than network performance.
What is the rationale behind this benchmark? What is your use-case?
The main difference should not be query processing. First, the posting lists are very short (about one posting in each queried posting list), so scoring and sorting should take almost no time. Second, our gRPC program does the same query processing.
The rationale behind this benchmark is that network performance is critical for applications that use a client-server model, such as Elasticsearch. Short posting lists, which are what the benchmark evaluates, are quite common: 90% of the terms in Wikipedia and Reddit have very short posting lists. For queries on these terms, network overhead dominates.
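One way to check whether the time goes to query processing or to everything around it is to compare the `took` value that Elasticsearch returns in each search response (the server-reported search time in milliseconds) with the round-trip time measured at the client. The sketch below is only an illustration, not the benchmark used here: `split_latency` and `send_query` are hypothetical names, and `send_query` is assumed to be any callable that issues one search request and returns the parsed JSON response as a dict.

```python
import time
from statistics import median


def split_latency(send_query, runs=100):
    """Compare client-side round-trip time with the server-reported `took`.

    send_query is a hypothetical callable (an assumption of this sketch):
    it sends one search request and returns the parsed JSON response,
    which must contain the `took` field in milliseconds.
    """
    round_trip_ms = []
    server_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        response = send_query()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        round_trip_ms.append(elapsed_ms)
        server_ms.append(float(response["took"]))

    rt = median(round_trip_ms)
    srv = median(server_ms)
    # Whatever is not covered by `took` is attributed to overhead:
    # network transfer, HTTP handling, (de)serialization, client code.
    return {"round_trip_ms": rt, "server_ms": srv, "overhead_ms": rt - srv}
```

With the `requests` library, `send_query` could be something like `lambda: session.post(url, json=query).json()` on a reused `requests.Session`, where `url` and `query` are placeholders for the search endpoint and request body. Note that the remainder lumps together network time, HTTP handling, and serialization, so it is an upper bound on pure network cost rather than a direct measurement of it.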
I was just wondering what the expected performance of Elasticsearch's network component is. Have you guys done any particular benchmarking?