I am working on porting a search engine from an SQL database to
Elasticsearch. The main reason for doing it is to be able to compute
facets easily.
Currently we have facets on the SQL side by generating precalculated
tables. It works well, but it's a pain to maintain and facets are
supported only on a subset of the data.
Now that the ES prototype is working, I am benchmarking the two
solutions, and it appears that the ES version is slightly behind the SQL
version in terms of performance (in terms of maintainability it's far
better).
I've used the exact same machine configuration for both: a 64-bit
platform, 32 GB of RAM, an SSD, and a quad-core Intel Xeon at 3 GHz to
compare SQL and ES.
The documents are not small: there are around 200 fields. Depending on
the request, script-based sorting is used, and facets are always
computed on 8 fields of the doc.
The index contains 3 million docs; if I'm not mistaken, that's
relatively small compared to what ES can handle.
In terms of query, I use a filtered query, and for some requests, a
custom_filters_score query to compute the score and use it for
sorting.
Some of the filters are global because of the facets, but there are
always some filters in the filtered query, so the number of docs
scanned should be reduced (the whole index is not scanned).
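
To make the query shape above concrete, here is a minimal sketch of a
search body combining a filtered query with terms facets, written in the
facets-era DSL that the post's vocabulary suggests. The field names
("title", "category", "brand", "in_stock") and the helper
build_search_body are hypothetical; the actual query is not shown in
this post, so treat this as an illustration only.

```python
import json


def build_search_body(text, filters, facet_fields, global_fields=(), size=10):
    """Build a search body: filtered query plus one terms facet per field.

    All field names passed in are hypothetical examples.
    """
    body = {
        "size": size,
        "query": {
            "filtered": {
                # Full-text part of the request.
                "query": {"match": {"title": text}},
                # Filters restrict which docs the query has to score.
                "filter": {"and": filters},
            }
        },
        "facets": {},
    }
    for field in facet_fields:
        facet = {"terms": {"field": field, "size": 10}}
        if field in global_fields:
            # A global facet is computed over the whole index,
            # not only over the docs matched by the query.
            facet["global"] = True
        body["facets"][field] = facet
    return body


body = build_search_body(
    "laptop",
    filters=[{"term": {"in_stock": True}}],
    facet_fields=["category", "brand"],
    global_fields=["category"],
)
print(json.dumps(body, indent=2))
```

Note that a global facet runs over every document regardless of the
filters in the filtered query, so the filters reduce the work for
scoring but not necessarily for those facets.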
I use two measures in my tests: the time spent on the server to
execute the search, and the number of queries per second executed by
the client running 100 threads in parallel.
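
For readers who want to reproduce this kind of measurement, a minimal
sketch of a client-side harness follows. It is not the tool used for the
numbers in this post; run_benchmark and fake_query are hypothetical
names, and fake_query is a stand-in that just sleeps instead of calling
a real search backend.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(run_query, n_threads=100, n_queries=1000):
    """Call run_query n_queries times on n_threads worker threads.

    Returns (queries_per_second, average_latency_ms) as seen by the client.
    """
    def timed_call(_):
        start = time.perf_counter()
        run_query()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        latencies = list(pool.map(timed_call, range(n_queries)))
    wall = time.perf_counter() - wall_start

    qps = n_queries / wall
    avg_ms = 1000.0 * sum(latencies) / len(latencies)
    return qps, avg_ms


def fake_query():
    # Stand-in for the real search call (hypothetical); sleeps 5 ms.
    time.sleep(0.005)


if __name__ == "__main__":
    qps, avg_ms = run_benchmark(fake_query, n_threads=100, n_queries=1000)
    print("client qps: %.1f, avg latency: %.1f ms" % (qps, avg_ms))
```

The latency measured here is the client-side round trip. The
server-side time would have to come from the backend itself, for
example the "took" field of an ES search response.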
For Elasticsearch, the average time spent on the server is around 500
ms for each query (for 100 queries in parallel), and the average
queries per second on the client is around 160 (some ms are lost in
building the query, sending it, receiving results, and parsing them).
And this is with an index having 1 shard and 0 replicas; when I
increase the number of shards/replicas, performance drops
significantly.
For SQL, the average time spent to execute a query is around 360 ms
(again, with 100 queries running in parallel), and the average queries
per second on the client is around 200.
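
As a rough sanity check on these figures, the sketch below relates
concurrency, latency, and throughput using Little's law. It assumes a
closed loop in which all 100 client threads are always busy, which is my
assumption and not something measured here; closed_loop_estimate is a
hypothetical helper name.

```python
def closed_loop_estimate(threads, server_ms, observed_qps):
    """Relate concurrency, latency, and throughput (Little's law).

    Assumes every thread always has exactly one request in flight.
    Returns (ceiling_qps, round_trip_ms, overhead_ms).
    """
    # Best possible throughput if the round trip were only the server time.
    ceiling_qps = threads / (server_ms / 1000.0)
    # Mean round trip implied by the throughput the client actually saw.
    round_trip_ms = threads / observed_qps * 1000.0
    # Whatever is not server time: query building, network, parsing.
    overhead_ms = round_trip_ms - server_ms
    return ceiling_qps, round_trip_ms, overhead_ms


es = closed_loop_estimate(threads=100, server_ms=500, observed_qps=160)
sql = closed_loop_estimate(threads=100, server_ms=360, observed_qps=200)
print("ES : ceiling %.0f qps, round trip %.0f ms, overhead %.0f ms" % es)
print("SQL: ceiling %.0f qps, round trip %.0f ms, overhead %.0f ms" % sql)
```

Under that assumption, the ES numbers imply a round trip of about 625
ms (125 ms outside the server) and the SQL numbers about 500 ms (140 ms
outside the server), so the per-query overhead on the client looks
similar in both setups.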
I know it's hard to compare, but as I don't have any idea of the
results I can expect, I wonder if someone can comment on these
measures.
Maybe I missed something and it should be an order of magnitude
faster, or maybe these are the typical results for environments
similar to mine; I don't know.
What can I expect in my case? What did you observe under similar
circumstances with ES? Does it support concurrent requests well?
Should the time to execute a query be in the range of 500 ms when
making 100 queries at the same time? Are there some ways to improve
search performance?
Any information or comments are welcome; this is an important input for
the decision on whether to industrialize the prototype.
Thank you.