Hi!
We're using Elasticsearch for an open source geocoder called photon. We
were using Solr previously, but we switched to Elasticsearch some time ago and
I am now using multi_match's cross_fields
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-cross-fields
query (which is great by the way as it sorts out most problems we had
before).
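
For readers who have not seen this query type, here is a minimal sketch of what a cross_fields request body can look like. The field names ("name", "street", "city") and the search text are made up for illustration and are not photon's actual mapping; the real query is in the app.py linked further down.

```python
import json


def cross_fields_query(text, fields, operator="and"):
    """Build a multi_match request body of type cross_fields.

    The terms of `text` are matched across all `fields` as if they
    were one combined field.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": list(fields),
                "operator": operator,
            }
        }
    }


# Hypothetical field names, not photon's real mapping.
body = cross_fields_query("berlin alexanderplatz", ["name", "street", "city"])
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a _search request.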
I compared the performance of both implementations and it turned
out that the Elasticsearch version is about 5 times slower than its Solr
counterpart. The dataset (100,000,000 documents) is identical and both
indices are the same size. On the Solr side, I am using an edismax
https://github.com/komoot/photon/blob/deprecated-solr-version/solrconfig/collection1/conf/solrconfig.xml#L122
query, whilst it is a cross_fields query
https://github.com/christophlingg/photon/blob/komoot/website/photon/app.py#L25 on
Elasticsearch. Average query time is 120ms (Solr) vs. 1000ms (Elasticsearch).
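
To make clear what "average query time" means here, this is roughly how such a number can be measured. It is only a sketch: run_query stands for any hypothetical function that sends one query to a backend and waits for the response, and the repeat count is arbitrary. It is not the actual benchmark script.

```python
import time


def average_ms(run_query, queries, repeats=3):
    """Return the mean wall-clock time per call in milliseconds.

    `run_query` is a hypothetical callable that executes one query
    against a backend and blocks until the response has arrived.
    """
    timings = []
    for q in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            run_query(q)
            timings.append((time.perf_counter() - start) * 1000.0)
    return sum(timings) / len(timings)


def slowdown(baseline_ms, candidate_ms):
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms


# Dummy backend so the sketch runs without a server.
avg = average_ms(lambda q: None, ["berlin", "paris"])
print(f"average: {avg:.3f} ms")
```

Wall-clock timings like this include network and client overhead, so they are best compared against the "took" value that Elasticsearch reports in each search response.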
I raised the limit on open file descriptors to 64k. During the benchmark
there is (almost) no IO, whilst CPU usage is very high (> 75%, 12 cores). As
cross_fields is a very recent feature, I tried out best_fields
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-multi-match-query.html#type-best-fields as
well, but benchmark results weren't better.
Do you have any ideas on how I can dig more into performance issues like
this in elasticsearch? Do you have experience with both queries you can
share with me?
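
Since the CPU seems to be the bottleneck, one thing I could look at is the nodes hot threads API, which reports the busiest threads on each node. Below is a minimal sketch of how I would call it; the host and port (localhost:9200) and the parameter values are assumptions, not our actual setup.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, threads=3, interval="500ms", kind="cpu"):
    """Build the URL for the nodes hot threads API."""
    params = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{params}"


def fetch_hot_threads(base_url, **kwargs):
    """Fetch the plain-text hot threads report (needs a running node)."""
    with urlopen(hot_threads_url(base_url, **kwargs), timeout=10) as resp:
        return resp.read().decode("utf-8")


# Assumed local node address; adjust to the real cluster.
url = hot_threads_url("http://localhost:9200")
print(url)
```

Calling fetch_hot_threads a few times while the benchmark is running should show which part of query execution the search threads spend their time in.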
Thanks for your help!
Christoph