Should I use Elasticsearch as the core of a faceted-navigation-heavy website?

Hello,

I'm evaluating Elasticsearch for use in a new project. I can see that it has
all the search features we need. The problem is that after reading the
documentation and the forum I still can't tell whether Elasticsearch is a
suitable technology for us performance-wise. I'd be very grateful for your
opinion on that.

We're building a directory of businesses, similar to Yelp. We have 5m
businesses, and the main feature of our site is faceted search on several
facets: geography (tens of thousands of geo objects), business type (several
thousand options), additional services offered by the business (hundreds),
and so on. So for each request (combination of search parameters) we need
to get the search results, but also the options available in each facet
(for example, which business types are located in the selected geography),
so that the user can narrow down the search (example:
http://take.ms/oAZan). Full-text search (by business name, for example) is
used in a very small percentage of requests; the bulk of requests are exact
matches on one or several facets.
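
To make that request shape concrete, here is a minimal sketch of what such a search body could look like: exact-match filters for the selected facets plus one `terms` aggregation per facet to list the remaining options. The field names (`geo_id`, `business_type`, `services`) and the helper `build_facet_search` are hypothetical, and the `bool`/`filter` form assumes a reasonably recent Elasticsearch version (older 1.x releases expressed the same thing with a `filtered` query). It only builds the JSON body and does not talk to a cluster.

```python
import json


def build_facet_search(selected, facet_fields, page_size=20, facet_size=50):
    """Build a search body: exact-match filters plus one terms aggregation per facet.

    selected     -- dict of field -> chosen value (hypothetical field names)
    facet_fields -- fields whose available options should be returned
    """
    # One non-scoring exact-match clause per selected facet value.
    filters = [{"term": {field: value}} for field, value in sorted(selected.items())]
    # One terms aggregation per facet, computed over the filtered result set.
    aggs = {
        field: {"terms": {"field": field, "size": facet_size}}
        for field in facet_fields
    }
    return {
        "size": page_size,
        "query": {"bool": {"filter": filters}},
        "aggs": aggs,
    }


body = build_facet_search(
    selected={"geo_id": 1234},
    facet_fields=["business_type", "services"],
)
print(json.dumps(body, indent=2))
```

The aggregation buckets in the response would then give, for the selected geography, each business type and service together with its document count, which is the information needed to render the refinement options.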

Based on a similar project of ours, we expect 1-5m requests per day. The
requests are highly diversified: no single page (combination of search
params) accounts for more than 0.1% of total requests. We expect to
answer each request in 200-300ms, so I guess the request to Elasticsearch
should take no more than 100ms.
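
As a quick sanity check on those numbers, the daily volume can be converted to an average request rate. This is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day, which real traffic is not, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day):
    """Average requests per second, assuming a perfectly even load over the day."""
    return requests_per_day / SECONDS_PER_DAY


for daily in (1_000_000, 5_000_000):
    print(f"{daily:>9,} requests/day ~ {average_rps(daily):.1f} rps on average")
```

That works out to roughly 12 to 58 requests per second on average for the 1-5m range.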

On our similar project we use a big lookup table in the database with all
possible combinations of params mapped to the search result count. For each
request we generate all possible combinations of parameters that would
refine the current search and then check the lookup table to see whether
they have any results.
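
For readers unfamiliar with this approach, the idea can be sketched in memory as follows. This is an illustration only, not the actual implementation: the real table lives in a database, and the function names, the field names, and the assumption that every business has exactly one value per facet are all made up for the example.

```python
from collections import Counter
from itertools import combinations


def build_lookup(businesses, facet_fields):
    """Count businesses for every combination of facet values present in the data."""
    counts = Counter()
    for biz in businesses:
        pairs = sorted((f, biz[f]) for f in facet_fields if f in biz)
        # Every non-empty subset of this business's (field, value) pairs
        # is a search that would match it.
        for r in range(1, len(pairs) + 1):
            for combo in combinations(pairs, r):
                counts[frozenset(combo)] += 1
    return counts


def available_refinements(lookup, selected, candidates):
    """Return the candidate (field, value) pairs that still yield results when added."""
    base = frozenset(selected.items())
    return {
        c: lookup[base | {c}]
        for c in candidates
        if lookup.get(base | {c}, 0) > 0
    }


businesses = [
    {"geo": "berlin", "type": "cafe"},
    {"geo": "berlin", "type": "bar"},
    {"geo": "paris", "type": "cafe"},
]
lookup = build_lookup(businesses, ["geo", "type"])
refinements = available_refinements(
    lookup,
    selected={"geo": "berlin"},
    candidates=[("type", "cafe"), ("type", "bar"), ("type", "gym")],
)
print(refinements)  # {('type', 'cafe'): 1, ('type', 'bar'): 1}
```

The number of stored combinations grows quickly with the number of facets per business, which is the trade-off against computing the counts at query time.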

My questions are:

Is Elasticsearch suitable for our purposes? Specifically, are aggregations
meant to be used in a large number of low-latency requests, or are they more
of an analytical feature, where response time is not that important? I ask
because in discussions of aggregation and faceting performance here and
elsewhere, response times are mentioned in the 1-10s range, which is OK for
analytics and infrequent searches, but obviously not OK for us.

How hard is it to get the performance we need: 50 rps, 100ms response time
for search+facets, on some reasonable hardware, taking into account the large
number of possible facet combinations and the high diversification of
requests? What kind of hardware should we expect to need for our load? I
understand that these are vague questions, but I just need an approximation.
Is it more like 1 commodity server with a simple configuration, or more
like a cloud of 10 servers and extensive tuning? For comparison, our lookup
table solution runs on 1 commodity server with 16GB of RAM and an almost
default setup.

Thank you for your responses,
Dmitry

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/807801d4-2926-4e52-9161-dc82d3f33a75%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Short answer: yes. With a properly sharded and scaled-out environment, and
using ES 1.4 or newer, you should be able to get those numbers.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Tue, Mar 24, 2015 at 5:38 PM, Dmitry dmitry.bitman@gmail.com wrote:
