Following on from the "who's in production with elastic search" thread, I thought it
might also be worth having a "who's evaluating / incorporating elastic search"
email thread. I'll go first!
Here at the Online Technology Group, within Sony Computer Entertainment
Europe (or Playstation), I'm the Tech Lead on an internal middleware
solution which provides services (SaaS) to online games.
Our architecture is pretty similar to the diagram here:
http://www.udidahan.com/2009/12/09/clarified-cqrs/ although we use Cassandra
as a datastore rather than a relational database. We then use elastic
search for every query that returns a list of items. Requests for single
items go through our services directly to Cassandra.
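A minimal sketch of that read split, assuming in-memory stand-ins for both stores. The `ReadRouter` class and its method names are hypothetical, not our actual services, and the index is updated synchronously here only to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class ReadRouter:
    """Routes reads: single items to the primary store, lists to the search index.

    Both stores are plain in-memory stand-ins (hypothetical); a real system
    would call Cassandra and elastic search clients here instead.
    """
    primary_store: dict = field(default_factory=dict)  # stand-in for Cassandra
    search_index: list = field(default_factory=list)   # stand-in for elastic search

    def save(self, item_id, item):
        # Write to the primary store, then make the item searchable.
        self.primary_store[item_id] = item
        self.search_index = [d for d in self.search_index if d["id"] != item_id]
        self.search_index.append({"id": item_id, **item})

    def get_item(self, item_id):
        # Singular lookups go straight to the primary store by key.
        return self.primary_store.get(item_id)

    def list_items(self, **filters):
        # Any query that returns a list is answered from the search index.
        return [d for d in self.search_index
                if all(d.get(k) == v for k, v in filters.items())]


router = ReadRouter()
router.save("p1", {"game": "racer", "score": 10})
router.save("p2", {"game": "racer", "score": 7})
router.save("p3", {"game": "puzzle", "score": 3})

single = router.get_item("p1")              # key lookup, primary store
racers = router.list_items(game="racer")    # list query, search index
```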
We're currently not in production, but are going to be performing some
pretty serious load testing in the coming weeks which I'll be sure to let
you know about. We also have some pretty cool games as potential internal
customers for the project.
I would like to thank Shay and everyone involved in elastic search. I was
initially intending to wrap Lucene in a similar way myself. Elastic search
means I don't have to, so my own services have had a much higher velocity as
a result.
Thanks,
Paul
Note: Sony Computer Entertainment Europe do not endorse the use of any
third party software.
PPearcy and I are within a few weeks of replacing a legacy commercial
search system with ElasticSearch. We evaluated Flax/Xapian (vaporware)
and Solr. ES required very little tuning to get blazing performance.
We spent a limited amount of time tuning Solr, but since ES handled 5x
the query volume, we quickly halted our efforts.
I'm not at liberty to disclose our employer, nor the search engine we
are replacing, due to licensing issues with our current search engine.
But we design and host sites for the top online brokerage and
financial media sites. We have 20 million docs, with an average of 25
fields per doc. Our typical query string has 5-15 fields (entitlements
play a big factor in driving the field count up). Over the years we've
developed a caching strategy to typically cache 75% of requested
searches. Our typical peak traffic of actual uncached (and effectively
unique) queries is 230/s. We developed an Antlr parser to handle the
existing query language and translate our queries on the fly, to avoid
making changes to the existing web sites and applications.
With ES we can comfortably run 400 queries/sec, averaging under 100ms
on a single box (24 core, 48G ram). We are able to capture our
existing production traffic and replay it against ES. So we're testing
with a very diverse query set, with a real world mix. We are indexing
500 docs/sec, but this is throttled by our backend systems in our
development environment.
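As a rough sanity check, the numbers above can be combined as follows. This is a back-of-the-envelope sketch, and it assumes the 75% cache hit ratio also holds at peak, which is not something we measured separately.

```python
cache_hit_ratio = 0.75        # share of requested searches served from cache
peak_uncached_qps = 230       # peak uncached queries per second
measured_capacity_qps = 400   # comfortable ES throughput on a single box

# Assumption: the 75% hit ratio also applies during peak traffic.
implied_total_qps = peak_uncached_qps / (1 - cache_hit_ratio)

# How far the measured ES throughput sits above the uncached peak.
headroom = measured_capacity_qps / peak_uncached_qps

print(f"implied total peak: {implied_total_qps:.0f} requests/s")
print(f"headroom over uncached peak: {headroom:.2f}x")
```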
So far we're quite pleased with the performance, feature set and
robustness. Thanks Shay for putting so much time and effort into
ElasticSearch!
We at North Rhine-Westphalian Library Service Centre (hbz) are within a
few weeks of replacing FAST ESP with ElasticSearch.
We evaluated Solr, but ES has all the features we need (and more!
... though facets are not fast enough yet, which will supposedly be
solved by the time we need them). After experiencing the ease of scaling,
high availability and administration in general, we quickly decided to
use ES.
As a library service we have to index our catalogue of more than 17
million entries, with an average of maybe 20 fields each. Indexing takes
around 7 hours on two 4-core machines. The query peak is 5.
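For a sense of scale, those figures work out to an approximate indexing rate as follows. This is only a rough sketch: both inputs are rounded ("more than 17 million", "around 7 hours"), so the result is an estimate, not a measurement.

```python
entries = 17_000_000   # catalogue size, rounded down
hours = 7              # approximate total indexing time
machines = 2
cores_per_machine = 4

seconds = hours * 3600
docs_per_second = entries / seconds
docs_per_second_per_core = docs_per_second / (machines * cores_per_machine)

print(f"about {docs_per_second:.0f} docs/s overall")
print(f"about {docs_per_second_per_core:.0f} docs/s per core")
```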
The query interface is CQL with Bib Context Set, the result is
transformed to Atom Feed using MODS format, so there is no need to
change anything in existing applications.
For this scenario Solr would of course be enough, but we plan to handle
more than 100 million documents (having done that in the past). As we connect
other projects (i.e. Linked Data) to ES, we calmly await more peaks to come.
We are glad that ES is Open Source and that it's written in Java. We are
astonished by the rapid pace of development over the past
months. It is really fun to discover the possibilities of ES (it took
some time to discover that multi_field is exactly what we need).
Keep up the excellent work, Shay, and all who are involved!
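For readers who have not met multi_field yet, here is a minimal sketch of such a mapping, built as a Python dict and printed as JSON. The type name `catalogue_entry` and the field name `title` are hypothetical and not taken from our actual mapping. The idea is that one source field is indexed twice: once analyzed for full-text search and once untouched for exact matching or sorting.

```python
import json

# Hypothetical mapping: "catalogue_entry" and "title" are illustrative names.
mapping = {
    "catalogue_entry": {
        "properties": {
            "title": {
                "type": "multi_field",
                "fields": {
                    # Default sub-field: analyzed, used for full-text queries.
                    "title": {"type": "string", "index": "analyzed"},
                    # Extra sub-field: stored as-is, addressed as "title.untouched".
                    "untouched": {"type": "string", "index": "not_analyzed"},
                },
            }
        }
    }
}

mapping_json = json.dumps(mapping, indent=2)
print(mapping_json)
```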