Is there a "common" benchmark for Solr, ElasticSearch and Sensei?

Thanks Shay.
Have you published the results of the tests you have conducted anywhere, or could you?
I just want to get a sense of the scale we are talking about, and what is
already proven.

Ori

On Fri, Apr 16, 2010 at 7:24 PM, Shay Banon shay.banon@elasticsearch.com wrote:

Benchmark numbers are very subjective; they depend on everything from where
you run the benchmark to how you run it. If someone would like to create an
unbiased benchmark, I would be happy to help. As for a specific use case, I
suggest you write your own benchmark and check. If you want, the elasticsearch
repo has some JMeter scripts that I use for testing.

cheers,
shay.banon

On Fri, Apr 16, 2010 at 6:46 PM, Ori Lahav olahav@gmail.com wrote:

Thanks Shay for this point in this thread.

With all the threads around benchmarking, is there some kind of test that
you or someone else has done using Elasticsearch, and can you share
numbers regarding scale and performance?

The things I would like to get a sense of are:

  1. number of documents in the index.
  2. how many instances on how many machines.
  3. at this size how many Writes/s
  4. how many Reads/s

If you have a link to a previously run benchmark, it would be appreciated,
or you can share your own numbers.
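
One rough way to collect the writes/s and reads/s figures asked about above is a small timing loop. The sketch below is only an illustration: `ops_per_second` is a hypothetical helper, and the plain dict stands in for a real index client, so the numbers it prints say nothing about Elasticsearch itself.

```python
import time


def ops_per_second(operation, count):
    """Call operation(i) for i in range(count) and return the measured rate."""
    start = time.perf_counter()
    for i in range(count):
        operation(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")


# Stand-in "index": a dict keyed by document id (assumption, not a real client).
store = {}
writes_per_s = ops_per_second(lambda i: store.__setitem__(i, {"id": i}), 10_000)
reads_per_s = ops_per_second(lambda i: store[i], 10_000)
print(f"writes/s: {writes_per_s:.0f}, reads/s: {reads_per_s:.0f}")
```

For a real comparison, the two lambdas would be swapped for index and search calls against the cluster under test, and the document count and machine layout recorded alongside the rates.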

Thanks
Ori

On Fri, Apr 16, 2010 at 2:36 PM, Shay Banon <shay.banon@elasticsearch.com> wrote:

Hope elasticsearch helps in the area of the squashing :). To be honest, I
followed a similar path way back when I tried to have a distributed Lucene
Directory built on top of GigaSpaces/Coherence/Terracotta. From a design
perspective, it's not very different from what Lucandra or the one over HBase
do, and in the end, they suffer from the same limitations that I noted...

cheers,
shay.banon

On Fri, Apr 16, 2010 at 1:49 PM, Tim Robertson <timrobertson100@gmail.com> wrote:

Thanks Shay. Curiosity squashed

On Fri, Apr 16, 2010 at 12:47 PM, Shay Banon shay.banon@elasticsearch.com wrote:

This will work up to the point where your index gets big. In that case, your
search JVM might not be able to hold all the information needed (for
example, it won't be able to load the term info and field cache because they
are too big), and you will get either OOM or GC thrashing.

cheers,
shay.banon

On Fri, Apr 16, 2010 at 1:25 PM, Tim Robertson <timrobertson100@gmail.com> wrote:

I guess my curiosity was really whether I could open many read-only index
readers to scale the reads.

On Fri, Apr 16, 2010 at 12:16 PM, Shay Banon shay.banon@elasticsearch.com wrote:

This solution is problematic IMO, both in how it works and especially in how
it fits with the way Lucene works. With this solution, there is only a single
Lucene IndexWriter that you can open, so your writes don't scale, regardless
of the number of machines you add.

Also, Lucene caches a lot of information per reader/searcher (field cache,
terms info, and so on). With large indices, you have a single reader working
against a very large cluster/index, and your client won't cope with it.

You can't get around it: you need to shard a Lucene index into many small
Lucene indices running on different machines, but then you need to write a
distributed Lucene solution. And hey, I think someone already built one :)

(P.S. I am not even mentioning the many other features elasticsearch gives
over this very low-level Lucene solution.)
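
To make the sharding idea above concrete, here is a minimal Python sketch. It is not elasticsearch or Lucene code: `Shard`, `ShardedIndex`, and `shard_for` are hypothetical names, the in-memory posting lists only stand in for small Lucene indices on separate machines, and routing by a CRC32 hash of the document id is one simple choice assumed for illustration.

```python
import zlib
from collections import defaultdict


class Shard:
    """Toy stand-in for one small index (hypothetical, not a Lucene API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def index(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        return set(self.postings.get(term.lower(), ()))


class ShardedIndex:
    """Routes each write to exactly one shard; fans each search out to all."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def shard_for(self, doc_id):
        # Stable hash, so a given id always lands on the same shard.
        return zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)

    def index(self, doc_id, text):
        # Each shard has its own writer, so writes spread across shards.
        self.shards[self.shard_for(doc_id)].index(doc_id, text)

    def search(self, term):
        # Scatter the query to every shard, then gather and merge the hits.
        hits = set()
        for shard in self.shards:
            hits |= shard.search(term)
        return sorted(hits)
```

Each shard only holds the terms and caches for its own slice of the documents, which is the property that keeps any single reader from having to load everything. A real system would also need replication, relevance-ranked merging, and rebalancing, none of which this sketch attempts.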

Shay

On Fri, Apr 16, 2010 at 10:43 AM, Tim Robertson timrobertson100@gmail.com wrote:

+1 from me. Very curious how distributing the storage behind Lucene
performs, compared with distributing multiple Lucene indexes themselves.

On Fri, Apr 16, 2010 at 8:52 AM, Thomas Koch thomas@koch.ro wrote:

Hi,

If you want to benchmark search solutions over big data, you may also want
to consider the following solutions:

Sematext Cassandra Monitoring | Performance Monitoring Tool

Lucandra uses Cassandra as the persistence layer for Lucene. The following
two attempts have ported Lucandra to use HBase:

http://github.com/thkoch2001/lucehbase
GitHub - akkumar/hbasene: HBase as the backing store for the TF-IDF representations for Lucene

Both of the above are only proofs of concept for now, but may quickly become
production ready. In the end you only need about 5 classes to glue Lucene to
Cassandra or HBase.

Best regards,

Thomas Koch, http://www.koch.ro

--
http://olahav.typepad.com
