Lucandra comparison?


(Adam Zell) #1

Hello,

Has anyone looked at http://github.com/tjake/Lucandra? It is Lucene +
Cassandra. I would be interested in any pros/cons vs. ES.

--
Adam
zellster@gmail.com


(Shay Banon) #2

Hi, I looked a bit at Lucandra and skimmed the code. It's an interesting
idea, but very problematic when it comes to scaling Lucene (based on
my experience). There are two problems with it:

  1. The way Lucene works, only a single IndexWriter can write to an index at
    a time. This means that with Lucandra, you can only have a single JVM
    indexing your content to the same index. You can't scale indexing out.

  2. When searching, there is a lot of additional data that Lucene stores on
    the "searcher/reader" level that is problematic to scale when working with a
    single reader against a large index. For example, FieldCache loads certain
    fields' terms into memory; it is used for sorting and, very often, in
    facet implementations, custom scoring, and so on. For a large index, this
    data is often very problematic to load in a single JVM.
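
To make the second point concrete, here is a toy model in Python. It is not Lucene's actual FieldCache API; `build_field_cache` and `cache_bytes` are hypothetical names, and it assumes one fixed-size integer per document for the sorted field. Under that assumption, the memory held at the reader level grows linearly with the number of documents behind a single reader:

```python
from array import array


def build_field_cache(docs, field):
    """Toy stand-in for a field cache: one integer per document, indexed by doc id."""
    return array("q", (doc[field] for doc in docs))


def cache_bytes(cache):
    """Memory taken by the cached values themselves (ignores object overhead)."""
    return len(cache) * cache.itemsize


# Hypothetical data: 1,000 documents with an integer "price" field.
docs = [{"price": i % 100} for i in range(1_000)]
cache = build_field_cache(docs, "price")

# Sorting by the field only touches the in-memory array, which is why the
# whole array has to fit in the JVM that owns the reader.
cheapest = sorted(range(len(docs)), key=cache.__getitem__)[:3]
```

Doubling the number of documents doubles `cache_bytes(cache)`, and every sorted or faceted field adds another such array.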

So, to sum up, you can run into scaling problems both when indexing and when
searching if you use just Lucandra. The solution is to "shard" your indices
across different JVMs, parallelize your indexing over several smaller
indices, and have each search executed on all the different shards and the
results reduced (of course, this is just the tip of the iceberg). These are
all things elasticsearch already does.
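
The shard-then-reduce idea can be sketched in a few lines of Python. This is a simplified illustration, not elasticsearch's implementation: `Shard` and `ShardedIndex` are hypothetical names, routing is a plain CRC32 hash of the document id, and "scoring" is just raw term frequency. Each shard stands in for a small index that would live in its own JVM:

```python
import heapq
import zlib
from collections import defaultdict


class Shard:
    """A small, independent inverted index (stand-in for one index per JVM)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def index(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def search(self, term, k):
        # Local top-k for this shard only, as (term frequency, doc_id) pairs.
        hits = self.postings.get(term.lower(), {})
        return heapq.nlargest(k, ((tf, doc_id) for doc_id, tf in hits.items()))


class ShardedIndex:
    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _route(self, doc_id):
        # Assumed routing rule: hash of the doc id modulo the shard count.
        return self.shards[zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)]

    def index(self, doc_id, text):
        # Each document is written to exactly one shard, so writes can proceed
        # on several shards in parallel.
        self._route(doc_id).index(doc_id, text)

    def search(self, term, k=10):
        # Scatter: ask every shard for its local top-k. Gather: reduce to a global top-k.
        partial = [hit for shard in self.shards for hit in shard.search(term, k)]
        return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


idx = ShardedIndex(num_shards=3)
idx.index("a", "lucene lucene lucene")
idx.index("b", "lucene cassandra")
idx.index("c", "lucene lucene")
top_two = idx.search("lucene", k=2)
```

Because every shard returns its own top k, the merged list contains the global top k whichever shard each document landed on. A real system also has to deal with relevance statistics that differ per shard, replication, and failures, which is the rest of the iceberg.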

One cool thing that could be done is to use Lucandra at the shard level
in elasticsearch. I had a look at the Lucandra code and have some concerns
here and there. I simply have not found the time to try and hack something in
that area.

-shay.banon

On Wed, Jul 28, 2010 at 3:21 AM, Adam Zell zellster@gmail.com wrote:

Hello,

Has anyone looked at http://github.com/tjake/Lucandra? It is Lucene +
Cassandra. I would be interested in any pros/cons vs. ES.

--
Adam
zellster@gmail.com

