Many indices revisited

Hi guys,

I found some related posts that ask for an index per user, like this one:
http://elasticsearch-users.115913.n3.nabble.com/Using-Many-Indexes-td1696219.html

but that was a while back and I was wondering if there was more support
built for this use case, namely:

  • Millions of small indexes distributed over a fleet of machines
  • Only a small subset of those indexes is active at any time.
  • Indexes unused for a while get evicted automatically to free up space
    (back then Shay said it would be tricky to implement)
  • Having an overhead per Lucene index is still worth it given the above,
    plus faster search speed per index and faster updates per index.
  • Each machine can fit several thousand indices in memory using either mmap
    directory and file system cache, or the bytebuffer directory that
    elasticsearch implemented.

Is this now a viable scenario in elasticsearch?

Some additional questions:

  • The closing of the indices was advised to be done via an API call,
    meaning someone else has to keep track of LRU logic. Can this be done
    automatically? Say oldest indices get auto closed when their number grows
    beyond certain count on a particular machine (this can potentially lead to
    thrashing, but then the fleet would need to be big enough to avoid it).
  • When opening an index on the fly, is it possible to get a pre-built index
    from somewhere else and bootstrap it in, as opposed to just giving raw data
    to ES and having it index on demand? The goal is to make search available
    asap and not waste CPU cycles on indexing on the ES fleet itself.
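
To make the first question concrete, here is a minimal sketch of the LRU bookkeeping that would currently have to live outside ES. The class name `LruIndexTracker` and the `open_index`/`close_index` callbacks are hypothetical; in a real setup they would presumably wrap the open/close index API calls. This is not an existing ES feature, only an illustration of the logic being asked about.

```python
from collections import OrderedDict
from typing import Callable, List


class LruIndexTracker:
    """Keeps at most `max_open` indices open, closing the least recently used."""

    def __init__(self, max_open: int,
                 open_index: Callable[[str], None],
                 close_index: Callable[[str], None]) -> None:
        if max_open < 1:
            raise ValueError("max_open must be at least 1")
        self.max_open = max_open
        self._open_index = open_index    # hypothetical hook, e.g. wraps the open API
        self._close_index = close_index  # hypothetical hook, e.g. wraps the close API
        self._open = OrderedDict()       # index name -> None, oldest first

    def touch(self, name: str) -> None:
        """Record a use of `name`, opening it (and evicting others) if needed."""
        if name in self._open:
            self._open.move_to_end(name)  # mark as most recently used
            return
        # Evict before opening so the open count never exceeds max_open.
        while len(self._open) >= self.max_open:
            oldest, _ = self._open.popitem(last=False)
            self._close_index(oldest)
        self._open_index(name)
        self._open[name] = None

    def open_indices(self) -> List[str]:
        """Currently open indices, least recently used first."""
        return list(self._open)


if __name__ == "__main__":
    closed = []
    tracker = LruIndexTracker(2, open_index=lambda n: None,
                              close_index=closed.append)
    for name in ["user_a", "user_b", "user_a", "user_c"]:
        tracker.touch(name)
    print(tracker.open_indices(), closed)  # ['user_a', 'user_c'] ['user_b']
```

With a cap that is too small for the active set, this is exactly where the thrashing mentioned above would show up: every `touch` would close one index and open another.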

Does ES support the above? If not, would it be a lot of work to create
modules for that?
The entire guide is a lot to take in at once. Could you please point me to
specific places I can read to get a better idea of this use case, and to
examples of modules that manage indices, so I can see whether I can build
something that suits me if it doesn't already exist?

Thank you in advance,

Aleksey

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

It appears that Solr has this:
http://wiki.apache.org/solr/LotsOfCores

but it would be nice to give ES a shot as well.


The main overhead comes from having many Lucene indices (a shard in ES, or a core in Solr).

What you mentioned is pretty much the current state of ES: you can close and open indices, but you need to manage that yourself.
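
As a rough sketch of what managing it yourself involves, the snippet below builds the HTTP requests for the open and close index endpoints (`POST /{index}/_open` and `POST /{index}/_close`). The helper name `index_state_request`, the base URL, and the index name are assumptions for illustration; the request is only constructed here, not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def index_state_request(base_url: str, index: str, action: str) -> Request:
    """Build (but do not send) a POST request that opens or closes an index."""
    if action not in ("_open", "_close"):
        raise ValueError("action must be '_open' or '_close'")
    url = "{}/{}/{}".format(base_url.rstrip("/"), quote(index, safe=""), action)
    # Empty body; both endpoints are plain POSTs on the index name.
    return Request(url, data=b"", method="POST")


if __name__ == "__main__":
    # Assumed node address and index name, for illustration only.
    req = index_state_request("http://localhost:9200", "user_42", "_close")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```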

You can store the index somewhere else, but you will need to copy it over to the cluster yourself and call open (and pay the price of copying it over). The snapshot/restore feature in 1.0 will make this easier.

Obviously, the more machines you have, the more indices/shards you can have. In most cases, you can overload a single index to handle the "many indices" case; check my video at buzz on the video page, where I talk about data design patterns and especially the users design pattern.
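
One common way to put many users into a single index is a per-user alias with a filter and a routing value, so each user still looks like a separate index to the client. The sketch below only builds the request body for the `_aliases` endpoint; the index name `users`, the `user_` alias prefix, and the `user_id` field are assumptions for illustration, and this is not necessarily the exact setup described in the video.

```python
import json
from typing import Any, Dict


def user_alias_action(shared_index: str, user_id: str,
                      field: str = "user_id") -> Dict[str, Any]:
    """Body for POST /_aliases adding a filtered, routed alias for one user."""
    return {
        "actions": [
            {
                "add": {
                    "index": shared_index,
                    "alias": "user_" + user_id,          # hypothetical naming scheme
                    "filter": {"term": {field: user_id}},  # only this user's docs
                    "routing": user_id,                  # keep the user on one shard
                }
            }
        ]
    }


if __name__ == "__main__":
    body = user_alias_action("users", "42")
    print(json.dumps(body, indent=2))
```

Searches and writes then go to `user_42` as if it were its own index, while the cluster only carries the shards of `users`.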

On Fri, Jun 7, 2013 at 1:58 AM, Aleksey . bittercold@gmail.com wrote:

It appears that Solr has this:
LotsOfCores - Solr - Apache Software Foundation
but it would be nice to give ES a shot as well.
