By moving out I mean out of the ES cluster, so that there is no longer
any overhead in ES to maintain that index. The data can continue to reside
where it is (or can be moved if that's easier), but from ES's perspective it
would be as if the index were deleted. The objective is to eliminate the
overhead associated with having many indices in ES, assuming that you do not
need every index to be actively available in ES.
To give an example, we have lots of log data, indexed per month per
customer. We do not need the data from previous months to be actively
searchable all the time, so we could "close" the indices for the previous
months. If we do need access to some data, we could then explicitly add the
index for that month back by re-opening it, closing it again when it's no
longer needed, and so on.
Of course, this only makes sense if ES will not have to allocate any
resources to an index once it is closed. Hope this makes a little more sense.
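The close/re-open workflow described above maps onto the close and open index
endpoints (POST {index}/_close and POST {index}/_open) that Elasticsearch
exposes. A minimal sketch of the bookkeeping side, assuming a hypothetical
cluster at localhost:9200 and a per-month-per-customer naming scheme (both the
host and the "logs-{customer}-{YYYY.MM}" convention are assumptions, not
anything from this thread):

```python
from datetime import date

HOST = "http://localhost:9200"  # hypothetical cluster address

def monthly_index(customer: str, month: date) -> str:
    """Name for a per-month, per-customer log index (assumed convention)."""
    return f"logs-{customer}-{month:%Y.%m}"

def close_request(index: str) -> tuple:
    """HTTP verb and URL to close an index (POST {index}/_close)."""
    return ("POST", f"{HOST}/{index}/_close")

def open_request(index: str) -> tuple:
    """HTTP verb and URL to re-open a closed index (POST {index}/_open)."""
    return ("POST", f"{HOST}/{index}/_open")

# September is no longer hot: close its index; re-open it on demand later.
idx = monthly_index("acme", date(2010, 9, 1))
print(close_request(idx))  # POST .../logs-acme-2010.09/_close
print(open_request(idx))   # POST .../logs-acme-2010.09/_open
```

A closed index keeps its data on disk but releases its in-memory structures,
which is exactly the trade-off being asked for here.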
mberkay on yahoo, google and skype
On Wed, Oct 13, 2010 at 4:01 PM, Shay Banon firstname.lastname@example.org:
When you say move data out of elasticsearch, what do you mean? Move it out
where? There can be a "close" index option, where the data will still be
managed by elasticsearch (depending on the gateway implementation, it will
either reside locally on each node or in shared storage), and then an
open index option that will cause that index to start...
On Wed, Oct 13, 2010 at 9:13 PM, Berkay Mollamustafaoglu <
I understand that there is overhead associated with having many indices,
and that opening/closing indices on the fly as users request may be complicated.
Would it be feasible (less complex) to have the capability to move indices
in and out of ES explicitly and leave the control to the application itself? It
would be great to move data out of ES as an index and add it back when/if
necessary. It would save us from exporting to another format and importing it back.
mberkay on yahoo, google and skype
On Wed, Oct 13, 2010 at 2:47 PM, Shay Banon <email@example.com
The benefits you mentioned are valid and can be achieved by using
multiple indices; the problem is that they do come with an overhead. There
is no LRU for opened indices. It could be implemented, but it's a bit complex
(it's cluster-wide management of opened/closed indices, and opening an index
once a user requests it), so I'm not sure it's a viable path.
I would say go with a single index, or an index per segmented number of users.
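The LRU Shay describes would have to track which indices are open and close the
least-recently-used one when a cap is reached. A single-node sketch of just that
bookkeeping, with `open_fn`/`close_fn` callbacks standing in for the real
cluster operations (the class and callback names are hypothetical, not an ES
API):

```python
from collections import OrderedDict

class OpenIndexLRU:
    """Keep at most `capacity` indices open; evict the least recently used."""

    def __init__(self, capacity, open_fn, close_fn):
        self.capacity = capacity
        self.open_fn = open_fn      # called when an index must be (re)opened
        self.close_fn = close_fn    # called when an index is evicted
        self._open = OrderedDict()  # index name -> None, in LRU order

    def touch(self, index):
        """Mark `index` as used, opening it and evicting another if needed."""
        if index in self._open:
            self._open.move_to_end(index)  # now the most recently used
            return
        if len(self._open) >= self.capacity:
            victim, _ = self._open.popitem(last=False)  # least recently used
            self.close_fn(victim)
        self.open_fn(index)
        self._open[index] = None

closed = []
lru = OpenIndexLRU(2, open_fn=lambda i: None, close_fn=closed.append)
for name in ["logs-2010.08", "logs-2010.09", "logs-2010.08", "logs-2010.10"]:
    lru.touch(name)
print(closed)  # logs-2010.09 is evicted: least recently used of the two open
```

The hard part Shay points at is not this local bookkeeping but making it
cluster-wide and race-free, which is why he doubts it is a viable path.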
On Wed, Oct 13, 2010 at 8:28 PM, Matt Hartzler firstname.lastname@example.org:
Thanks for the link to that discussion. Here is why I prefer an index
per user over one index with a user_id discriminator:
- there are many more users than active users, so there is no need to keep all
that inactive data in RAM
- smaller indexes are much easier to keep indexing/optimization fast
(the shard count would have to change on the fly all the time as you grow users,
which seems expensive, and I'm not sure a discriminator field would narrow
searches down to only the correct shard)
- users come and go and change quite frequently, so I would love to be
able to easily remove/reindex a user by simply removing an index
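The last point boils down to one DELETE of a whole index versus a
delete-by-query that must touch every matching document in a shared index. A
sketch of the requests each approach would issue (the host and the
"user-{id}"/"logs" index names are hypothetical):

```python
HOST = "http://localhost:9200"  # hypothetical cluster address

def user_index(user_id: str) -> str:
    """One index per user (assumed naming convention)."""
    return f"user-{user_id}"

def drop_user_per_index(user_id: str) -> tuple:
    """Index per user: removing the user is a single DELETE of the index."""
    return ("DELETE", f"{HOST}/{user_index(user_id)}")

def drop_user_shared_index(user_id: str) -> tuple:
    """Shared index: every matching document is found and deleted by query."""
    query = {"term": {"user_id": user_id}}
    return ("DELETE", f"{HOST}/logs/_query", query)

print(drop_user_per_index("42")[1])     # one cheap metadata operation
print(drop_user_shared_index("42")[1])  # scans and deletes per document
```

Re-indexing a single user works the same way: drop that user's index and
rebuild it, without disturbing any other user's data.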
On Wed, Oct 13, 2010 at 1:16 PM, Clinton Gormley <
I'm interested in having an index per user, which would result in many
indexes with the same configuration (mapping, shards/replicas, etc). From a
brief look at the code and some tests, it looks like all indexes are kept
open. Is this the case? What is the recommended way of dealing with a
large number of indexes?
Have a look at this thread: