Using Many Indexes


(hartzler-2) #1

(Sorry if this is a double post; I had some trouble with Nabble/Groups
and my Google Apps for Domains email address.)

I'm interested in having an index per user, which would result in many
indexes with the same configuration (mapping, shards/replicas, etc.). A
brief look at the code and some tests suggests that all indexes are kept
open. Is this the case? What is the recommended way of dealing with a
large number of indexes?

Ideally, I'm thinking there would be an LRU cache of IndexService instances
that would open/close indexes (searchers/writers) and invalidate caches.
If there is currently no good way to deal with this, any tips or
warnings on implementing something like this?
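The LRU idea above can be sketched in plain Python. This is a minimal, hypothetical model, not Elasticsearch code: `IndexLRU`, `open_fn` and `close_fn` are made-up names standing in for whatever actually opens an index's searchers/writers and closes them again while invalidating caches.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List


class IndexLRU:
    """Keep at most `capacity` indexes open, closing the least recently used one.

    Hypothetical sketch: `open_fn` and `close_fn` stand in for whatever really
    opens an index (searchers/writers) and closes it again (invalidating caches).
    """

    def __init__(self, capacity: int,
                 open_fn: Callable[[Hashable], object],
                 close_fn: Callable[[Hashable, object], None]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.open_fn = open_fn
        self.close_fn = close_fn
        self._open = OrderedDict()  # index name -> handle, least recently used first

    def get(self, name: Hashable) -> object:
        """Return an open handle for `name`, opening it (and evicting) if needed."""
        if name in self._open:
            self._open.move_to_end(name)  # hit: now the most recently used
            return self._open[name]
        while len(self._open) >= self.capacity:
            victim, handle = self._open.popitem(last=False)  # evict the LRU index
            self.close_fn(victim, handle)
        handle = self.open_fn(name)
        self._open[name] = handle
        return handle

    def open_names(self) -> List[Hashable]:
        """Names of currently open indexes, least recently used first."""
        return list(self._open)


if __name__ == "__main__":
    closed = []
    lru = IndexLRU(2, open_fn=lambda n: f"handle:{n}",
                   close_fn=lambda n, h: closed.append(n))
    for user in ["user_1", "user_2", "user_1", "user_3"]:
        lru.get(user)
    print(closed)            # ['user_2']
    print(lru.open_names())  # ['user_1', 'user_3']
```

Note that a per-process cache like this says nothing about which node holds which shard; that cluster-wide part is exactly what Shay raises later in the thread.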

Thanks!


(Clinton Gormley) #2

Have a look at this thread:
http://elasticsearch-users.115913.n3.nabble.com/How-to-create-user-indexes-on-the-fly-td1089007.html

clint


(hartzler-2) #3

Thanks for the link to that discussion. Here is why I prefer an index per
user over one index with a user_id discriminator:

  1. There are many more users than active users, so there is no need to keep
    all that inactive data in RAM.
  2. Smaller indexes make it much easier to keep indexing/optimization fast
    (the shard count would have to change all the time on the fly as users
    grow, which seems expensive, and I'm not sure a discriminator field would
    narrow searches down to only the correct shard).
  3. Users come and go and change quite frequently, so I would love to be able
    to easily remove/reindex a user by simply removing an index.
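To make point 2 concrete: hash-based routing typically picks a shard as `hash(routing_key) % number_of_shards`. The sketch below assumes that scheme and uses `zlib.crc32` as a stand-in hash (Elasticsearch's real hash function is different and has changed between versions). It illustrates two things: routing on `user_id` sends all of a user's documents to one shard, and changing the shard count remaps most keys, which is why the shard count cannot cheaply change on the fly.

```python
import zlib
from typing import Iterable


def shard_for(routing_key: str, num_shards: int) -> int:
    """Pick a shard as hash(key) % num_shards (crc32 is an assumed stand-in hash)."""
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def fraction_moved(keys: Iterable[str], old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    keys = list(keys)
    moved = sum(shard_for(k, old_shards) != shard_for(k, new_shards) for k in keys)
    return moved / len(keys)


if __name__ == "__main__":
    # Routing by user_id: every document of user_42 lands on the same shard.
    docs = [("user_42", f"doc_{i}") for i in range(5)]
    print({shard_for(user, 5) for user, _ in docs})  # a set with one shard number

    # Routing by document id instead may spread one user's documents over shards.
    print({shard_for(doc_id, 5) for _, doc_id in docs})

    # Going from 5 to 6 shards relocates most keys.
    users = [f"user_{i}" for i in range(10_000)]
    print(f"{fraction_moved(users, 5, 6):.0%} of users change shard")
```

With a roughly uniform hash, a key keeps its shard when going from 5 to 6 shards only about one time in six, so the bulk of the data would have to move.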

(In reply to Clinton Gormley, Wed, Oct 13, 2010 at 1:16 PM)


(Shay Banon) #4

The benefits you mentioned are valid and can be achieved by using multiple
indices; the problem is that they do come with an overhead. There is no LRU
for opened indexes. It can be implemented, but it is a bit complex (it needs
cluster-wide management of opened/closed indices, and opening an index once
a user requests it), so I'm not sure it's a viable path.

I would say go with a single index, or an index per segment (group) of
users.
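The "index per segment of users" suggestion can be read as hashing each user into one of a fixed number of bucket indexes. A minimal sketch under stated assumptions: the `users_<n>` naming scheme, the bucket count, the `content` field and the crc32 stand-in hash are all hypothetical, and `user_id` is used both as the routing value and as a filter so a search in a shared bucket only sees one user's documents. The search body uses the `bool`/`filter`/`term` query DSL of current Elasticsearch versions, which did not exist in this form in 2010.

```python
import zlib
from typing import Any, Dict

NUM_BUCKETS = 16  # assumption: fixed up front, since changing it remaps users


def bucket_index(user_id: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Name of the (hypothetical) bucket index that holds this user's documents."""
    bucket = zlib.crc32(user_id.encode("utf-8")) % num_buckets
    return f"users_{bucket}"


def user_search(user_id: str, text: str) -> Dict[str, Any]:
    """Index, routing value and query body for searching one user's documents."""
    return {
        "index": bucket_index(user_id),
        "routing": user_id,  # keeps the user's documents on a single shard
        "body": {
            "query": {
                "bool": {
                    "must": {"match": {"content": text}},  # hypothetical field
                    # The bucket is shared, so restrict hits to this user.
                    "filter": {"term": {"user_id": user_id}},
                }
            }
        },
    }
```

The trade-off against point 3 of Matt's list is that removing a single user now means deleting that user's documents from a shared index rather than dropping a whole index.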

-shay.banon

(In reply to Matt Hartzler, Wed, Oct 13, 2010 at 8:28 PM)


(Berkay Mollamustafaoglu-2) #5

Shay,

I understand that there is overhead associated with having many indices, and
that opening/closing indices on the fly as users request them may be
complicated. Would it be feasible (and less complex) to have the capability
to move indices in and out of ES explicitly and leave the control to the
application itself? It would be great to move data out of ES as an index and
add it back when/if necessary. It would save us from exporting to another
format and importing back, etc.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

(In reply to Shay Banon, Wed, Oct 13, 2010 at 2:47 PM)


(hartzler-2) #6

Curious as to why it would need cluster-wide management? Wouldn't each node
lazily check whether it has the given index directory, and then open it if it
was closed? Obviously this would only work for indexes stored in a file
system directory, as you would need to find them on disk (in my naive approach).

(In reply to Shay Banon, Wed, Oct 13, 2010 at 1:47 PM)


(Shay Banon) #7

When you say move data out of elasticsearch, what do you mean? Move it out
where? There can be a "close" index option, where the data will still be
managed by elasticsearch (depending on the gateway implementation, it will
reside either locally on each node or in shared storage), and then an
"open" index option that will cause that index to start again...
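The close/open option described here did later become part of Elasticsearch: current versions expose it as `POST /<index>/_close` and `POST /<index>/_open`. The sketch below only builds those requests with the standard library's `urllib.request` and sends nothing; the base URL `http://localhost:9200` and the index name are assumptions.

```python
from urllib.parse import quote
from urllib.request import Request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def index_state_request(index: str, action: str, base_url: str = ES_URL) -> Request:
    """Build (but do not send) a close or open request for one index."""
    if action not in ("close", "open"):
        raise ValueError("action must be 'close' or 'open'")
    return Request(f"{base_url}/{quote(index, safe='')}/_{action}", method="POST")


if __name__ == "__main__":
    for action in ("close", "open"):
        req = index_state_request("logs-acme-2010.09", action)
        print(req.get_method(), req.full_url)
        # To actually send it against a running node: urllib.request.urlopen(req)
```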

(In reply to Berkay Mollamustafaoglu, Wed, Oct 13, 2010 at 9:13 PM)


(Shay Banon) #8

That's because the allocation of shards to nodes is a "live" thing. Shards get
moved around (depending on nodes coming in and out) to try to keep a balanced
allocation of shards across nodes. I mentioned to Berkay that there can be a
close index and open index API; a closed index will basically have no shards
allocated for it until it is opened.
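A toy model of why allocation is "live" and what closing changes. This is not Elasticsearch's allocator (the real one also weighs replicas and other constraints); the hypothetical `allocate` function simply spreads the shards of open indexes round-robin over whatever nodes are currently present and allocates nothing for closed indexes.

```python
from typing import Dict, List, Set, Tuple

Shard = Tuple[str, int]  # (index name, shard number)


def allocate(num_shards: Dict[str, int], closed: Set[str],
             nodes: List[str]) -> Dict[Shard, str]:
    """Assign every shard of every open index to a node, round-robin.

    Toy model: closed indexes keep their entry in `num_shards` (their metadata)
    but get no shards allocated at all.
    """
    if not nodes:
        return {}
    table: Dict[Shard, str] = {}
    i = 0
    for index in sorted(num_shards):
        if index in closed:
            continue
        for shard in range(num_shards[index]):
            table[(index, shard)] = nodes[i % len(nodes)]
            i += 1
    return table


if __name__ == "__main__":
    meta = {"user_a": 2, "user_b": 2, "user_c": 2}
    before = allocate(meta, closed={"user_c"}, nodes=["n1", "n2", "n3"])
    after = allocate(meta, closed={"user_c"}, nodes=["n1", "n2"])  # n3 left
    print([s for s in before if before[s] != after[s]])
    # [('user_b', 0), ('user_b', 1)]
```

Recomputing the table after a node leaves moves shards around, while the closed index never appears in it. The toy even moves `('user_b', 1)` off a node that is still present; a real allocator tries to minimise such moves.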

(In reply to Matt Hartzler, Wed, Oct 13, 2010 at 9:32 PM)


(Berkay Mollamustafaoglu-2) #9

By moving out I mean out of the ES cluster, so that ES no longer has any
overhead in maintaining that index. The data can continue to reside where it
is (or can be moved if that's easier), but from the ES perspective it would
be as if the index had been deleted. The objective is to eliminate the
overhead associated with having many indices in ES, assuming that you do not
need every index to be actively available.
To give an example, we have lots of log data, indexed per month per
customer. We do not need the data from previous months to be actively
searchable all the time, so we could "close" the indices of previous
months. If we do need access to some data, we could then explicitly add the
index for that month back by re-opening it, and close it again when
it's no longer needed.
Of course, this only makes sense if ES does not have to allocate any
resources to an index once it is closed. Hope this makes a little more sense.
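The monthly-log scenario above could be automated along these lines. The `logs-<customer>-<YYYY.MM>` naming scheme and both helper names are hypothetical; the sketch only decides which indexes are older than the current month and are therefore candidates for closing.

```python
import re
from datetime import date
from typing import Iterable, List

# Hypothetical naming scheme: logs-<customer>-<YYYY.MM>
INDEX_RE = re.compile(r"^logs-(?P<customer>.+)-(?P<year>\d{4})\.(?P<month>\d{2})$")


def monthly_index(customer: str, day: date) -> str:
    """Index that log entries for `customer` on `day` are written to."""
    return f"logs-{customer}-{day.year:04d}.{day.month:02d}"


def indexes_to_close(indexes: Iterable[str], today: date) -> List[str]:
    """Monthly log indexes from before the current month; other names are ignored."""
    current = (today.year, today.month)
    stale = []
    for name in indexes:
        m = INDEX_RE.match(name)
        if m and (int(m["year"]), int(m["month"])) < current:
            stale.append(name)
    return sorted(stale)


if __name__ == "__main__":
    names = ["logs-acme-2010.10", "logs-acme-2010.09", "logs-beta-2009.12"]
    print(indexes_to_close(names, date(2010, 10, 13)))
    # ['logs-acme-2010.09', 'logs-beta-2009.12']
```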

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

(In reply to Shay Banon, Wed, Oct 13, 2010 at 4:01 PM)


(Lukáš Vlček) #10

I can imagine that this could be extended a little further. Shay
mentioned the close and open API in a previous email. I'm not sure exactly how
that was meant, but would it be possible to tell ES to merge all shards into
one on close and split the result into a specified number of shards on open?
As a side effect, this would allow for offline resharding. But maybe that is
just a crazy idea...
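The merge-then-split idea can be sketched under the same assumption of plain `hash % number_of_shards` routing, with crc32 again standing in for the real hash: gather every document from the old shards, then redistribute by recomputing each document's shard for the new count. `reshard` is a hypothetical in-memory helper; it does not capture the cost of merging real Lucene indices, which Shay points out in his next reply.

```python
import zlib
from typing import List


def reshard(old_shards: List[List[str]], new_count: int) -> List[List[str]]:
    """Merge all old shards into one pool, then split it into `new_count` shards.

    Each document id is placed by crc32(id) % new_count (an assumed routing hash).
    """
    if new_count < 1:
        raise ValueError("new_count must be at least 1")
    merged = [doc for shard in old_shards for doc in shard]  # "merge on close"
    new_shards: List[List[str]] = [[] for _ in range(new_count)]
    for doc_id in merged:  # "split on open"
        new_shards[zlib.crc32(doc_id.encode("utf-8")) % new_count].append(doc_id)
    return new_shards


if __name__ == "__main__":
    old = [["a", "b"], ["c"], ["d", "e", "f"]]  # 3 shards before
    print(reshard(old, 2))                      # the same 6 ids over 2 shards
```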

Regards,
Lukas

(In reply to Berkay Mollamustafaoglu, Wed, Oct 13, 2010 at 10:39 PM)


(Shay Banon) #11

I am not sure that merging is actually what you want (merging is a costly
operation). But back to open and close. Berkay, what you suggested is what I
meant. Closing an index will just maintain the index metadata in
elasticsearch (along with the fact that it's closed/blocked), and nothing
else. Shards will be deallocated, and no Lucene indices (shards) will be
active. Then, when the index is opened, it will be "recovered" using the same
recovery mechanism used on a full cluster restart (only for that
index). Make sense?
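The close/open lifecycle just described can be modelled as a small state machine. In this toy (all names hypothetical, not Elasticsearch classes) a closed index keeps its metadata, loses its shards and rejects searches; opening it runs a stand-in `recover` callable that rebuilds the shards from wherever the data is persisted, only for that index.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class IndexClosedError(Exception):
    """Raised when an operation hits a closed (blocked) index."""


@dataclass
class IndexEntry:
    name: str
    num_shards: int  # metadata survives while the index is closed
    closed: bool = False
    shards: List[str] = field(default_factory=list)  # stand-ins for live Lucene indices


class ToyCluster:
    def __init__(self, recover: Callable[[str, int], List[str]]) -> None:
        # `recover` stands in for reading shard data back from the gateway or local disk.
        self.recover = recover
        self.indexes: Dict[str, IndexEntry] = {}

    def create(self, name: str, num_shards: int) -> None:
        entry = IndexEntry(name, num_shards)
        entry.shards = self.recover(name, num_shards)  # new index starts its shards the same way
        self.indexes[name] = entry

    def close(self, name: str) -> None:
        entry = self.indexes[name]
        entry.closed = True
        entry.shards = []  # shards deallocated; only metadata remains

    def open(self, name: str) -> None:
        entry = self.indexes[name]
        if entry.closed:
            entry.shards = self.recover(name, entry.num_shards)  # per-index recovery
            entry.closed = False

    def search(self, name: str) -> List[str]:
        entry = self.indexes[name]
        if entry.closed:
            raise IndexClosedError(f"index [{name}] is closed")
        return entry.shards
```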

(In reply to Lukáš Vlček, Wed, Oct 13, 2010 at 10:53 PM)


(Berkay Mollamustafaoglu-2) #12

Yes it does! This would indeed solve the use case we have.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

(In reply to Shay Banon, Wed, Oct 13, 2010 at 5:09 PM)


(Shay Banon) #13

Cool, could you open a feature request for it? It probably won't make it
into 0.12, but I will aim for 0.13...

On Wed, Oct 13, 2010 at 11:16 PM, Berkay Mollamustafaoglu <mberkay@gmail.com

wrote:

Yes it does! This would indeed solve the use case we have.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Wed, Oct 13, 2010 at 5:09 PM, Shay Banon shay.banon@elasticsearch.com wrote:

I am not sure that merging is actually what you want (merging is a costly
operation). But back to open and close. Berkay, what you suggested is what I
meant. Closing an index will keep only the index metadata in
elasticsearch (along with the fact that it's closed / blocked), and nothing else.
Shards will be deallocated, so no Lucene indices (shards) will remain
active. Then, when opening an index, it will be "recovered" using the
usual recovery mechanism that runs on a full cluster restart (only for that
index). Makes sense?
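To make the close/open semantics Shay describes concrete, here is a toy model. It is not Elasticsearch code, and `IndexEntry` and its fields are hypothetical: the metadata and a closed flag survive a close, shard allocations do not, and reopening stands in for recovery by reallocating each primary shard.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class IndexEntry:
    """Toy model of one index (not Elasticsearch internals).

    The name, settings and closed flag play the role of the index metadata
    that survives a close; `allocated_shards` holds (shard, node) pairs that
    exist only while the index is open.
    """
    name: str
    settings: Dict[str, int]
    closed: bool = True
    allocated_shards: List[Tuple[int, str]] = field(default_factory=list)

    def close(self) -> None:
        # Keep the metadata and mark the index closed/blocked; drop every shard.
        self.closed = True
        self.allocated_shards = []

    def open(self, nodes: List[str]) -> None:
        # Stand-in for recovery: the real thing reads shard data back from
        # wherever the gateway keeps it; here we only assign each primary
        # shard to a node round-robin.
        shards = self.settings["number_of_shards"]
        self.allocated_shards = [(s, nodes[s % len(nodes)]) for s in range(shards)]
        self.closed = False

    def search(self, query: str) -> str:
        if self.closed:
            raise RuntimeError(f"index [{self.name}] is closed")
        return f"searching {len(self.allocated_shards)} shards for {query!r}"


if __name__ == "__main__":
    idx = IndexEntry("logs-acme-2010-09", {"number_of_shards": 2})
    idx.open(["node1", "node2"])
    print(idx.search("error"))  # searching 2 shards for 'error'
    idx.close()
    print(idx.closed, idx.allocated_shards, idx.settings)
```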

On Wed, Oct 13, 2010 at 10:53 PM, Lukáš Vlček lukas.vlcek@gmail.com wrote:

I can imagine that this could be extended a little bit further. Shay
mentioned the close and open API in a previous email. Not sure exactly how
that was meant, but would it be possible to tell ES to merge all shards into
one on close and split them into a specified number of shards on open? As a
side effect this would allow for off-line resharding. But maybe that is
just a crazy idea...
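Part of why merging and re-splitting shards is costly: a document's shard depends on the shard count. Elasticsearch documents its routing roughly as `hash(_routing) % number_of_primary_shards`, with `_routing` defaulting to the document id. The sketch below uses `zlib.crc32` as a stand-in hash (an assumption; it is not the hash Elasticsearch uses) to count how many documents would land on a different shard if the shard count changed.

```python
import zlib
from typing import Iterable


def shard_for(routing: str, num_shards: int) -> int:
    # Route by hash(routing) % number of primary shards; crc32 stands in for
    # Elasticsearch's actual hash function.
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def moved_fraction(doc_ids: Iterable[str], old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    ids = list(doc_ids)
    if not ids:
        return 0.0
    moved = sum(1 for d in ids if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / len(ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(10_000)]
    print(f"{moved_fraction(ids, 3, 5):.0%} of documents change shard going from 3 to 5 shards")
    print(f"{moved_fraction(ids, 3, 1):.0%} change shard when merging 3 shards into 1")
```

With a uniform hash, going from 3 to 5 shards keeps a document in place only when its hash modulo 15 is 0, 1 or 2, so roughly 80% of documents would have to move.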

Regards,
Lukas

On Wed, Oct 13, 2010 at 10:39 PM, Berkay Mollamustafaoglu <
mberkay@gmail.com> wrote:

By moving out I mean out of the ES cluster, so that ES no longer has any
overhead maintaining that index. The data can continue to reside
where it is (or can be moved if that's easier), but from the ES perspective, it
would be as if the index was deleted. The objective is to eliminate the
overhead associated with having many indices in ES, assuming that you do not
need every index to be actively available in ES.
To give an example, we have lots of log data, indexed per month per
customer. We do not need the data from previous months to be actively
searchable all the time, so we could "close" the indices of the previous
months. If we do need access to some data, we could then explicitly add the
index for that month back by re-opening it, and close it again when
it's no longer needed, etc.
Of course, this only makes sense if ES does not have to allocate any
resources to an index once it is closed. Hope this makes a little more sense.
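Berkay's layout (one index per customer per month, with older months closed) might be driven by something like the sketch below. The `logs-<customer>-<YYYY-MM>` naming scheme is a hypothetical convention, not something from the thread; the functions only compute index names, and sending the actual close requests is left to the application.

```python
from datetime import date
from typing import Iterable, List


def monthly_index(customer: str, day: date) -> str:
    # Hypothetical naming scheme: one index per customer per month.
    return f"logs-{customer}-{day:%Y-%m}"


def previous_month_indices(customers: Iterable[str], today: date, months_back: int) -> List[str]:
    """Index names for the `months_back` months before `today`'s month,
    newest first: the candidates an application might close."""
    customers = list(customers)
    names = []
    year, month = today.year, today.month
    for _ in range(months_back):
        month -= 1
        if month == 0:  # step back across a year boundary
            year, month = year - 1, 12
        names.extend(monthly_index(c, date(year, month, 1)) for c in customers)
    return names


if __name__ == "__main__":
    print(monthly_index("acme", date(2010, 10, 13)))  # logs-acme-2010-10
    print(previous_month_indices(["acme", "globex"], date(2010, 1, 15), 2))
```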

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype

On Wed, Oct 13, 2010 at 4:01 PM, Shay Banon <
shay.banon@elasticsearch.com> wrote:

When you say move data out of elasticsearch, what do you mean? Move it
out where? There can be a "close" index option, where the data will still be
managed by elasticsearch (depending on the gateway implementation, it will
either reside locally on each node or in shared storage), and then an
"open" index option that will cause that index to start again...
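In later Elasticsearch releases this "close" / "open" option is exposed over REST as `POST /{index}/_close` and `POST /{index}/_open`. A minimal sketch using only the standard library, assuming a node at `http://localhost:9200` and a hypothetical index name. It builds the request without sending it, and the exact behavior of closed indices varies by version.

```python
import urllib.parse
import urllib.request


def index_state_request(base_url: str, index: str, action: str) -> urllib.request.Request:
    """Build, but do not send, a POST to /{index}/_close or /{index}/_open."""
    if action not in ("open", "close"):
        raise ValueError("action must be 'open' or 'close'")
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(index, safe='')}/_{action}"
    return urllib.request.Request(url, method="POST")


if __name__ == "__main__":
    req = index_state_request("http://localhost:9200", "logs-acme-2010-09", "close")
    print(req.get_method(), req.full_url)  # POST http://localhost:9200/logs-acme-2010-09/_close
    # Against a running node, urllib.request.urlopen(req) would send it.
```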

On Wed, Oct 13, 2010 at 9:13 PM, Berkay Mollamustafaoglu <
mberkay@gmail.com> wrote:

Shay,

I understand that there is overhead associated with having many
indices, and opening/closing indices on the fly as users request them may be
complicated.
Would it be feasible (and less complex) to have the capability to move
indices in and out of ES explicitly and leave the control to the application
itself? It would be great to move data out of ES as an index and add it back
when/if necessary. It would save us from exporting to another format and
importing back, etc.

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype



(Berkay Mollamustafaoglu-2) #14

Great, thanks!

Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype



(system) #15