(Sorry if this is a double post; I had some trouble with Nabble/Groups and my Google Apps for Domains email address.)
I'm interested in having an index per user, which would result in many indexes with the same configuration (mapping, shards/replicas, etc.). From a brief look at the code and some tests, it appears that all indexes are kept open. Is this the case? What is the recommended way of dealing with a large number of indexes?
Ideally, I'm thinking, there would be an LRU cache of IndexServices that would open/close indexes (searchers/writers) and invalidate caches. If there is not currently a good way to deal with this, any tips/warnings on implementing something like this?
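Something along these lines is what I'm picturing. This is only a rough sketch of the idea, not anything that exists in ES today: Object stands in for IndexService, and openIndex/closeIndex stand in for whatever would actually allocate and release searchers/writers.

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch only -- no such cache exists in elasticsearch.
// Object stands in for IndexService; openIndex/closeIndex stand in for
// whatever would actually allocate and release searchers and writers.
public class IndexLruCache extends LinkedHashMap<String, Object> {

    private final int maxOpenIndexes;

    public IndexLruCache(int maxOpenIndexes) {
        super(16, 0.75f, true); // access-order: get() refreshes recency
        this.maxOpenIndexes = maxOpenIndexes;
    }

    // LinkedHashMap calls this after each put; returning true evicts the
    // least-recently-used entry, which is where the index would get
    // closed and its caches invalidated.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Object> eldest) {
        if (size() > maxOpenIndexes) {
            closeIndex(eldest.getKey());
            return true;
        }
        return false;
    }

    public synchronized Object acquire(String indexName) {
        Object service = get(indexName);   // refreshes recency
        if (service == null) {
            service = openIndex(indexName);
            put(indexName, service);       // may evict the eldest entry
        }
        return service;
    }

    private Object openIndex(String name)  { return new Object(); } // placeholder
    private void closeIndex(String name) { /* release searchers/writers */ }
}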
Thanks for the link to that discussion. Here is why I prefer an index per user over one index with a user_id discriminator (the discriminator alternative is sketched just below this list):
1) There are many more users than active users, so there's no need to keep all that inactive data in RAM.
2) Smaller indexes are much easier to keep indexing/optimization fast (the shard count would have to change on the fly all the time as you grow users, which seems expensive, and I'm not sure a discriminator field would narrow searches down to only the correct shard).
3) Users come and go and change quite frequently, so I would love to be able to easily remove/reindex a user by simply removing an index.
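For contrast, the discriminator approach would mean one big index where every search is constrained by a user_id term, roughly like the following sketch (the "users" index and "user_id" field names are made up for illustration):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of the single-index alternative: every query must carry a
// user_id term. Index and field names here are illustrative only.
public class DiscriminatorSearch {
    public static void main(String[] args) throws Exception {
        String body = "{\"query\":{\"term\":{\"user_id\":\"u42\"}}}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:9200/users/_search").openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}

Note the term only narrows which documents match; as point 2) says, it does not obviously narrow which shards get searched.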
The benefits you mentioned are valid and can be achieved by using multiple indices; the problem is that they do come with an overhead. There is no LRU for opened indexes. It could be implemented, but it's a bit complex (it means cluster-wide management of opened/closed indices, and opening an index once a user requests it), and I'm not sure it's a viable path.
I would say go with a single index, or an index per segmented number of users.
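A minimal sketch of that segmented option, assuming an arbitrary bucket count and naming scheme (neither is an ES convention), would be to hash each user into one of a fixed set of bucket indexes:

// Sketch of "index per segmented number of users": hash each user into
// one of a fixed number of bucket indexes. Bucket count and naming are
// arbitrary illustrative choices.
public class UserBuckets {
    private static final int BUCKETS = 16;

    static String indexFor(String userId) {
        // floorMod avoids a negative bucket when hashCode() is negative
        return "users-" + Math.floorMod(userId.hashCode(), BUCKETS);
    }

    public static void main(String[] args) {
        System.out.println(indexFor("alice")); // e.g. "users-13"
    }
}

The trade-off against point 3) above: removing one user then means a delete-by-query within the bucket rather than dropping a whole index.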
Shay,
I understand that there is overhead associated with having many indices, and that opening/closing indices on the fly as users request may be complicated. Would it be feasible (less complex) to have the capability to move indices in and out of ES explicitly, and leave the control to the application itself? It would be great to move data out of ES as an index and add it back when/if necessary. It would save us from exporting to another format and importing back, etc.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
Curious as to why it would be cluster-wide management? Wouldn't each node lazily check whether it has the given index directory, and then open the index if it was closed? Obviously this would only work for indexes stored in file-system directories, as you would need to find them on disk (in my naive approach).
When you say move data out of elasticsearch, what do you mean? Move it out where? There can be a "close" index option, where the data will still be managed by elasticsearch (depending on the gateway implementation, it will either reside locally on each node or in shared storage), and then an "open" index option that will cause that index to start...
That's because shard allocation to nodes is a "live" thing. Shards get moved around (depending on nodes coming in and out) to try to create a balanced allocation across the nodes. I mentioned to Berkay that there can be close index and open index APIs; a closed index will basically have no shards allocated for it until it is opened.
By moving out I mean out of the ES cluster, so that there is no longer any overhead in ES to maintain that index. The data can continue to reside where it is (or can be moved if that's easier), but from the ES perspective it would be as if the index was deleted. The objective is to eliminate the overhead associated with having many indices in ES, assuming that you do not need every index to be actively available.
To give an example, we have lots of log data, indexed per month per customer (see the naming sketch below). We do not need the data from previous months to be actively searchable all the time, so we could "close" the indices of the previous months. If we do need access to some data, we could explicitly add the index for that month back by re-opening it, and close it again when it's not needed, etc.
Of course, this only makes sense if ES does not have to allocate any resources to an index once it is closed. Hope this makes a little more sense.
Regards,
Berkay Mollamustafaoglu
mberkay on yahoo, google and skype
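To make that concrete, the per-customer, per-month naming could look like the following sketch (the exact pattern is made up; anything that sorts by month works):

import java.time.YearMonth;

// Illustration of per-customer, per-month index naming; the pattern is
// made up. The indices for past months are the ones to "close".
public class MonthlyIndexName {
    static String indexFor(String customer, YearMonth month) {
        return "logs-" + customer + "-" + month; // YearMonth prints as 2010-09
    }

    public static void main(String[] args) {
        System.out.println(indexFor("acme", YearMonth.of(2010, 9))); // logs-acme-2010-09
    }
}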
I can imagine that this could be extended a little bit further. Shay mentioned the close and open API in a previous email. I'm not sure how exactly that was meant, but would it be possible to tell ES to merge all shards into one on close, and split it into a specified number of shards on open? As a side effect this would allow off-line resharding. But maybe that is just a crazy idea...
Regards,
Lukas
I am not sure that merging is actually what you want (merging is a costly operation). But back to open and close. Berkay, what you suggested is what I meant. Closing an index will just maintain the index metadata in elasticsearch, and nothing else (plus the fact that it's closed/blocked). Shards will be deallocated; no Lucene indices (shards) will be active. Then, when opening an index, it will be "recovered" using the usual recovery mechanism that runs on full cluster restart (only for that index). Make sense?
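A minimal sketch of that lifecycle over the REST API, using the made-up monthly index name from above (for what it's worth, elasticsearch did later ship close/open endpoints of this shape: POST /{index}/_close and POST /{index}/_open):

import java.net.HttpURLConnection;
import java.net.URL;

// Sketch of the close/open lifecycle discussed above. The index name
// is made up; the endpoint shape matches the _close/_open API that
// elasticsearch later shipped.
public class CloseOpenIndex {
    static int post(String path) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:9200" + path).openConnection();
        conn.setRequestMethod("POST");
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        // deallocate the shards of an old month; only metadata stays in ES
        System.out.println("close -> HTTP " + post("/logs-acme-2010-09/_close"));
        // later: reallocate and recover it, as on a full cluster restart
        System.out.println("open  -> HTTP " + post("/logs-acme-2010-09/_open"));
    }
}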