24/7 Index Service?

Hi,

What is the best clustering / indexing strategy for running a large
batch update / re-index on a nightly basis so that users' search
performance is not affected? The update would be approximately 4
million documents.

thanks.

If you end up updating most of the docs, you can index the data into a different index. If not, updating in place will create more load on the system, so just make sure you have enough servers to handle the load for both indexing and search.
(In reply to dalesrob, Monday, January 17, 2011 at 11:31 AM.)

Could one tune this scenario, e.g. by giving up some realtime behaviour?

With Solr, for example, I can index into indexA and let it replicate
from there into indexB. The queries then go against indexB (of course
realtime is then really bad ...)

Regards,
Peter.

(In reply to Shay Banon shay.ba...@elasticsearch.com, 17 Jan., 18:10.)

There has been some talk in the issues about allowing shard replicas that only sync on index flushes (i.e. they don't perform the index operation themselves, and instead sync the actual index files, which are updated on a flush in ES). Before anyone does any optimizations, I think a good place to start is to see if you really need them at all. ES is used in some big systems, and it holds its own pretty well.
(In reply to Karussell, Monday, January 17, 2011 at 8:14 PM.)

Hi Shay,

Thanks for your quick answer, here are a few questions I have around
this:

  1. If I create the nightly index under a new index name, how do I then
    make it searchable under the old name? Do I just create an alias at
    the end of the indexing process that points the old name at the new
    index?

  2. At some point wouldn't I also have to delete or decommission the
    old index? How much load would that put on the server? Should I just
    do a searchAll query and delete that way?

  3. Is it possible to create a server just for indexing and at some
    point tell it to merge into the rest of the cluster? Can you give me
    some guidance on how I would do that? Clustering is definitely just
    an out-of-the-box thing for me at the moment.

David.

We don't rebuild our indexes nightly (not sure why you need to do
that), but do have a mechanism in place for occasional index
rebuilds.

ES supports the notion of a physical index and an alias. We never
reference the physical index in our queries, just the alias.

We search against index01 (http://localhost:9200/index01/_search...),
which actually points at:

    indices: {
      index01_20110105224045: {
        aliases: [
          "index01"
        ]

The physical index name is the index name + date/time stamp. When
we're rebuilding content we have:

    indices: {
      index01_20110115224045: {
        aliases: [
          "rebuild_index01"
        ]
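
As a rough illustration of the naming scheme described here, the following Python sketch builds a physical index name from an alias plus a date/time stamp. The function names are hypothetical, and the YYYYMMDDHHMMSS layout is only inferred from the example names in this thread.

```python
from datetime import datetime


def physical_index_name(alias: str, when: datetime) -> str:
    """Build '<alias>_<YYYYMMDDHHMMSS>' (format inferred from the thread's examples)."""
    return f"{alias}_{when.strftime('%Y%m%d%H%M%S')}"


def rebuild_alias_name(alias: str) -> str:
    """Temporary alias used while the new index is still being built."""
    return f"rebuild_{alias}"


# Example: a rebuild started on 2011-01-15 at 22:40:45.
new_index = physical_index_name("index01", datetime(2011, 1, 15, 22, 40, 45))
print(new_index)
print(rebuild_alias_name("index01"))
```

Because the search alias never changes, queries keep using index01 no matter which timestamped index is currently behind it.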

When the rebuilt index is complete, we switch the aliases and then
drop the old physical index. It's more efficient to drop the index
than to delete by query.
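
A minimal Python sketch of that switch-then-drop step might look like the following. It assumes the Elasticsearch aliases endpoint (POST /_aliases with a list of add/remove actions) and the delete-index call (DELETE /<index>); the thread itself does not name them, so check both against the version you run. The helper names and the base URL are hypothetical.

```python
import json
import urllib.request


def swap_alias_actions(alias, old_index, new_index, rebuild_alias=None):
    """Build the request body that moves `alias` from old_index to new_index."""
    actions = []
    if rebuild_alias is not None:
        # Drop the temporary alias that was used during the rebuild.
        actions.append({"remove": {"index": new_index, "alias": rebuild_alias}})
    actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


def send(base_url, method, path, body=None):
    """Send one JSON request to the cluster and return the raw response text."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def switch_and_drop(base_url, alias, old_index, new_index, rebuild_alias=None):
    """Repoint the alias first, then drop the old physical index."""
    body = swap_alias_actions(alias, old_index, new_index, rebuild_alias)
    send(base_url, "POST", "/_aliases", body)
    send(base_url, "DELETE", "/" + old_index)


# Only builds and prints the payload; nothing is sent to a cluster here.
payload = swap_alias_actions(
    "index01",
    "index01_20110105224045",
    "index01_20110115224045",
    rebuild_alias="rebuild_index01",
)
print(json.dumps(payload, indent=2))
```

Putting the remove and add actions in one request is intended to avoid a moment where index01 points at nothing. Calling switch_and_drop("http://localhost:9200", ...) would perform the real switch, so try it on a test cluster first.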

David

(In reply to dalesrob davirobe...@gmail.com, Jan 18, 3:27 am.)

dbenson nailed most of the answers. Again, I really recommend testing whether the load in your case is actually high enough to require a complete reindex into a fresh index.
(In reply to dbenson, Tuesday, January 18, 2011 at 6:40 PM.)