Re-indexing documents while routing has been enabled via multiple aliases

Hi,

I'm looking for the best way to handle model changes in my project. My SaaS
is going to serve multiple customers and, besides the simpler scheme where an
index is created per customer, I'm investigating other ways to go.

For example, I could use a single index with an alias for each customer, with
routing enabled (without a field filter), to ensure that the data of a
particular customer resides in the same shard.
I've already read an extremely helpful article, Changing Mapping With Zero
Downtime (http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/),
which recommends creating a new index and copying the data into it while the
old index is still accessible via the alias, or aliases in my case.

But how can I ensure that the data I re-index ends up in the
customer-specific shards again, as it was before, once the re-index is done?
Is it OK to create temporary aliases on the new index with the same routing
values as the original ones, re-index data from each shard, then point the old
aliases to the new index while keeping the routing settings, and finally
delete the temporary aliases and the old index? Does that make sense?
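
The sequence described above could be sketched as two request bodies for the `_aliases` endpoint. This is only an illustration under assumptions: the index names `data_v1` and `data_v2`, the `tmp_` prefix, and both helper names are hypothetical, and the payloads are built but never sent to a cluster.

```python
import json


def temp_alias_actions(new_index, routing_by_alias, prefix="tmp_"):
    """Actions that add temporary routed aliases on the new index."""
    return {
        "actions": [
            {"add": {"index": new_index, "alias": prefix + alias, "routing": routing}}
            for alias, routing in sorted(routing_by_alias.items())
        ]
    }


def swap_alias_actions(old_index, new_index, routing_by_alias, prefix="tmp_"):
    """Actions that move each customer alias to the new index with the
    same routing value and drop the matching temporary alias."""
    actions = []
    for alias, routing in sorted(routing_by_alias.items()):
        actions.append({"remove": {"index": old_index, "alias": alias}})
        actions.append({"add": {"index": new_index, "alias": alias, "routing": routing}})
        actions.append({"remove": {"index": new_index, "alias": prefix + alias}})
    return {"actions": actions}


if __name__ == "__main__":
    # Hypothetical alias names and routing values.
    routing = {"customer_1": "1", "customer_2": "2"}
    print(json.dumps(temp_alias_actions("data_v2", routing), indent=2))
    print(json.dumps(swap_alias_actions("data_v1", "data_v2", routing), indent=2))
```

Each body would be POSTed to `/_aliases`. The actions in one request are applied atomically, so the swap never leaves a customer alias pointing at neither index.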

Thanks,
Peter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Hey, I think what you say makes sense. The only thing I don't really
understand is what you mean by:

"But, how to ensure that data I'm going to re-index will get into the
customer-specific shards again as they was before, after re-index is done?"

Can you elaborate on this a bit? I don't really understand the question.

simon

Hi Simon,

Let me make things clearer.
I have a single index that has multiple shards. I created aliases, say
customer_1, customer_2, customer_3 and so on.
These aliases have corresponding routing values, for example 1, 2, 3... Thus,
I ensure that all data associated with a specific customer resides in
the same shard.
My application uses these aliases to index and search data for each
customer, so routing should work without the need to specify routing as a URL
parameter explicitly.
Now imagine that something has changed in the model: a field
was deleted or its type has changed. I have no option other than to re-index
all the data and rebuild the entire index.
If I follow the approach proposed in Changing Mapping With Zero Downtime
(http://www.elasticsearch.org/blog/changing-mapping-with-zero-downtime/), I
should create a new index and then copy the existing data into it (I'm
going to use the re-index plugin for that). But if I do not create aliases
with routing and do not use them while re-indexing, all the customer data
will be spread across the different shards of the new index, so
I lose customer data locality. The new index is not aware a priori of how
customer data was distributed across the shards of the original
index, and I suppose I have to give it a hint using aliases during the
re-indexing procedure.
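
A toy model of the concern above, assuming the usual rule that a document's shard is a hash of its routing value modulo the number of primary shards, with the document id used when no routing value is given. `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and the helper names and sample data are hypothetical, so the shard numbers are not the ones a real cluster would produce.

```python
import zlib


def shard_for(routing, num_shards):
    """Pick a shard from a routing string.

    zlib.crc32 is a stand-in: Elasticsearch uses its own hash function,
    so real shard numbers will differ. Only the principle is the same.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def placement(docs, num_shards, alias_routing=None):
    """Map doc id -> shard for a list of (doc_id, customer) pairs.

    With alias_routing, the customer's routing value is hashed;
    without it, the document id is hashed instead.
    """
    result = {}
    for doc_id, customer in docs:
        key = alias_routing[customer] if alias_routing else doc_id
        result[doc_id] = shard_for(key, num_shards)
    return result


if __name__ == "__main__":
    docs = [("a1", "customer_1"), ("a2", "customer_1"), ("b1", "customer_2")]
    routing = {"customer_1": "1", "customer_2": "2"}
    print(placement(docs, 5, routing))  # routed: one shard per customer
    print(placement(docs, 5))           # unrouted: ids decide the shard
```

Under this model the new index needs no knowledge of the old layout: writing each customer's documents with the same routing value is enough to keep them together, even if the shard they land on differs from the one in the old index.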

Quite likely I missed something important in the
documentation and articles, so I would be grateful if someone pointed out my
mistake :)

Thanks,
Peter

While creating the new index (say xxx_v2), can't you specify the _routing
value in the type mapping? This value can be the same as the one you had in
the alias config. Have a look at
http://www.elasticsearch.org/guide/reference/mapping/routing-field/
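
For reference, a sketch of what a `_routing` entry in a type mapping could look like, assuming the typed mapping layout used by Elasticsearch releases of that period. The type name `doc`, the shard count, and the helper name are hypothetical. As far as I know, this setting makes a routing value mandatory on each request rather than storing a fixed value, so the value itself would still have to come from the alias or from the request.

```python
import json


def index_body_with_required_routing(type_name, num_shards):
    """Create-index body whose type mapping marks routing as required."""
    return {
        "settings": {"number_of_shards": num_shards},
        "mappings": {type_name: {"_routing": {"required": True}}},
    }


if __name__ == "__main__":
    # "doc" and 5 are placeholder values for illustration only.
    print(json.dumps(index_body_with_required_routing("doc", 5), indent=2))
```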

Thanks for the idea, but for now it is not an option for me, as I do not have
a routing field in my data model. I would like to try routing defined at the
alias level first. I'm going to write some tests.


Best regards,
Peter Melnikov

Software Engineer
Axamit Versa, Inc.
Mobile: +375 29 6747925
E-mail: peter@axamit.com

You can define the same explicit value as you had for the alias. It does not
need to be a field in the doc.
