Best practice for rebuilding an index using aliases

Hey everyone,

I have a question about rebuilding an index. After reading the
Elasticsearch guide and various topics here, I've found that the best
practice for rebuilding an index without any downtime is to use aliases.
However, there are certain steps and processes around that on which I'd
like advice. First I'll take you through an example scenario, and then
I'll ask some questions.

For example, you have "workshop_index_v1", with an alias "workshop". The
"workshop_index_v1" has a type called "guitar" which has three properties
with the following mapping:

"identifier" : "string"
"make" : "string"
"model" : "string"

Let's assume there is a lot of data in workshop_index_v1/guitar at the
moment, which has been populated from a separate database.

Now I need to modify the mapping: because I've changed the source data, I
would like to get rid of the "identifier" property, so my mapping becomes:

"make" : "string"
"model" : "string"

As we all know, Elasticsearch does not allow you to remove a property from
the mapping directly, so you inevitably have to rebuild the index, which is
fine in my case.
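
A minimal sketch of the request body for creating the new index with the reduced mapping. It assumes the type-based mapping syntax and the "string" field type used in this post (Elasticsearch 1.x era); the helper name build_index_body is made up for illustration, and the body would be sent with a PUT to /workshop_index_v2.

```python
import json


def build_index_body(fields):
    """Return a create-index body with one 'guitar' type and string fields."""
    properties = {name: {"type": "string"} for name in fields}
    return {"mappings": {"guitar": {"properties": properties}}}


# The v2 mapping simply leaves out "identifier".
v2_body = build_index_body(["make", "model"])
print(json.dumps(v2_body, indent=2, sort_keys=True))
```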

So a few options came to mind when I thought about how to do this:

  • Create another index "workshop_index_v2", populate it with the data in
    "workshop_index_v1" using scan and scroll with the bulk API, and later
    remove "workshop_index_v1" and add "workshop_index_v2" to the alias.
    • This will not work, because the incorrect mapping (or a field value
      from the incorrect mapping) is already present in "workshop_index_v1",
      and I do not want to copy everything as is.
  • Create another index "workshop_index_v2" and populate it with the data
    from the original source.
    • This works.
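
Whichever way the new index gets populated, the alias switch at the end can be done in one request. A minimal sketch of the body for POST /_aliases, using the index and alias names from the scenario above; the function name build_alias_swap is an assumption. Because both actions travel in the same request, the alias moves atomically and never points at nothing.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


swap = build_alias_swap("workshop", "workshop_index_v1", "workshop_index_v2")
print(json.dumps(swap, indent=2))
```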

One of the big issues here is what happens to write requests while the new
index is being built.

Since you can only write to one index, which one do you write to: the old
one, the new one, or both?

I feel that writing to the new one would work. I am a beginner when it
comes to Elasticsearch, so any advice on any of this would be greatly
appreciated.

Best regards

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c2dbaaeb-b3cb-47db-8311-a6a918837fc6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I switched to using aliases about a year ago and I love it. I am able to
rebuild in the background and make a clean cutover once the process
completes.

Here are a couple of thoughts for your situation.

First, create a second index that has the same format as your original
(call it workshop_index_v1_new). When you are ready to start building your
final index, stop indexing to your original and start indexing into this new
index. Queries against both indexes can be accomplished using a new alias, or
by modifying the requests to include both. Now you can transfer the bulk of
your data from workshop_index_v1 to workshop_index_v2 while
workshop_index_v1_new continues to collect the new documents. Once the
initial scan and scroll completes, you can cut over to workshop_index_v2 and
run a scan and scroll against the v1_new index, which should be relatively
small, so you can quickly transfer those documents into your v2 schema.
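
As a rough sketch of the transfer step: each page of scroll hits can be turned into a bulk request body, dropping the unwanted field on the way. This assumes hits shaped like search results (with _id, _type and _source) and the newline-delimited bulk format; the function name to_bulk_lines and the sample document are made up for illustration.

```python
import json


def to_bulk_lines(hits, target_index, drop_fields=("identifier",)):
    """Turn scroll hits into a bulk body for target_index, minus drop_fields."""
    lines = []
    for hit in hits:
        source = {k: v for k, v in hit["_source"].items() if k not in drop_fields}
        action = {
            "index": {
                "_index": target_index,
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # A bulk body has to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical hit, shaped like one entry of a scroll response.
hits = [
    {
        "_id": "1",
        "_type": "guitar",
        "_source": {"identifier": "abc", "make": "Fender", "model": "Stratocaster"},
    }
]
payload = to_bulk_lines(hits, "workshop_index_v2")
print(payload, end="")
```

Reusing the original _id keeps a later copy of the same document from creating a second entry in v2.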

The alternative is to run the scan and scroll twice against the v1 index:
once to build the v2 index, at which point you cut over to v2, and a second
time to pick up any documents that were added after you started your initial
scan and scroll. This is a less than ideal scenario: it will take longer, and
unless you add steps to check whether documents already exist, it will leave
you with an index with many deletes. If you have a timestamp in your
documents, you might be able to make this reasonable. You will certainly
want to optimize after you complete this process.
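
If the documents do carry a timestamp, the second pass could be limited with a range query, so only recent documents are scrolled again. A minimal sketch; the field name updated_at and the start time are hypothetical, and the date format has to match whatever the field's mapping accepts.

```python
import json


def build_catchup_query(timestamp_field, started_at):
    """Search body matching documents stamped at or after started_at."""
    return {"query": {"range": {timestamp_field: {"gte": started_at}}}}


# started_at would be the moment the first scan and scroll began.
catchup = build_catchup_query("updated_at", "2015-03-11T00:00:00Z")
print(json.dumps(catchup))
```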

The only downside to writing to the new one is deciding which one to query
during the transition. If you write to the v2 index, queries to v1 will
not show new data, while queries to v2 will show only new data until the
migration progresses. Queries that span both may be complicated because the
mappings are different; if that is not the case, then yes, this is the easy
way. If you are OK with one of these caveats, then by all means this is the
simplest route.
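
For queries that span both indexes without an alias, the index names can be given as a comma-separated list in the search URL. A small sketch that builds such a path; the helper name search_path is an assumption, and the optional type segment follows the type-based URLs of that era.

```python
def search_path(indices, doc_type=None):
    """Build a search path covering several indexes (and optionally one type)."""
    path = "/" + ",".join(indices)
    if doc_type:
        path += "/" + doc_type
    return path + "/_search"


both = search_path(["workshop_index_v1", "workshop_index_v2"], "guitar")
print(both)
```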

Aaron

On Wednesday, March 11, 2015 at 10:47:59 AM UTC-6, mzrth_7810 wrote:

[original post quoted in full]

I tried to reply earlier, but it seems Google lost that reply.

My suggestion would be to create a v1_new index that has the same mappings
as v1. When you are ready to migrate to v2:

  • Change indexing to go to v1_new.
  • Change searches to cover v1 and v1_new (alias or query string).
  • Copy v1 to v2.
  • Change indexing to go to v2, and searches to go to v2.
  • Copy v1_new to v2.

This will allow you to keep indexing while copying, and to easily identify
the new documents.
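
The alias changes in this first suggestion can be sketched as two batches of _aliases actions, applied here to an in-memory table instead of a cluster. Assumptions: "workshop" is the search alias from the original post, "workshop_write" is a hypothetical second alias used only for indexing, and apply_actions is a made-up simulation helper. The write alias is kept on exactly one index at every step.

```python
def apply_actions(aliases, actions):
    """Apply _aliases-style add/remove actions to an in-memory alias table."""
    result = {name: set(indices) for name, indices in aliases.items()}
    for action in actions:
        for op, spec in action.items():
            targets = result.setdefault(spec["alias"], set())
            if op == "add":
                targets.add(spec["index"])
            elif op == "remove":
                targets.discard(spec["index"])
            else:
                raise ValueError("unsupported action: " + op)
    return result


aliases = {
    "workshop": {"workshop_index_v1"},
    "workshop_write": {"workshop_index_v1"},
}

# Before the copy: index into v1_new, search v1 and v1_new together.
before_copy = [
    {"remove": {"index": "workshop_index_v1", "alias": "workshop_write"}},
    {"add": {"index": "workshop_index_v1_new", "alias": "workshop_write"}},
    {"add": {"index": "workshop_index_v1_new", "alias": "workshop"}},
]

# After v1 has been copied to v2: index into v2 and search v2 only.
after_copy = [
    {"remove": {"index": "workshop_index_v1_new", "alias": "workshop_write"}},
    {"add": {"index": "workshop_index_v2", "alias": "workshop_write"}},
    {"remove": {"index": "workshop_index_v1", "alias": "workshop"}},
    {"remove": {"index": "workshop_index_v1_new", "alias": "workshop"}},
    {"add": {"index": "workshop_index_v2", "alias": "workshop"}},
]

during = apply_actions(aliases, before_copy)
final = apply_actions(during, after_copy)
print(during)
print(final)
```

The final copy from v1_new to v2 happens after the second batch, so documents sitting in v1_new are briefly not searchable until that small copy finishes.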

If you are OK with only searching new documents for a while, then you can
start indexing to v2, change search to v2, and start the copy.

If you are OK with only searching old documents for the duration of the
transfer, start indexing to v2, do the copy, then change search to v2.

The last option is to leave indexing and search on v1, do the copy to v2,
switch indexing and search to v2, do another copy from v1, and finally
optimize. This has a lot of potential problems. It will essentially create
a deleted version of all your documents, so the optimize is needed to
correct that. Also, if your indexing includes updates, and not just new
documents, then the second copy from v1 might overwrite some of those
updates, which is not good. If it were me and I was not OK with the second
or third option, I would definitely go with route 1.
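
The overwrite risk in the last option can be shown with a small in-memory simulation, where plain dicts stand in for the two indexes. Everything here is illustrative (the documents and both helper names are made up). On a real cluster, the guarded variant roughly corresponds to sending the second copy with the bulk "create" action, which rejects documents whose ID already exists.

```python
def copy_all(source, target):
    """Blind re-copy: every source document replaces the target's version."""
    for doc_id, doc in source.items():
        target[doc_id] = dict(doc)


def copy_missing(source, target):
    """Guarded re-copy: only documents absent from the target are written."""
    for doc_id, doc in source.items():
        if doc_id not in target:
            target[doc_id] = dict(doc)


v1 = {"1": {"make": "Fender", "model": "Stratocaster"}}
v2 = {}
copy_all(v1, v2)  # first copy, then cut over to v2

v2["1"] = {"make": "Fender", "model": "Telecaster"}  # update made on v2
v1["2"] = {"make": "Gibson", "model": "Les Paul"}  # late write the first copy missed

blind = dict(v2)
copy_all(v1, blind)  # second copy without any check

guarded = dict(v2)
copy_missing(v1, guarded)  # second copy that skips existing IDs

print(blind["1"]["model"])
print(guarded["1"]["model"])
```

The guarded copy keeps the update and still picks up the late document, but it would also skip documents that were updated on v1 after the first copy, so it only fits when the late writes are new documents.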


Weird, that was the post I made yesterday morning; it only just now hit the
list after vanishing.

On Thu, Mar 12, 2015 at 10:21 AM, aaron@definemg.com wrote:

[earlier reply and original post quoted in full]

Thanks for your reply, I'll have a look into this.

On Friday, 13 March 2015 17:41:49 UTC, Aaron Mefford wrote:

[earlier messages quoted in full]