I've recently started using elasticsearch and I'm having trouble
understanding when I need to reindex data.
What operations cause me to need to reindex data?
Which of these require creating a new index and for which can I simply
reuse the same index?
Are there settings that require the ES nodes to be restarted after
being updated?
How do I reindex online if I need to create a new index? I can use a
scan search to copy the documents from one index to another and then switch
the alias to point to the new index, but it seems I'd have to either take my
regular indexing process offline or have it write to both indexes; otherwise
the two indexes would not contain the same documents when I switch the alias.
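For reference, the alias switch described in the question can be sketched as a single request to the aliases API, which applies its remove and add actions together. This is only a rough sketch: the alias and index names (docs, docs_v1, docs_v2) are made up for illustration, the helper only builds the request body, and it does nothing about writes that arrive while documents are being copied.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the JSON body for POST /_aliases that moves `alias`
    from `old_index` to `new_index` in one request."""
    return json.dumps({
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    })


# Hypothetical names, for illustration only.
print(alias_swap_body("docs", "docs_v1", "docs_v2"))
```

The body would be sent with any HTTP client once the copy into the new index has finished.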
You mainly need to reindex the data if you change how data ends up being
indexed. This includes changing things like whether a field is analyzed or not,
changing analyzer settings, or changing the type of a field.
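As a rough illustration of the rule above, one could compare the field definitions of an old and a new mapping and flag the fields whose indexing behavior differs. The helper name and the list of keys it checks are assumptions made for this sketch, not an exhaustive list and not part of any Elasticsearch API.

```python
# Illustrative subset of mapping keys that change how a field is indexed.
REINDEX_KEYS = ("type", "index", "analyzer")


def fields_needing_reindex(old_props, new_props):
    """Return the names of existing fields whose indexing behavior differs."""
    changed = []
    for name, old in old_props.items():
        new = new_props.get(name)
        if new is None:
            continue  # field absent from the new mapping; nothing to compare
        if any(old.get(key) != new.get(key) for key in REINDEX_KEYS):
            changed.append(name)
    return sorted(changed)


# Hypothetical mappings: "title" switches from analyzed to not_analyzed.
old = {
    "title": {"type": "string", "index": "analyzed"},
    "views": {"type": "integer"},
}
new = {
    "title": {"type": "string", "index": "not_analyzed"},
    "views": {"type": "integer"},
    "tags": {"type": "string"},
}
print(fields_needing_reindex(old, new))
```

A brand-new field such as "tags" is not flagged, since there is no previously indexed data for it to disagree with.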
The "online" settings that you can change on a live index are listed in the
update indices settings API and the cluster update settings API. Any other
index setting requires closing the index, updating the setting, and
reopening it; any other cluster/node-level setting requires restarting the node.
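The two index-level cases can be sketched as a small planner that decides which requests are needed. The set of dynamic settings below is a deliberately tiny, illustrative subset, the index name is hypothetical, and the flat dotted-key form of the settings is an assumption of this sketch; check the update settings documentation for the real list.

```python
import json

# Illustrative subset only, not the full list of live-updatable settings.
DYNAMIC_INDEX_SETTINGS = {"index.number_of_replicas", "index.refresh_interval"}


def plan_settings_update(index, settings):
    """Return the (method, path, body) requests needed to apply `settings`."""
    put = ("PUT", f"/{index}/_settings", json.dumps(settings, sort_keys=True))
    if set(settings) <= DYNAMIC_INDEX_SETTINGS:
        return [put]  # every setting can be changed on a live index
    # Otherwise close the index, update it, and open it again.
    return [
        ("POST", f"/{index}/_close", None),
        put,
        ("POST", f"/{index}/_open", None),
    ]


# Hypothetical index name and settings, for illustration only.
for step in plan_settings_update("my_index", {"index.number_of_replicas": 2}):
    print(step)
for step in plan_settings_update(
    "my_index", {"index.analysis.analyzer.default.type": "standard"}
):
    print(step)
```

The planner only describes the requests; sending them, and handling the time the index is closed, is left to the caller.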
Thanks. When do I need to create a new index vs when can I just reindex
using the same index?
In reply to Shay Banon kimchy@gmail.com, Tue, Apr 17, 2012 at 3:36 AM.
The same reasons mentioned before apply. Effectively, they mean that you will
end up with a new index with new mapping behavior.
In reply to Ben McCann ben@benmccann.com, Tue, Apr 17, 2012 at 4:38 PM.