When do I need to reindex data?


(Ben McCann) #1

Hi,

I've recently started using elasticsearch and I'm having trouble
understanding when I need to reindex data.

  • What operations cause me to need to reindex data?
  • Which of these require creating a new index and for which can I simply
    reuse the same index?
  • Are there settings that require the ES nodes to be restarted after
    being updated?
  • How do I reindex online if I need to create a new index? I can use a
    scan search to copy the documents from one index to another and then switch
    the alias to point to the new index, but it would seem that I'd have to
    take my regular indexing process offline or have it writing to both or else
    the two indexes would not have the same documents when I switch the alias.
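
The copy-then-switch step in the last bullet can be sketched as below. This is only a sketch: it builds the request body for the aliases API (`POST /_aliases`), which applies all actions in one request atomically. The index and alias names (`docs_v1`, `docs_v2`, `docs`) and the helper name are made up for illustration, and it does not address writes that arrive while the copy is still running.

```python
import json


def alias_swap_body(alias, old_index, new_index):
    """Build the body for POST /_aliases that moves `alias` to a new index.

    Both actions travel in a single request, so searches through the alias
    never hit a moment where it points at neither index.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names; send the printed JSON with any HTTP client.
    print(json.dumps(alias_swap_body("docs", "docs_v1", "docs_v2"), indent=2))
```

Keeping the swap in one request is the point of the design: removing and adding the alias in two separate calls would leave a short window with no index behind the alias.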

Thanks,
Ben


(Shay Banon) #2

You mainly need to reindex the data if you change how data ends up being
indexed. This includes things like changing whether a field is analyzed or
not, changing analyzer settings, or changing the type of a field.
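
As a rough illustration of that rule, the sketch below compares two sets of field definitions and reports the existing fields whose definition changed. The helper name and the field names are invented, and the `"index": "analyzed"` style is the string-field mapping syntax from the time of this thread; treat it as a way to reason about a mapping change, not as something elasticsearch provides.

```python
def fields_needing_reindex(old_props, new_props):
    """Return names of existing fields whose definition changed.

    Fields that only appear in new_props are skipped: adding a new field
    does not alter how already-indexed fields were processed.
    """
    changed = []
    for name, old_def in old_props.items():
        new_def = new_props.get(name)
        if new_def is not None and new_def != old_def:
            changed.append(name)
    return sorted(changed)


# Illustrative mappings only; the field names are invented.
old = {
    "title": {"type": "string", "index": "analyzed"},
    "views": {"type": "string"},
}
new = {
    "title": {"type": "string", "index": "not_analyzed"},  # analyzed or not
    "views": {"type": "integer"},                          # type change
    "tags": {"type": "string"},                            # brand-new field
}

if __name__ == "__main__":
    print(fields_needing_reindex(old, new))
```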

The "online" settings that you can change on a live index are listed in the
update indices settings API and the cluster update settings API. Any other
index setting requires closing the index, updating the setting, and
reopening it. Any other cluster- or node-level setting requires restarting
the node.
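
A minimal sketch of the two cases for index settings, assuming the standard REST endpoints (`PUT /{index}/_settings`, `POST /{index}/_close`, `POST /{index}/_open`). The helper names, the index name, and the analyzer name are hypothetical, and which settings count as "online" should be checked against the update settings documentation for your version.

```python
def dynamic_setting_steps(index, settings):
    """One request is enough for a setting that can change on a live index."""
    return [("PUT", "/%s/_settings" % index, settings)]


def static_setting_steps(index, settings):
    """Close the index, update the setting, then reopen it."""
    return [
        ("POST", "/%s/_close" % index, None),
        ("PUT", "/%s/_settings" % index, settings),
        ("POST", "/%s/_open" % index, None),
    ]


if __name__ == "__main__":
    # number_of_replicas can be changed on a live index.
    for step in dynamic_setting_steps("docs", {"index": {"number_of_replicas": 2}}):
        print(step)

    # Analysis settings cannot; "my_analyzer" is a made-up analyzer name.
    analysis = {"analysis": {"analyzer": {"my_analyzer": {"type": "standard"}}}}
    for step in static_setting_steps("docs", analysis):
        print(step)
```

The index is unavailable for search and indexing while it is closed, so the second sequence is best run in a maintenance window. Note that changing an analyzer this way does not re-process documents that are already indexed.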

On Mon, Apr 16, 2012 at 4:14 AM, Ben McCann benjamin.j.mccann@gmail.com wrote:

Hi,

I've recently started using elasticsearch and I'm having trouble
understanding when I need to reindex data.

  • What operations cause me to need to reindex data?
  • Which of these require creating a new index and for which can I
    simply reuse the same index?
  • Are there settings that require the ES nodes to be restarted after
    being updated?
  • How do I reindex online if I need to create a new index? I can use
    a scan search to copy the documents from one index to another and then
    switch the alias to point to the new index, but it would seem that I'd have
    to take my regular indexing process offline or have it writing to both or
    else the two indexes would not have the same documents when I switch the
    alias.

Thanks,
Ben


(Ben McCann-2) #3

Thanks. When do I need to create a new index vs when can I just reindex
using the same index?

On Tue, Apr 17, 2012 at 3:36 AM, Shay Banon kimchy@gmail.com wrote:

You mainly need to reindex the data if you change how data ends up being
indexed. This includes things like changing whether a field is analyzed or
not, changing analyzer settings, or changing the type of a field.

The "online" settings that you can change on a live index are listed in the
update indices settings API and the cluster update settings API. Any other
index setting requires closing the index, updating the setting, and
reopening it. Any other cluster- or node-level setting requires restarting
the node.


(Shay Banon) #4

For the same reasons mentioned before. Effectively, they mean that you end
up with a new index with new mapping behavior.

On Tue, Apr 17, 2012 at 4:38 PM, Ben McCann ben@benmccann.com wrote:

Thanks. When do I need to create a new index vs when can I just reindex
using the same index?

