Reindex & Missing Documents


(Todd Bowles Console) #1

I've got a fairly stock standard ELK stack with a bunch of data in it.

We control our field mappings, and sometimes we add a new mapping. Obviously old indexes aren't searchable on that new mapping, so I'd like to start reindexing the data when that happens.

The intent is to reindex into a temporary index, delete the old index, reindex from the temporary index into a new index with the same name as the old one, and then delete the temporary index.

I wrote what I thought was a fairly stock standard script to do this, which I've included below:

[CmdletBinding()]
param
(
    [string]$elasticsearchUrl,
    [string]$indexMatchRegex="logstash-2017\.07\.31",
    [switch]$whatIf=$true
)

if ($whatIf)
{
    Write-Output "Running a theoretical reindex of all indexes in [$elasticsearchUrl] that match pattern [$indexMatchRegex]";
}

$ErrorActionPreference = "Stop";

function Reindex
{
    [CmdletBinding()]
    param
    (
        [string]$elasticsearchUrl,
        [string]$sourceIndex,
        [string]$destinationIndex,
        [switch]$whatIf=$true
    )

    if ($WhatIf)
    {
        Write-Output "Would have created a new index with name [$destinationIndex] here";
    }
    else 
    {
        Write-Output "Creating a new index with name [$destinationIndex]";
        $create = Invoke-WebRequest -Method PUT -Uri ("$elasticsearchUrl/$destinationIndex" + "?pretty") -Headers @{"accept"="application/json"};
        Write-Output "Create response";
        Write-Output "-------------------------------------------";
        Write-Output $create.Content;
        Write-Output "-------------------------------------------";
    }

    $reindexPayload = "{ `"source`": { `"index`": `"$($sourceIndex)`" }, `"dest`": { `"index`": `"$destinationIndex`" } }";

    if ($WhatIf)
    {
        Write-Output "Would have created a reindex request using payload [$reindexPayload]";
    }
    else
    {
        Write-Output "Reindexing using payload [$reindexPayload]";
        $reindex = Invoke-WebRequest -Method POST -Uri "$elasticsearchUrl/_reindex?pretty" -Body $reindexPayload -Headers @{"accept"="application/json";"content-type"="application/json"} -TimeoutSec 3600;
        Write-Output "Reindex response";
        Write-Output "-------------------------------------------";
        Write-Output $reindex.Content;
        Write-Output "-------------------------------------------";
    }

    if ($WhatIf)
    {
        Write-Output "Would have deleted the old index named [$sourceIndex] here";
    }
    else 
    {
        Write-Output "Deleting old index named [$sourceIndex]";
        $delete = Invoke-WebRequest -Method DELETE -Uri ("$elasticsearchUrl/$sourceIndex" + "?pretty") -Headers @{"accept"="application/json"};
        Write-Output "Delete response";
        Write-Output "-------------------------------------------";
        Write-Output $delete.Content;
        Write-Output "-------------------------------------------";
    }
}

$indices = Invoke-RestMethod "$elasticsearchUrl/_cat/indices?pretty" -Headers @{"accept"="application/json"};
$sortedIndices = $indices | Sort-Object { $_.index };
foreach ($index in $sortedIndices)
{
    $oldIndexName = $index.index;
    if ($oldIndexName -match $indexMatchRegex)
    {
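        $newIndexName = "$oldIndexName-r";
        # Pass 1: copy the original index into the temporary "-r" index, then delete the original.
        Reindex -elasticsearchUrl $elasticsearchUrl -sourceIndex $oldIndexName -destinationIndex $newIndexName -whatIf:$whatIf;
        # Pass 2: copy the temporary index back under the original name, then delete the temporary index.
        Reindex -elasticsearchUrl $elasticsearchUrl -sourceIndex $newIndexName -destinationIndex $oldIndexName -whatIf:$whatIf;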
    }
}

When I ran this script, it worked for a few indexes, but other indexes ended up with missing documents (and one ended up completely empty).

I'm sure I've done something silly (like not waiting for a synchronous response, or not waiting for ES to complete the requested changes), but I'm not sure where.

I'm using Elasticsearch 5.4.2.

Any help or advice would be appreciated.

Thank you.


(Todd Bowles Console) #2

I couldn't include the output from the execution that resulted in lost data in the main post (character limits), so I've put it here instead.

It shows one index that worked as expected and one that lost roughly 22,000 documents: 24311 documents were copied into the temporary index, but only 2333 were copied back.

Creating a new index with name [logstash-2017.08.02.00-r]
Create response
-------------------------------------------
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

-------------------------------------------
Reindexing using payload [{ "source": { "index": "logstash-2017.08.02.00" }, "dest": { "index": "logstash-2017.08.02.00-r" } }]
Reindex response
-------------------------------------------
{
  "took" : 4730,
  "timed_out" : false,
  "total" : 24515,
  "updated" : 0,
  "created" : 24515,
  "deleted" : 0,
  "batches" : 25,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

-------------------------------------------
Deleting old index named [logstash-2017.08.02.00]
Delete response
-------------------------------------------
{
  "acknowledged" : true
}

-------------------------------------------
Creating a new index with name [logstash-2017.08.02.00]
Create response
-------------------------------------------
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

-------------------------------------------
Reindexing using payload [{ "source": { "index": "logstash-2017.08.02.00-r" }, "dest": { "index": "logstash-2017.08.02.00" } }]
Reindex response
-------------------------------------------
{
  "took" : 5778,
  "timed_out" : false,
  "total" : 24515,
  "updated" : 0,
  "created" : 24515,
  "deleted" : 0,
  "batches" : 25,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

-------------------------------------------
Deleting old index named [logstash-2017.08.02.00-r]
Delete response
-------------------------------------------
{
  "acknowledged" : true
}

-------------------------------------------
Creating a new index with name [logstash-2017.08.02.01-r]
Create response
-------------------------------------------
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

-------------------------------------------
Reindexing using payload [{ "source": { "index": "logstash-2017.08.02.01" }, "dest": { "index": "logstash-2017.08.02.01-r" } }]
Reindex response
-------------------------------------------
{
  "took" : 4467,
  "timed_out" : false,
  "total" : 24311,
  "updated" : 0,
  "created" : 24311,
  "deleted" : 0,
  "batches" : 25,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

-------------------------------------------
Deleting old index named [logstash-2017.08.02.01]
Delete response
-------------------------------------------
{
  "acknowledged" : true
}

-------------------------------------------
Creating a new index with name [logstash-2017.08.02.01]
Create response
-------------------------------------------
{
  "acknowledged" : true,
  "shards_acknowledged" : true
}

-------------------------------------------
Reindexing using payload [{ "source": { "index": "logstash-2017.08.02.01-r" }, "dest": { "index": "logstash-2017.08.02.01" } }]
Reindex response
-------------------------------------------
{
  "took" : 560,
  "timed_out" : false,
  "total" : 2333,
  "updated" : 0,
  "created" : 2333,
  "deleted" : 0,
  "batches" : 3,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

-------------------------------------------
Deleting old index named [logstash-2017.08.02.01-r]
Delete response
-------------------------------------------
{
  "acknowledged" : true
}
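
One possible explanation for the 2333 figure, offered as an assumption rather than a confirmed diagnosis: the reindex API reads its source through a search, and a search only sees documents that a refresh has already made visible. If the second pass starts before the temporary index has been refreshed, it would copy only the documents that were searchable at that moment, and the delete that follows would discard the rest. A minimal sketch of a more defensive copy step is below. The function names `Get-DocumentCount` and `Copy-IndexSafely` are hypothetical and not part of the original script; the sketch assumes the destination index is empty or does not exist yet, and it relies on the `refresh` URL parameter of `_reindex` plus the `_refresh` and `_count` endpoints.

```powershell
# Hypothetical helper (not in the original script): number of searchable documents in an index.
function Get-DocumentCount
{
    param
    (
        [string]$elasticsearchUrl,
        [string]$index
    )

    # _count only reflects documents that a refresh has made searchable.
    $response = Invoke-RestMethod -Method GET -Uri "$elasticsearchUrl/$index/_count" -Headers @{"accept"="application/json"};
    return [long]$response.count;
}

# Hypothetical replacement for one pass of the original Reindex function.
# Assumption: the destination index is empty (or absent) before the copy starts.
function Copy-IndexSafely
{
    param
    (
        [string]$elasticsearchUrl,
        [string]$sourceIndex,
        [string]$destinationIndex
    )

    # Refresh the source so the search behind _reindex can see every document.
    Invoke-RestMethod -Method POST -Uri "$elasticsearchUrl/$sourceIndex/_refresh" | Out-Null;
    $expected = Get-DocumentCount -elasticsearchUrl $elasticsearchUrl -index $sourceIndex;

    $payload = @{ source = @{ index = $sourceIndex }; dest = @{ index = $destinationIndex } } | ConvertTo-Json;

    # refresh=true asks Elasticsearch to refresh the destination once the reindex has finished.
    $result = Invoke-RestMethod -Method POST -Uri "$elasticsearchUrl/_reindex?refresh=true" -Body $payload -ContentType "application/json" -TimeoutSec 3600;

    $actual = Get-DocumentCount -elasticsearchUrl $elasticsearchUrl -index $destinationIndex;

    # Refuse to delete the source unless the copy is complete.
    if (@($result.failures).Count -gt 0 -or $actual -ne $expected)
    {
        throw "Reindex of [$sourceIndex] into [$destinationIndex] is incomplete: expected [$expected] documents, found [$actual]. Source index left in place.";
    }

    Invoke-RestMethod -Method DELETE -Uri "$elasticsearchUrl/$sourceIndex" | Out-Null;
}
```

Calling `Copy-IndexSafely` once for the original-to-temporary pass and once for the temporary-to-original pass would mirror the two `Reindex` calls in the script. The count check is the important design choice: even if the refresh theory turns out to be wrong, the source index is only deleted after the destination is shown to hold the same number of documents.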
