Monitor Elasticsearch Re-Index Progress

Re-indexing a large index can take some time. The Task API (the _tasks endpoint) reports the progress of a running re-index, and I've written a PowerShell script that uses it to display a progress bar. In the hope that it helps others, I'm posting it here:

UPDATE: The following is the optimized code based on feedback below:

Clear-Host
$ESHost = "http://yourESHostHere:9200"

# List running re-index tasks; group_by=none returns them as a flat array
$tasks = Invoke-RestMethod -Uri "$ESHost/_tasks?detailed=true&actions=*reindex&group_by=none"
if (-not $tasks -or -not $tasks.tasks) {return "No re-index tasks found!"}

# Take the first task; its ID has the form node:id
$taskid = $tasks.tasks[0].node + ":" + $tasks.tasks[0].id

"Polling Task $taskid..."

$Done = $false
do {
    # Reset so a failed request never reuses the previous response
    $task = $null
    try {
        $task = Invoke-RestMethod -Uri "$ESHost/_tasks/$taskid"
    } catch {
        # 404 means task is complete
        if ($_.Exception.Response.StatusCode.value__ -eq 404) {$Done = $true}
        else {Write-Error $_}
    }
    if ($task.completed -or $Done) {
        # Set the flag here too, otherwise a "completed" response would loop forever
        $Done = $true
        "Done!"
    } elseif ($task) {
        $tot = $task.task.status.total
        $prog = $task.task.status.created
        # Guard against division by zero while the total is still 0
        $per = if ($tot) {[Math]::Round(100 * $prog / $tot)} else {0}
        Write-Progress -PercentComplete $per -Activity "$per% Re-indexing" -Status "$prog / $tot documents created"
    }
    if (-not $Done) {Start-Sleep -Seconds 1}
} while (-not $Done)

Note, it just takes the first active re-index task and monitors that. There's no support currently for monitoring multiple tasks.
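
If anyone wants to extend it to several tasks, here is a rough sketch of how that might look. It is untested against a live cluster and assumes each entry in the group_by=none listing carries a status object with total and created, just like the single-task response the script reads. Get-ReindexPercent and Show-ReindexProgress are names I made up for illustration; they are not part of the script above.

```powershell
# Hypothetical helper: percentage of documents created so far,
# or 0 when the status is missing or the total is still 0.
function Get-ReindexPercent {
    param($Status)
    if (-not $Status -or -not $Status.total) { return 0 }
    $per = [int][Math]::Round(100.0 * $Status.created / $Status.total)
    return [Math]::Min(100, $per)
}

# Hypothetical helper: draw one progress bar per task.
# Assumes $Tasks is the "tasks" array from a group_by=none listing.
function Show-ReindexProgress {
    param([object[]]$Tasks)
    $barId = 0
    foreach ($t in $Tasks) {
        $barId++
        $bar = @{
            Id              = $barId   # a distinct Id gives each task its own bar
            Activity        = "Re-index $($t.node):$($t.id)"
            Status          = "$($t.status.created) / $($t.status.total) documents created"
            PercentComplete = Get-ReindexPercent $t.status
        }
        Write-Progress @bar
    }
}
```

Inside the polling loop you would then fetch the listing again on every pass and hand its tasks array to Show-ReindexProgress, stopping once the array comes back empty.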


Did you try grouping by parents or none? Perhaps one of these formats would be more suitable for your needs.

We still see the following structure returned:

GET /_tasks?detailed=true&actions=*reindex&group_by=parents
{
    "tasks": {
        "6tgOP8JVSmai0c0uR_Co7g:609579": {
            "node": "6tgOP8JVSmai0c0uR_Co7g",
            "id": 609579,
            ...
        }
    }
}

I don't know about other languages but this is hard to parse in PowerShell. Ideally you would have an array:

{
    "tasks": [
        {
        ...
        }
    ]
}

Which you can simply iterate or access with tasks[0].
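
For what it's worth, the keyed shape can still be walked in PowerShell by enumerating the object's properties, it is just less obvious. A minimal sketch, using a trimmed sample of the group_by=parents response shown above instead of a live call (the variable names are only for illustration):

```powershell
# Trimmed sample shaped like the group_by=parents response (only node and id kept)
$json = '{"tasks":{"6tgOP8JVSmai0c0uR_Co7g:609579":{"node":"6tgOP8JVSmai0c0uR_Co7g","id":609579}}}'
$response = $json | ConvertFrom-Json

# Each property of .tasks is one task:
# Name is the "node:id" key, Value is the task body.
$taskIds  = @($response.tasks.PSObject.Properties | ForEach-Object { $_.Name })
$taskList = @($response.tasks.PSObject.Properties | ForEach-Object { $_.Value })

# Both can now be indexed like ordinary arrays
$taskIds[0]
$taskList[0].node
```

Wrapping the pipeline in @(...) keeps the result an array even when there is only one task.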


Ah, I stand corrected. When we use group_by=none we get an array!

{
    "tasks": [
        {
            "node": "6tgOP8JVSmai0c0uR_Co7g",
            "id": 609579,
            "type": "transport",
            "action": "indices:data/write/reindex",
            "start_time_in_millis": 1523870243676,
            "running_time_in_nanos": 540175604651,
            "cancellable": true,
            "headers": {}
        }
    ]
}

Thanks Igor!
