Elastic uses hardware minimally while reindexing

Hello,

Our setup:
4 servers, each with 2 CPUs (24 vCores each), 256 GB RAM, 2 SSDs (120 GB each) in RAID 0, and a 10 Gbit interconnect.
On each server we run 4 ES 6.1.1 Docker containers, created with Docker Compose. Each ES node has a 29 GB heap. Docker has no hardware limits configured.

We wanted to reindex an index with 120 GB of data (16 shards, 1 replica). No other tasks are running on the servers or nodes, so the cluster can use the complete hardware exclusively for reindexing.

After we started the reindexing task, we wondered why the cluster was not really using the hardware. Only some vCores on the servers were running, at around 10%, SSD I/O was between 0 and 20%, and the LAN interconnect was barely used. In the end the task took 3 hours.

Can somebody tell us why the hardware is not really used while reindexing?

Thank you!

Did you look into slicing for your reindex task?

Hi Simon,

Yes, we tried slicing. But after the reindexing finished, the new index had fewer documents than the original one. So we were losing data and didn't try slicing again.

We set vm.max_map_count to 262144 as recommended. Is it maybe too small for 4 Docker containers per server?

Wait, what? That sounds like a terrible bug to me. What version of Elasticsearch are you using? Do you keep writing to the index you are reindexing from, or do you reindex into the same index? /cc @jimczi @nik9000

We are using your official Docker image for 6.1.1.
The index was "finished": we no longer write to it. We also created a new index to reindex into.

Can you share the complete request that you used to reindex with slicing?
Did you use automatic slicing:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-automatic-slice
or manual slicing:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-manual-slice
?
Did you check for failures in the response of the reindex task? Slicing is the preferred way to parallelize a single reindex task, so it shouldn't miss any documents. Are you able to retry the operation and record the task response, if you haven't already?
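
As a rough illustration of checking a reindex response for failures, here is a minimal Python sketch. It assumes the response body has already been parsed into a dict. The field names (`failures`, `timed_out`, `total`, `created`, `updated`, `version_conflicts`) are those of the documented reindex response; the helper name `summarize_reindex_response` and the sample values are hypothetical and not taken from this thread.

```python
def summarize_reindex_response(resp):
    """Return (ok, summary) for a parsed _reindex response body.

    'ok' is a conservative check: no reported failures, no timeout,
    and no version conflicts.
    """
    failures = resp.get("failures", [])
    summary = {
        "total": resp.get("total", 0),
        "created": resp.get("created", 0),
        "updated": resp.get("updated", 0),
        "version_conflicts": resp.get("version_conflicts", 0),
        "failures": len(failures),
    }
    ok = (
        not failures
        and not resp.get("timed_out", False)
        and summary["version_conflicts"] == 0
    )
    return ok, summary


# Hypothetical sample response, for illustration only.
sample = {
    "took": 1200,
    "timed_out": False,
    "total": 1000,
    "created": 1000,
    "updated": 0,
    "version_conflicts": 0,
    "failures": [],
}
ok, summary = summarize_reindex_response(sample)
print(ok, summary)
```

With manual slicing, every `POST _reindex` request returns its own response, so each one would need to be checked separately.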

The indices were created with:

PUT index2
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 20,
    "refresh_interval" : "300s"
  }
}
PUT index3
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 20,
    "refresh_interval" : "300s"
  }
}

Reindex with slice:

POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 0,
      "max": 2
    }
  },
  "dest": {
    "index": "index2"
  }
}
POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 1,
      "max": 2
    }
  },
  "dest": {
    "index": "index2"
  }
}
POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 0,
      "max": 2
    }
  },
  "dest": {
    "index": "index3"
  }
}
POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 1,
      "max": 2
    }
  },
  "dest": {
    "index": "index3"
  }
}
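
The manual-slice requests above differ only in the slice `id` and the destination index, so they can also be generated. The following is a minimal Python sketch, not something used in this thread; the helper name `manual_slice_requests` is hypothetical. It builds one request body per slice so that every `id` from 0 to `max - 1` appears exactly once, which is what manual slicing needs in order to cover the whole source index.

```python
import json


def manual_slice_requests(source_index, dest_index, max_slices):
    """Build one _reindex request body per slice id (0 .. max_slices - 1)."""
    if max_slices < 2:
        # Sliced scroll expects "max" to be greater than 1.
        raise ValueError("max_slices must be at least 2")
    return [
        {
            "source": {
                "index": source_index,
                "slice": {"id": slice_id, "max": max_slices},
            },
            "dest": {"index": dest_index},
        }
        for slice_id in range(max_slices)
    ]


# Print the bodies in the same console style as the requests above.
for body in manual_slice_requests("index1", "index2", 2):
    print("POST _reindex")
    print(json.dumps(body, indent=2))
```

Each printed request still has to be sent and tracked separately; if one slice id is left out or fails, the destination index ends up with fewer documents than the source.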

We used these commands to create the indices and to do the manual slicing. We were not able to reproduce the problem. We then tried automatic slicing, and it worked fine.
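
For comparison, an automatic-slicing request is a single `_reindex` call with a `slices` URL parameter. The Python sketch below only builds the request path and body; it is an illustration under assumptions, not the exact request used in this thread. The helper name `automatic_slice_request` is hypothetical, and the value 16 is chosen only to match the shard count mentioned in the original post. `wait_for_completion=false` makes Elasticsearch run the reindex as a background task.

```python
import json
from urllib.parse import urlencode


def automatic_slice_request(source_index, dest_index, slices):
    """Return (path, body) for a _reindex call using automatic slicing."""
    params = urlencode({"slices": slices, "wait_for_completion": "false"})
    path = "_reindex?" + params
    body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return path, body


# Assumed slice count: one slice per source shard (16 in the original post).
path, body = automatic_slice_request("index1", "index2", 16)
print("POST " + path)
print(json.dumps(body, indent=2))
```

Because Elasticsearch splits the work into sub-tasks itself, there is only one request and one task to check, which avoids the bookkeeping that manual slicing requires.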
