Elastic uses hardware minimally while reindexing

seb_ma · January 16, 2018, 9:51pm

Hello,

Our setup:
4 servers, each with 2 CPUs (24 vCores each), 256 GB RAM, 2 SSDs (120 GB each) in RAID 0, and a 10 Gbit interconnect.
On each server we run 4 ES 6.1.1 Docker containers, created with Docker Compose. Each ES node has a 29 GB heap. Docker imposes no hardware limits.

We wanted to reindex an index with 120 GB of data (16 shards, 1 replica). No other tasks are running on the servers or nodes, so the cluster can use the complete hardware exclusively for reindexing.

After we started the reindexing task, we wondered why the cluster was not really using the hardware. Only some vCores on the servers were running at around 10%, SSD I/O was between 0 and 20%, and the LAN interconnect was barely used. In the end the task took 3 hours.

Can somebody tell us why the hardware is not really used while reindexing?

Thank You!

s1monw · January 19, 2018, 1:17pm

Did you look into slicing for your reindex task?

seb_ma · January 19, 2018, 4:28pm

Hi Simon,

Yes, we tried slicing. But after the reindexing finished, the new index had fewer documents than the original one. So we were losing data and didn't try slicing again.

We set vm.max_map_count to 262144 as recommended. Is that maybe too small for 4 Docker containers per server?

s1monw · January 22, 2018, 9:23am

Wait, what? That sounds like a terrible bug to me. What version of Elasticsearch are you using? Do you keep writing to the index you are reindexing from, or do you reindex into the same index? /cc @jimczi @nik9000

seb_ma · January 22, 2018, 11:32am

We are using your official Docker image for 6.1.1.
The index was "finished": we don't write to it any more. We also created a new index to reindex into.

jimczi · January 22, 2018, 1:46pm

Can you share the complete request that you used to reindex with slicing?
Did you use automatic slicing:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-automatic-slice
or manual:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html#docs-reindex-manual-slice
?
Did you check for failures in the response of the reindex task? Slicing is the preferred way to parallelize a single reindex task, so it shouldn't miss any documents. Are you able to retry the operation and record the task response, if you haven't already?
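
As a rough illustration of what checking the response could look like, here is a minimal Python sketch. It only inspects a response body that has already been parsed into a dict. The field names (`timed_out`, `total`, `created`, `updated`, `version_conflicts`, `failures`) are the ones the reindex response reports; the function name `check_reindex_response` is made up for this example, and the "written equals total" check assumes a plain reindex without a script that skips or deletes documents.

```python
def check_reindex_response(response):
    """Return a list of problems found in a parsed _reindex response body.

    Hypothetical helper: an empty list means nothing suspicious was found.
    Assumes no script is used, so every source document should end up
    either created or updated in the destination.
    """
    problems = []

    if response.get("timed_out"):
        problems.append("request timed out")

    failures = response.get("failures", [])
    if failures:
        problems.append(f"{len(failures)} failure(s) reported")

    conflicts = response.get("version_conflicts", 0)
    if conflicts:
        problems.append(f"{conflicts} version conflict(s)")

    written = response.get("created", 0) + response.get("updated", 0)
    total = response.get("total", 0)
    if written != total:
        problems.append(f"wrote {written} of {total} documents")

    return problems


# Made-up response values, for illustration only.
example = {
    "took": 1200,
    "timed_out": False,
    "total": 100,
    "created": 98,
    "updated": 0,
    "version_conflicts": 2,
    "failures": [],
}
for problem in check_reindex_response(example):
    print(problem)
```

With manual slicing each request returns its own response covering only that slice, so each one would need to be checked and the `total` values summed before comparing against the source index.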

seb_ma · February 5, 2018, 12:54pm

The indices were created with:

PUT index2
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 20,
    "refresh_interval" : "300s"
  }
}
PUT index3
{
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 20,
    "refresh_interval" : "300s"
  }
}

Reindex with slice:

POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 0,
      "max": 2
    }
  },
  "dest": {
    "index": "index2"
  }
}
POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 1,
      "max": 2
    }
  },
  "dest": {
    "index": "index2"
  }
}
POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 0,
      "max": 2
    }
  },
  "dest": {
    "index": "index3"
  }
}
POST _reindex
{
  "source": {
    "index": "index1",
    "slice": {
      "id": 1,
      "max": 2
    }
  },
  "dest": {
    "index": "index3"
  }
}

We used these commands to create the indices and for manual slicing. We were not able to reproduce the problem. We then tried automatic slicing and it worked fine.
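
For anyone repeating this, a small Python sketch of how the manual slice requests above could be generated rather than typed by hand, so that every slice id from 0 to max-1 is issued exactly once. The helper names `manual_slice_requests` and `automatic_slice_path` are made up for this example; the request body layout follows the requests above, and `slices` is the URL parameter of the reindex API for automatic slicing. This is a sketch, not what we actually ran.

```python
import json


def manual_slice_requests(source_index, dest_index, max_slices):
    """Build one _reindex request body per slice (hypothetical helper).

    Every id in 0..max_slices-1 appears exactly once; leaving one out
    would leave part of the source index uncopied.
    """
    if max_slices < 2:
        raise ValueError("slicing needs at least 2 slices")
    return [
        {
            "source": {
                "index": source_index,
                "slice": {"id": slice_id, "max": max_slices},
            },
            "dest": {"index": dest_index},
        }
        for slice_id in range(max_slices)
    ]


def automatic_slice_path(slices):
    """Request path for automatic slicing via the `slices` URL parameter."""
    return f"_reindex?slices={slices}"


# Print the two manual requests for index1 -> index2 in console form.
for body in manual_slice_requests("index1", "index2", 2):
    print("POST _reindex")
    print(json.dumps(body, indent=2))

# Path for the automatic variant; the body is then sent without a "slice" block.
print("POST " + automatic_slice_path(2))
```

With automatic slicing a single request is sent and Elasticsearch creates the sub-requests itself, which removes the chance of skipping or repeating a slice id.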

system · March 5, 2018, 12:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
