Why do update_by_query/delete_by_query take almost the same time with automatic slicing and without slicing?

Hi Team,

I have gone through the Elasticsearch docs and tried to implement slicing in two ways for update_by_query/delete_by_query.

  1. Manual Slicing (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html)
  2. Automatic Slicing (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html)

Index & Cluster Details:

Index Name: carsspark_original (for security reasons I am not exposing the actual index name)
No. of Shards = 5 (hence I have also used 5 slices)
It's a 2-node cluster

Here is the code and the statistics that I observed.

Update_By_Query:

  1. Manual Slicing:

POST /carsspark_original/_update_by_query
{
  "slice": {
    "id": 4,
    "max": 5
  },
  "query": {
    "match": {
      "is_active": "Y"
    }
  },
  "script": {
    "source": "ctx._source['is_active'] = 'N'",
    "lang": "painless"
  }
}
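
The request above shows only slice id 4; the other four requests differ only in the `id` value (0 to 3). A minimal sketch of how the five bodies could be generated, assuming each one is then sent as its own `_update_by_query` request. The helper name `build_slice_bodies` is hypothetical and not part of any Elasticsearch client:

```python
import copy
import json

# Body shared by every slice; copied from the manual-slicing request above.
BASE_BODY = {
    "query": {"match": {"is_active": "Y"}},
    "script": {"source": "ctx._source['is_active'] = 'N'", "lang": "painless"},
}


def build_slice_bodies(max_slices: int) -> list:
    """Return one request body per slice id in 0..max_slices-1 (hypothetical helper)."""
    bodies = []
    for slice_id in range(max_slices):
        body = copy.deepcopy(BASE_BODY)  # never mutate the shared template
        body["slice"] = {"id": slice_id, "max": max_slices}
        bodies.append(body)
    return bodies


if __name__ == "__main__":
    # Print the five bodies; each would be POSTed to
    # /carsspark_original/_update_by_query as a separate request.
    for body in build_slice_bodies(5):
        print(json.dumps(body))
```

Whether manual slicing saves wall-clock time depends on sending these five requests at the same time rather than one after another.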

Results:

Total count = 2534885
SliceId-0 Total Records = 507278 & Total Time took to update 68171 ms = 1.1 min

SliceId-0 Total Records = 507278 & Total Time took to update 192682 ms = 3.2 min
SliceId-1 Total Records = 506708 & Total Time took to update 213633 ms = 3.5 min
SliceId-2 Total Records = 506649 & Total Time took to update 209817 ms = 3.4 min
SliceId-3 Total Records = 506800 & Total Time took to update 210220 ms = 3.5 min
SliceId-4 Total Records = 507450 & Total Time took to update 213074 ms = 3.5 min

  2. Automatic Slicing:

POST carsspark_original/_update_by_query?refresh&slices=5
{
  "query": {
    "match": {
      "is_active": "Y"
    }
  },
  "script": {
    "source": "ctx._source['is_active'] = 'N'",
    "lang": "painless"
  }
}

Total Records = 2534885 & Total Time took to update 466855 ms = 7.7 min

Delete_By_Query:
Without Slicing:

POST /carsspark_original/_delete_by_query
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "is_active": "N"
          }
        }
      ]
    }
  }
}

Total Records = 2534885 & Total Time took to delete 135866 ms = 2.26 min

With Slicing:

POST /carsspark_original/_delete_by_query?refresh&slices=5
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "is_active": "N"
          }
        }
      ]
    }
  }
}

Total Records = 2534885 & Total Time took to delete 159461 ms = 2.65 min

Here is a summary of the statistics for both update_by_query and delete_by_query:
Note:

I used the same number of records in all update/delete runs (record count = 2534885) so that the timings can be compared directly, without any confusion about record counts.

Manual Slicing Update by query:
Total Records = 2534885
SliceId-0 Total Records = 507278 & Total Time took to update 192682 ms = 3.2 min
SliceId-1 Total Records = 506708 & Total Time took to update 213633 ms = 3.5 min
SliceId-2 Total Records = 506649 & Total Time took to update 209817 ms = 3.4 min
SliceId-3 Total Records = 506800 & Total Time took to update 210220 ms = 3.5 min
SliceId-4 Total Records = 507450 & Total Time took to update 213074 ms = 3.5 min

Without Slicing Update by query:
Total Records = 2534885 & Total Time took to update 541892 ms = 9.0 min

Automatic Slicing Update by query:
Total Records = 2534885 & Total Time took to update 511886 ms = 8.5 min

Without Slicing Delete by query:
Total Records = 2534885 & Total Time took to delete 135866 ms = 2.26 min

With Slicing Delete by query:
Total Records = 2534885 & Total Time took to delete 159461 ms = 2.65 min
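
To make the comparison explicit, here is a small sketch that converts the millisecond figures from the report into minutes and computes the ratio of unsliced to sliced time (a ratio near 1 means no real gain). For manual slicing the post does not record whether the five requests ran at the same time, so both readings are computed: the slowest slice (if they ran concurrently) and the sum (if they ran one after another). The function names are only illustrative:

```python
def to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes, rounded to two decimals."""
    return round(ms / 60_000, 2)


def speedup(baseline_ms: int, sliced_ms: int) -> float:
    """Ratio of unsliced time to sliced time; values near 1.0 mean no gain."""
    return round(baseline_ms / sliced_ms, 2)


# Millisecond timings copied from the report above.
TIMINGS_MS = {
    "update_by_query": {"without": 541892, "automatic": 511886},
    "delete_by_query": {"without": 135866, "automatic": 159461},
}
MANUAL_SLICE_MS = [192682, 213633, 209817, 210220, 213074]

if __name__ == "__main__":
    for name, t in TIMINGS_MS.items():
        print(name,
              "without:", to_minutes(t["without"]), "min,",
              "automatic:", to_minutes(t["automatic"]), "min,",
              "ratio:", speedup(t["without"], t["automatic"]))

    # Manual slicing: wall-clock time depends on how the requests were issued.
    print("manual, if concurrent:", to_minutes(max(MANUAL_SLICE_MS)), "min")
    print("manual, if sequential:", to_minutes(sum(MANUAL_SLICE_MS)), "min")
```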

Looking at the report, update_by_query and delete_by_query take approximately the same time with and without slicing. I can see in Kibana that automatic slicing is happening, but the total time is not going down.

Also, does automatic slicing run the slices in parallel, or does it run one request after another, then consolidate the results and show them in Kibana? Ideally slicing should reduce the time, but I see almost the same timings with and without slicing.
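
One way I could check this myself is to start the request with `wait_for_completion=false` and poll the task through the Tasks API while it runs. As far as I understand the docs, the status of a sliced by-query task carries a `slices` array with one entry per slice. The sketch below assumes that shape; the field names (`slices`, `slice_id`, `total`, `updated`, `deleted`) are my assumption, and the sample status is made up purely for illustration, not taken from my cluster:

```python
from typing import Any, Dict, List, Tuple


def summarize_slices(task_status: Dict[str, Any]) -> List[Tuple[int, int, int]]:
    """Return (slice_id, processed, total) for each slice in a task status.

    Assumes the status has a "slices" list whose entries carry
    "slice_id", "total", "updated" and "deleted" counters.
    """
    rows = []
    for entry in task_status.get("slices", []):
        if not entry:  # defensive: skip slices that have not reported anything
            continue
        processed = entry.get("updated", 0) + entry.get("deleted", 0)
        rows.append((entry["slice_id"], processed, entry.get("total", 0)))
    return rows


if __name__ == "__main__":
    # Hypothetical snapshot taken mid-run; the numbers are illustrative only.
    sample_status = {
        "slices": [
            {"slice_id": 0, "total": 1000, "updated": 400, "deleted": 0},
            {"slice_id": 1, "total": 1000, "updated": 380, "deleted": 0},
        ]
    }
    for slice_id, processed, total in summarize_slices(sample_status):
        print(f"slice {slice_id}: {processed}/{total}")
```

If two snapshots taken a few seconds apart show the counters of several slices advancing together, the slices are running in parallel; if only one slice moves at a time, they are effectively sequential.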

Could you please let me know why update_by_query/delete_by_query take the same time even with automatic slicing? Please also suggest any changes I need to make so that the work is processed in parallel and takes less time.

Note:

Reindexing also takes approximately the same time with automatic slicing and without slicing.
Total Records = 2545397 & Time taken to do Reindexing 448269 ms = 7.4 min [Without Slicing]
Total Records = 2545397 & Time taken to do Reindexing 394737 ms = 6.5 min [With Automatic Slicing]
