Elasticsearch scan of all IDs slows down dramatically (exponentially?)

I'm trying to dump all IDs from an index. The dump starts with an ETA of under ten minutes, but then progressively slows down, ultimately taking over an hour to finish.

Configuration: single node, two shards, no replicas, running on an AWS r5.4xlarge with a 30 GB heap, inside a Docker container. ES 7.8. Using custom IDs. Docs range in size from a few tens of bytes to 1-2 KB.
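
For context, a minimal sketch of index settings matching that setup via the Python client (the example document and custom ID here are placeholders, not the real data):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Two primary shards, no replicas, as in the configuration above.
es.indices.create(
    index="index-name",
    body={"settings": {"number_of_shards": 2, "number_of_replicas": 0}},
)

# Documents are indexed with custom (application-supplied) IDs rather than auto-generated ones.
es.index(index="index-name", id="my-custom-id-0001", body={"field": "value"})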

I would expect the total dump time to scale linearly with shard size, but it seems to get exponentially slower. I also tried this on a much bigger shard (45 GB) and it looked like it would never finish.

Docker stats:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT   MEM %               NET I/O             BLOCK I/O           PIDS
56adac7cf174        es             184.92%             32.24GiB / 120GiB   26.87%              132GB / 46.3GB      83.3GB / 1.22TB     180

Indices status:

% curl localhost:9200/_cat/indices?v

health status index                  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index-name             nCRb7R1-QKmJwJEhqPLJtA   2   0   16753650          702     14.1gb         14.1gb

With this code:

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
from tqdm import tqdm

# Client pointed at the local single-node instance (same endpoint as the curl above).
ES_CLIENT = Elasticsearch("http://localhost:9200")

def get_ids(index):
    # Scroll through every document, fetching only metadata (no _source).
    body = {'query': {'match_all': {}}}
    for doc in scan(ES_CLIENT, query=body, index=index, _source=False):
        yield doc['_id']

with open("es-ids.txt", "w") as fd:
    for doc_id in tqdm(get_ids("index-name"), total=16753724):
        fd.write(f"{doc_id}\n")

36%|███     |  6042901/16753724 [07:33<22:11, 8043.51it/s]
70%|█████   | 11644101/16753724 [18:59<14:24, 5908.57it/s]
85%|██████  | 14171201/16753724 [29:18<16:15, 2647.56it/s]
90%|███████ | 15072301/16753724 [35:56<18:29, 1515.32it/s]

Note the rapidly decreasing iteration rate. It's as if, over any given interval, it takes about the same amount of time to get halfway through the remaining docs.
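
To pin down where the slowdown sets in, the ID generator can be wrapped with some rough timing instrumentation; just a sketch (timed_get_ids and log_every are arbitrary names/values I picked for illustration):

import time

def timed_get_ids(index, log_every=100_000):
    # Print the per-batch and cumulative throughput every log_every IDs.
    start = time.monotonic()
    last = start
    for n, doc_id in enumerate(get_ids(index), 1):
        yield doc_id
        if n % log_every == 0:
            now = time.monotonic()
            print(f"{n:>10} ids  "
                  f"batch: {log_every / (now - last):8.0f} ids/s  "
                  f"overall: {n / (now - start):8.0f} ids/s")
            last = now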

I tried this again on a three-node cluster, using the AWS Elasticsearch Service (each node an r5.large).

The results were similar, with a slower overall spool time (although that might be down to 3 x r5.large versus a single r5.4xlarge). There is still an increasing lag as the process gets closer to the end.

No significant difference on a 2-node AWS ES cluster (2 x r4.2xlarge).

Bumping the scan 'size' parameter up to 5,000 from its default (500) seems to improve the overall throughput and reduce the slowdown, but the slowdown is still present.
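
For reference, a minimal sketch of passing the larger page size through the scan helper (size is the helper's per-shard batch-size parameter; 5,000 is just the value I tried):

def get_ids_large_pages(index, page_size=5000):
    # Larger scroll pages mean fewer round trips, at the cost of more memory per page.
    body = {'query': {'match_all': {}}}
    for doc in scan(ES_CLIENT, query=body, index=index, _source=False, size=page_size):
        yield doc['_id']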
