Get all 3 million IDs of a type very quickly

I want to get all IDs of a type from Elasticsearch via HTTP.
I have approximately 3 million IDs.

I am using the scroll API with:
scroll=1m
size=10000
and this request body:
{
  "query": {
    "match_all": {}
  },
  "_source": false,
  "sort": ["_doc"]
}
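For reference, the scroll requests above might be driven by a loop like the following minimal Python sketch, which uses only the standard library. It assumes a node reachable at a base URL such as `http://localhost:9200` and an index path you supply (on older versions with mapping types this could be `myindex/mytype`); the helper names `http_json` and `scroll_all_ids` are hypothetical. The loop always continues from the most recent `_scroll_id` and clears the scroll context when it finishes, so the cluster does not hold it until the keep-alive expires.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

# A sender takes (method, url, json_body) and returns the decoded JSON reply.
Sender = Callable[[str, str, Optional[dict]], dict]


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request with urllib and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def scroll_all_ids(base_url: str, index: str, size: int = 10000,
                   keep_alive: str = "1m", send: Sender = http_json) -> Iterator[str]:
    """Yield the _id of every document in `index` via the scroll API."""
    # Same body as in the question: no _source, cheapest (_doc) sort order.
    query = {"query": {"match_all": {}}, "_source": False, "sort": ["_doc"]}
    page = send("POST", f"{base_url}/{index}/_search?scroll={keep_alive}&size={size}", query)
    scroll_id = page.get("_scroll_id")
    try:
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                yield hit["_id"]
            # Continue from the latest scroll ID; it can change between pages.
            page = send("POST", f"{base_url}/_search/scroll",
                        {"scroll": keep_alive, "scroll_id": scroll_id})
            scroll_id = page.get("_scroll_id", scroll_id)
    finally:
        # Release the search context now instead of waiting for keep_alive to expire.
        if scroll_id:
            send("DELETE", f"{base_url}/_search/scroll", {"scroll_id": scroll_id})
```

Usage would be something like `es_ids = set(scroll_all_ids("http://localhost:9200", "myindex"))`. The `send` parameter exists only so the loop can be exercised without a live cluster; this sketch does not by itself address the slowdown described below.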

It's fast in the beginning but gets very slow later on (4 minutes or more per scroll request), even with a longer keep-alive of scroll=5m.

Neither of these changes it:
  • increasing the page size to size=100000
  • changing the result window
It is still fast at the start and gets slow later on.

What can I do?

Why do I want to get all IDs?
I have an RDBMS with a lastmodified column.
An import process deletes records in the RDBMS.
I periodically query the RDBMS and update Elasticsearch.
To handle deletes, I fetch all IDs from the RDBMS and all IDs from Elasticsearch,
then delete from Elasticsearch every document whose ID is not (or no longer) in the RDBMS.
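The comparison step described above amounts to a set difference followed by bulk deletes. A minimal sketch, assuming both ID lists fit in memory as Python strings and that the cluster accepts the typeless `{"delete": {"_index": ..., "_id": ...}}` bulk action (older versions with mapping types also need a `_type`); `stale_ids` and `bulk_delete_body` are hypothetical names.

```python
import json
from typing import Iterable, Set


def stale_ids(es_ids: Iterable[str], db_ids: Iterable[str]) -> Set[str]:
    """IDs present in Elasticsearch but no longer present in the RDBMS."""
    return set(es_ids) - set(db_ids)


def bulk_delete_body(index: str, ids: Iterable[str]) -> str:
    """Build an NDJSON payload for POST /_bulk with one delete action per ID."""
    lines = [json.dumps({"delete": {"_index": index, "_id": doc_id}})
             for doc_id in sorted(ids)]
    if not lines:
        return ""  # nothing to delete; skip the request entirely
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be sent to `/_bulk` with `Content-Type: application/x-ndjson`. With millions of stale IDs you would split them into batches rather than sending a single body.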

Why not keep track of deleted documents in the RDBMS, e.g. through a trigger, so that you can manage deletes based on this instead of having to compare all IDs, which will scale badly?
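The trigger approach from the reply could look roughly like this. The sketch uses SQLite through Python's `sqlite3` module purely as a stand-in for the unspecified RDBMS; the table names `records` and `deleted_records` and the helper functions are hypothetical, and trigger syntax differs between databases.

```python
import sqlite3
from typing import Iterable, List


def setup_schema(conn: sqlite3.Connection) -> None:
    """Create a records table plus a tombstone table filled by an AFTER DELETE trigger."""
    conn.executescript("""
        CREATE TABLE records (
            id TEXT PRIMARY KEY,
            lastmodified TEXT NOT NULL
        );
        -- One row per deleted record, written by the trigger below.
        CREATE TABLE deleted_records (
            id TEXT NOT NULL,
            deleted_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
        );
        CREATE TRIGGER records_after_delete AFTER DELETE ON records
        BEGIN
            INSERT INTO deleted_records (id) VALUES (OLD.id);
        END;
    """)


def pending_deletes(conn: sqlite3.Connection) -> List[str]:
    """IDs deleted in the RDBMS that still have to be removed from Elasticsearch."""
    rows = conn.execute("SELECT DISTINCT id FROM deleted_records ORDER BY id")
    return [row[0] for row in rows]


def acknowledge(conn: sqlite3.Connection, ids: Iterable[str]) -> None:
    """Drop tombstones once the matching Elasticsearch deletes have succeeded."""
    conn.executemany("DELETE FROM deleted_records WHERE id = ?", [(i,) for i in ids])
    conn.commit()
```

A periodic sync job would read `pending_deletes`, delete those IDs from Elasticsearch, and call `acknowledge` only after Elasticsearch confirms, so the work per run is proportional to the number of deletions rather than to all 3 million IDs.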
