Hi,
I have a case where I need to fetch a (very) large number of documents.
Example:
I have a list of 15,000 entity IDs that I need to export data for.
My docs have an entity_id field.
What I do so far is partition this input list of IDs, and then for every partition use a terms plus time-range query to fetch the data with scroll. My partition size is 100.
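A sketch of roughly what I'm doing (Python with the elasticsearch-py scan helper, which drives the scroll API underneath; the endpoint, index name, and the xxx/yyy time bounds are placeholders, not my real setup):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")  # placeholder endpoint

entity_ids = list(range(1, 15001))  # stand-in for my real list of 15,000 IDs
PARTITION_SIZE = 100

def partitions(ids, size):
    """Yield consecutive slices of `ids`, each with at most `size` elements."""
    for i in range(0, len(ids), size):
        yield ids[i:i + size]

for batch in partitions(entity_ids, PARTITION_SIZE):
    query = {
        "query": {
            "bool": {
                "filter": [
                    {"terms": {"entity_id": batch}},
                    {"range": {"timestamp": {"gte": "xxx", "lte": "yyy"}}},
                ]
            }
        }
    }
    # scan() pages through every match for this partition via scroll
    for hit in scan(es, index="data", query=query):
        print(hit["_source"])  # placeholder for the actual export step
```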
What you describe sounds like a reasonable way to do this.
Does this mean you're only retrieving 100 documents in each batch? If so, I would expect a larger batch size to be faster. You'll need to experiment to find the best value for your system.
Also, if you're only retrieving 100 documents each time, then you don't need to scroll.
No,
It means that in one batch I am fetching all the data for 100 entities (out of 15,000), which can be hundreds of thousands of documents.
So I do need scroll.
In the SQL world this would be, for each partition: select * from data where entity_id in (1, 2, 3, ..., 100) and timestamp between xxx and yyy
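For reference, the per-partition query body is roughly the direct translation of that SQL: the terms clause plays the role of "in (...)" and the range clause plays the role of "between" (a sketch; xxx/yyy are the same placeholders as above):

```python
# Per-partition query body: terms ~ SQL "in (...)", range ~ SQL "between"
query = {
    "query": {
        "bool": {
            "filter": [
                {"terms": {"entity_id": [1, 2, 3]}},  # ...up to 100 IDs
                {"range": {"timestamp": {"gte": "xxx", "lte": "yyy"}}},
            ]
        }
    }
}
```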