Fetch 1 million records

otherview · February 24, 2020, 11:23am

Hi!

I'm trying to fetch more than 1 million records from Elasticsearch.
So far, every approach I've tried is still slow.

At the moment I'm using a sliced scroll search.

I only need a few fields from each document, not the whole document. Is there a way I can optimize this?

Thanks!

rugenl · February 25, 2020, 4:15am

I think this describes what you can try, but read the warnings about text fields.

This looks like a similar question

Let us know how it works out

Bernt_Rostad · February 25, 2020, 5:44am

You can use source filtering to include or exclude fields from the result.

For instance, say I have an index my_published_stories and want to loop over all documents but fetch only the publishing date (pubdate) and the processing time (processing_time). Then I'd do something like this:

GET my_published_stories/_search
{
  "_source": ["pubdate", "processing_time"]
}
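
The same request body can be built from a script. The following is a minimal Python sketch, not a definitive implementation: build_search_body is a hypothetical helper, the page size of 1000 is an arbitrary assumption, and sorting on _doc is added because it is generally the cheapest order for reading an entire index. The resulting dict can be passed to whichever client is in use.

```python
import json


def build_search_body(fields, page_size=1000):
    """Return a search body that fetches only `fields` from each document.

    Hypothetical helper for illustration; page_size is an assumed value.
    """
    return {
        "size": page_size,           # documents returned per page
        "sort": ["_doc"],            # index order, no scoring needed
        "query": {"match_all": {}},  # loop over every document
        "_source": list(fields),     # source filtering: only these fields
    }


body = build_search_body(["pubdate", "processing_time"])
print(json.dumps(body, indent=2))
```
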

Hope this answered your question.

otherview · February 25, 2020, 3:47pm

Hi @Bernt_Rostad

Yep, I'm already doing that. Thanks!

otherview · February 25, 2020, 3:49pm

Thanks for that @rugenl !

I'm going to give the doc values a go! I'll post the results when I have some!

otherview · March 2, 2020, 11:32am

Hi @rugenl!

I've started using a sliced scroll with doc_values, but I'm seeing no major improvement. Any other ideas?
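
For readers following along, here is a rough sketch of what one request body per slice might look like when reading doc values instead of _source. It is an illustration under assumptions, not the exact setup used here: build_slice_bodies is a hypothetical helper, the field names are borrowed from the earlier example, and the slice count and page size are arbitrary. Each body starts its own scroll, typically from a separate thread or process.

```python
import json


def build_slice_bodies(fields, num_slices, page_size=1000):
    """Return one search body per slice, reading `fields` from doc values.

    Hypothetical helper for illustration; values are assumptions.
    """
    if num_slices < 2:
        raise ValueError("a sliced scroll needs at least 2 slices")
    return [
        {
            "slice": {"id": slice_id, "max": num_slices},
            "size": page_size,
            "sort": ["_doc"],
            "query": {"match_all": {}},
            "_source": False,                 # skip loading the stored source
            "docvalue_fields": list(fields),  # text fields have no doc values
        }
        for slice_id in range(num_slices)
    ]


bodies = build_slice_bodies(["pubdate", "processing_time"], num_slices=4)
print(json.dumps(bodies[0], indent=2))
```

With this kind of request the values come back under each hit's "fields" key as arrays rather than under "_source".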

Thanks!

rugenl · March 2, 2020, 4:33pm

Ok, what language are you using?

I put some timing in my scripts to measure what time I was waiting on the scroll vs. when I was processing the data. The waiting time includes network time, but it let me know whether to improve my search or my process.
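
The idea can be sketched in a few lines of Python. This is a simplified illustration rather than the actual script: timed_consume and fake_scroll are hypothetical names, and fake_scroll only simulates latency with a sleep so the example runs without a cluster. In practice the iterator would be whatever yields pages from the scroll.

```python
import time


def timed_consume(batches, process):
    """Consume `batches`, timing the wait for each batch separately
    from the time spent processing it. Returns (wait_s, work_s)."""
    wait_s = 0.0
    work_s = 0.0
    iterator = iter(batches)
    while True:
        start = time.perf_counter()
        try:
            batch = next(iterator)  # blocks while the next page arrives
        except StopIteration:
            wait_s += time.perf_counter() - start
            break
        fetched = time.perf_counter()
        process(batch)              # local processing of the page
        done = time.perf_counter()
        wait_s += fetched - start
        work_s += done - fetched
    return wait_s, work_s


def fake_scroll(pages=3, delay=0.01):
    """Stand-in for a scroll: yields one small page after a short delay."""
    for page in range(pages):
        time.sleep(delay)
        yield [{"id": page}]


seen = []
wait_s, work_s = timed_consume(fake_scroll(), seen.extend)
print(f"waiting on scroll: {wait_s:.3f}s, processing: {work_s:.3f}s")
```

If most of the time is spent waiting, the search or the network is the bottleneck; if most is spent processing, the client code is.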

PowerShell was MUCH slower than the Python Elasticsearch DSL, by orders of magnitude.
