Scroll time increment effect on Elastic Search

AMIT_KEWAL · September 27, 2018, 10:51am

I am working on a project that uses Elasticsearch and queries it to fetch member information. The index holds about 30 lakh (3 million) records.

Basically, I am running a campaign for 20 lakh (2 million) users, and the user data is stored in Elasticsearch 6.2. I query ES and fetch the records in batches (50 records at a time) using scroll. I also want to keep the search context alive for 1 day, so that if the campaign process fails for any reason I can resume from where it stopped instead of restarting the campaign from the beginning. I am also saving the scroll ID and will use it to resume the campaign.
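For reference, the scroll setup described above corresponds roughly to the requests sketched below. This is a minimal sketch, not the poster's actual code: the index name `members`, the `match_all` query, and the helper names are assumptions for illustration. The endpoints and the `scroll` / `scroll_id` parameters follow the Elasticsearch 6.x REST API, and the helpers only build request descriptions, so any HTTP client can send them.

```python
from typing import Any, Dict


def initial_scroll_request(index: str, size: int = 50,
                           keep_alive: str = "1d") -> Dict[str, Any]:
    """First request: opens a search context that lives for `keep_alive`."""
    return {
        "method": "POST",
        "path": f"/{index}/_search",
        "params": {"scroll": keep_alive},
        "body": {
            "size": size,                  # 50 records per batch
            "query": {"match_all": {}},    # assumed query, for illustration
            "sort": ["_doc"],              # index order, cheapest for scrolling
        },
    }


def next_scroll_request(scroll_id: str,
                        keep_alive: str = "1d") -> Dict[str, Any]:
    """Follow-up request: fetches the next batch and renews the keep-alive."""
    return {
        "method": "POST",
        "path": "/_search/scroll",
        "body": {"scroll": keep_alive, "scroll_id": scroll_id},
    }


def clear_scroll_request(scroll_id: str) -> Dict[str, Any]:
    """Releases the search context once the campaign has finished."""
    return {
        "method": "DELETE",
        "path": "/_search/scroll",
        "body": {"scroll_id": scroll_id},
    }
```

Each follow-up request resets the keep-alive timer, so the `1d` value here is how long the context survives after the most recent batch, not the total lifetime of the campaign.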

While testing, I found that CPU utilization increased by 50% and then stayed steady at 50% (ES config: 2 nodes with 4 shards running on AWS, instance type i3.xlarge.elasticsearch).

Is there any relation between CPU utilization and keeping the search context open for 1 day? BTW, campaigns take 6 hours to finish.

Bernt_Rostad · September 27, 2018, 11:13am

I've never used a Scroll with more than a 1 minute timeout, and there is probably a price to pay for keeping the search context open for much longer. The official Scroll documentation warns that:

an open search context prevents the old segments from being deleted while they are still in use. [...] Keeping older segments alive means that more file handles are needed.

But I'm not sure if this explains the increased CPU you're seeing.

An alternative to using Scroll is the lightweight Search After mechanism, which is very useful if you can order your records on a unique field, for instance a sequence number or a date timestamp. With this mechanism you're not keeping an expensive open state in Elasticsearch, because each search request carries the sort values of the last record already fetched and so knows where to start (after). Hence you can perform one search today and the next in a week, with no extra cost.
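A minimal sketch of how Search After could drive the batch loop, under some assumptions: `member_id` is a hypothetical unique, sortable field, the `match_all` query is a placeholder, and `search` is any callable you supply that sends a request body to the `_search` endpoint and returns the parsed JSON response. The `search_after` body parameter and the per-hit `sort` array are part of the Elasticsearch search API; everything else is illustrative.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# A caller-supplied function: takes a request body, returns the parsed response.
SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def build_page_request(size: int, sort_field: str,
                       search_after: Optional[List[Any]] = None) -> Dict[str, Any]:
    """Build one page request, sorted on a unique field."""
    body: Dict[str, Any] = {
        "size": size,
        "query": {"match_all": {}},        # assumed query, for illustration
        "sort": [{sort_field: "asc"}],
    }
    if search_after is not None:
        # Start right after the last record of the previous batch.
        body["search_after"] = search_after
    return body


def iter_batches(search: SearchFn, size: int = 50,
                 sort_field: str = "member_id",
                 resume_after: Optional[List[Any]] = None
                 ) -> Iterator[List[Dict[str, Any]]]:
    """Yield batches of hits until the result set is exhausted."""
    cursor = resume_after
    while True:
        response = search(build_page_request(size, sort_field, cursor))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield hits
        # The sort values of the last hit are the checkpoint for the next page.
        cursor = hits[-1]["sort"]
```

To resume a failed campaign, persist the `sort` values of the last processed hit after each batch and pass them back as `resume_after`. No server-side context has to stay open in between.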

