Scroll in Elasticsearch Aggregation

I have an index with around 50 million data points, and each document has an ID.

I need each unique ID and its count. Since there are more than 10,000 of them, I used scrolling, but unexpectedly the scroll gives me the same scroll ID on every iteration:

> data = es.search(index="ttd-conversions-2019-05*", scroll='1m', body= bodys) 
> sid = data['_scroll_id']
> scroll_size = len(data['hits']['hits'])
> count = list()
> tdid = list()
> while(scroll_size > 0):
>     log_time = list()
>     tdid = list()
>     print('Scroll id', sid)
>     page = es.scroll(scroll_id= sid, scroll = '1m')
>     sid = page['_scroll_id']
> 
>     count = list()
>     tdid = list()
>     for i in data['aggregations']['2']['buckets']:
>         count.append(i['doc_count'])
>         tdid.append(i['key'])
> 
>     scroll_size = len(page['hits']['hits'])
>     with open(save_path + "/out.csv", "a", newline="") as f:
>         writer = csv.writer(f)
>         writer.writerows(zip(count, tdid))

Please help. This same code works fine for a search with scroll, but with an aggregation it repeats the same scroll ID.

Thanks for your help.

The scroll api is used to scroll through documents, not aggregations.

To look at options for large numbers of unique terms, try running this wizard to pick the right approach.

Hi, thanks for your response. I understand that scroll is for scrolling through documents, so I was expecting the scroll_id to change on the next iterations, but it doesn't, and I always get the first 10 results. I'm confused by this.

Aggregations summarize the entire result set, not the current page of documents.
If you want to page through aggregation results you need to see my previous advice.
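As a toy illustration of that point (plain Python, not Elasticsearch itself; the `search_page` helper and the sample documents are hypothetical), the hits change from page to page while the aggregation is computed over every matching document and so stays the same:

```python
from collections import Counter

# Hypothetical sample documents standing in for the index contents.
docs = [{"TDID": t} for t in ["a", "b", "a", "c", "a", "b"]]


def search_page(all_docs, start, size):
    """Return one page of hits plus a terms-style count over ALL documents."""
    hits = all_docs[start:start + size]
    # The aggregation ignores paging: it always sees the full result set.
    aggregation = Counter(d["TDID"] for d in all_docs)
    return hits, aggregation


first_hits, first_agg = search_page(docs, 0, 2)
second_hits, second_agg = search_page(docs, 2, 2)
```

Here `first_hits` and `second_hits` differ, but `first_agg` and `second_agg` are identical, which is why re-reading the aggregation on each scroll page keeps producing the same buckets.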

Thank you. I ran the wizard, which suggested the composite aggregation:

GET ttd-conversions-2019-05*/_search?scroll=1m
{
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "TDID": { "terms": { "field": "TDID" } } }
        ]
      }
    }
  }
}

It returns the TDID and count only for the first request. When I then request the next page using the scroll_id, it doesn't return the TDID with count, it just lists the same documents. I'm a bit confused. Sorry if I misunderstood something. Thanks for your support, it means a lot.

Don't use scroll.
Run a regular search with the composite agg, then run another search with the composite agg and the after parameter set to the after_key returned in the previous result. Repeat as necessary.
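A minimal Python sketch of that loop, assuming the `es.search(index=..., body=...)` call style used in the question. The helper name `iter_composite_buckets`, the `page_size` default, and the output path are illustrative assumptions, not something from the thread. The search function is passed in as an argument so the loop can be tried against a stub before pointing it at a real cluster.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


def iter_composite_buckets(
    search: Callable[..., Dict[str, Any]],
    index: str,
    field: str,
    page_size: int = 1000,  # assumed page size; tune for your cluster
) -> Iterator[Tuple[Any, int]]:
    """Yield (key, doc_count) for every unique value of `field`.

    `search` is expected to behave like Elasticsearch.search and is
    called as search(index=..., body=...). No scroll is involved.
    """
    after: Optional[Dict[str, Any]] = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{field: {"terms": {"field": field}}}],
        }
        if after is not None:
            # Resume right after the last bucket of the previous page.
            composite["after"] = after
        body = {
            "size": 0,  # only the buckets are needed, not the hits
            "aggs": {"my_buckets": {"composite": composite}},
        }
        agg = search(index=index, body=body)["aggregations"]["my_buckets"]
        for bucket in agg["buckets"]:
            yield bucket["key"][field], bucket["doc_count"]
        # after_key is missing once there are no more buckets.
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break


# Usage sketch (assumes a reachable cluster and the index from the question):
#
#   import csv
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   with open("out.csv", "w", newline="") as f:
#       rows = iter_composite_buckets(es.search, "ttd-conversions-2019-05*", "TDID")
#       csv.writer(f).writerows(rows)
```

Each request is an ordinary search, so there is no scroll context to keep alive; the only state carried between requests is the after_key from the previous response.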

Thanks for your time and effort. It would be helpful if you could share your insights on the following problem: Count in other index based on Current Index field
