How can I save the aggregated results to another index?

fookfook · November 12, 2018, 7:30am

I can already save the results of a query to another index, like this:

POST _reindex
{
  "source": {
    "index": "twitter",
    "query":{
        "term":{"author.keyword":"Alex"} 
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

The DSL above is equivalent to this SQL:

insert into new_twitter select * from twitter where author='Alex'

But how do I save aggregated results to another index?
Something like this SQL statement:

insert into new_twitter select author,count(1) cnt from twitter group by author

dadoonet · November 12, 2018, 8:45am

I don't think you can with the existing API.
You would have to do that "manually" yourself.

Note that the new Rollup API does similar things, but I think it is currently limited to time-based data.

fookfook · November 13, 2018, 12:37am

Well, I read the Rollup documentation closely and I don't think it solves my problem, because it essentially compacts data over time. So I tried using scroll to process my data manually. Unfortunately that doesn't work: scroll can page through the hits of a query, but not through the results of an aggregation. Now I don't know what to do.

dadoonet · November 13, 2018, 1:11am

Why would you need to scroll the agg?

Can't you just take the response and extract from it the agg part and store it as a document?
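For later readers, here is a minimal sketch of that idea. It assumes the response has already been parsed into plain maps, and that each terms-aggregation bucket carries the usual "key" and "doc_count" entries. The class name, method name, and the "author"/"cnt" field names are only illustrative (they mirror the SQL in the first post), not part of any Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BucketsToDocs {

    /**
     * Turns terms-aggregation buckets ({"key": ..., "doc_count": ...})
     * into flat documents that can be indexed into another index.
     */
    static List<Map<String, Object>> toDocuments(List<Map<String, Object>> buckets) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (Map<String, Object> bucket : buckets) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("author", bucket.get("key"));     // the grouped value
            doc.put("cnt", bucket.get("doc_count"));  // count(1) in the SQL version
            docs.add(doc);
        }
        return docs;
    }
}
```

Each resulting map would then be sent to the destination index with whatever index or bulk call your client offers. Using the bucket key as the document id is one way to make re-runs overwrite rather than duplicate.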

fookfook · November 13, 2018, 1:21am

Because the aggregation returns a very large number of buckets. To get all of them I would have to set a very large size on the aggregation, which can consume a lot of memory. With a small size I only get part of the results. For example, the following code returns at most 100 buckets:

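// Terms aggregation on the city name; size(100) caps the response at the top 100 buckets.
TermsAggregationBuilder cityAggs = AggregationBuilders.terms("cityAggs").field("cityName.keyword").size(100);
// Scrolled search: setSize(300) is the number of hits per scroll page, not the number of buckets.
SearchResponse scrollResp = esClient.prepareSearch("community")
    .setScroll(new TimeValue(60000))  // keep the scroll context alive for 60 seconds
    .setQuery(query)
    .addAggregation(cityAggs)
    .setSize(300)
    .get();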

AggregationBuilders.terms("cityAggs").field("cityName.keyword").size(100);

Here, size(100) limits the response to 100 buckets. In reality I have far more aggregation results, perhaps a million. If I set size(1000000), I suspect I would run out of memory.

I'm really sorry, maybe I don't understand ES well enough, and I don't know whether I have expressed my question clearly.

fookfook · November 13, 2018, 6:28am

If the request specifies aggregations, only the initial search response will contain the aggregations results.

This is what I found in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/search-request-scroll.html
According to that page, scroll only pages through the hits of the query, not the aggregation results.

dadoonet · November 13, 2018, 11:02am

Maybe this can help: Terms aggregation | Elasticsearch Guide [8.11] | Elastic

If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the Composite aggregation which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination.

But:

I'm really sorry, maybe I don't understand ES well enough, and I don't know whether I can clearly express my question

Maybe you should explain why you want to "export" this to another index?

fookfook · November 13, 2018, 2:08pm

Thank you very much. The Composite aggregation solves my problem: I can use the "after" parameter to page through the aggregation results, so I can manually save all of the aggregated results to another index.

This aggregation provides a way to stream all buckets of a specific aggregation similarly to what scroll does for documents.
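For anyone who finds this later, a rough sketch of that paging loop follows. It builds the search body as plain maps and takes the actual search call as a function, so the `search` parameter, the class name, and the aggregation name "by_author" are placeholders of mine rather than client API. It also assumes the response exposes "after_key" next to "buckets", as current versions of the composite aggregation do; on older versions you may need to take the key of the last bucket instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class CompositePaging {

    /** Builds a search body with one composite aggregation named "by_author" (illustrative name). */
    static Map<String, Object> buildRequest(int pageSize, Map<String, Object> afterKey) {
        Map<String, Object> terms = Map.of("terms", Map.of("field", "author.keyword"));
        Map<String, Object> composite = new LinkedHashMap<>();
        composite.put("size", pageSize);
        composite.put("sources", List.of(Map.of("author", terms)));
        if (afterKey != null) {
            composite.put("after", afterKey); // resume after the last bucket of the previous page
        }
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("size", 0); // only buckets are needed, no hits
        body.put("aggs", Map.of("by_author", Map.of("composite", composite)));
        return body;
    }

    /**
     * Requests pages until a page comes back empty or without an after_key.
     * `search` stands in for the real client call and returns the parsed response.
     */
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> fetchAllBuckets(
            Function<Map<String, Object>, Map<String, Object>> search, int pageSize) {
        List<Map<String, Object>> all = new ArrayList<>();
        Map<String, Object> afterKey = null;
        while (true) {
            Map<String, Object> response = search.apply(buildRequest(pageSize, afterKey));
            Map<String, Object> aggs = (Map<String, Object>) response.get("aggregations");
            Map<String, Object> agg = (Map<String, Object>) aggs.get("by_author");
            List<Map<String, Object>> buckets = (List<Map<String, Object>>) agg.get("buckets");
            if (buckets.isEmpty()) {
                break; // nothing left to page through
            }
            all.addAll(buckets);
            afterKey = (Map<String, Object>) agg.get("after_key");
            if (afterKey == null) {
                break; // server did not offer a continuation key
            }
        }
        return all;
    }
}
```

In a real job you would probably index each page as it arrives instead of collecting everything in one list, so that memory use stays bounded by the page size.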

I'm doing this because of our needs.

We collected building data from multiple websites, and the same building may appear on several of them. Now I want to aggregate the buildings, as in this SQL:

select buildingName from collected_buildings group by buildingName

Then I need to save these aggregated buildings, as in this SQL:

insert into buildings select buildingName from collected_buildings group by buildingName

Then I can use the index "buildings" to do a "non-repetitive" search of a building.

dadoonet · November 13, 2018, 4:11pm

I can't comment more, but often the way to solve a problem in Elasticsearch is to solve it at index time rather than at search time.

