How can I save the aggregated results to another index?


(Mr.xian) #1

I can already save the results of a query to another index, like this:

POST _reindex
{
  "source": {
    "index": "twitter",
    "query":{
        "term":{"author.keyword":"Alex"} 
    }
  },
  "dest": {
    "index": "new_twitter"
  }
}

The DSL above is equivalent to this SQL:

insert into new_twitter select * from twitter where author='Alex'

But how do I save the aggregated results to another index?
Something like this SQL statement:

insert into new_twitter select author,count(1) cnt from twitter group by author

(David Pilato) #2

I don't think you can with the existing API.
You would have to do that "manually" yourself.

Note that the new Rollup API does similar things, but I think it's currently limited to time-based data.


(Mr.xian) #4

Well, I read the rollup documentation carefully and I don't think it solves my problem, because rollup is really about compressing data over time. So I tried to use scroll to process my data manually, which unfortunately doesn't work: scroll can only page through the hits of a query, not the results of an aggregation. Now I don't know what to do :frowning:


(David Pilato) #5

Why would you need to scroll the agg?

Can't you just take the response, extract the agg part from it, and store it as a document?
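
A minimal sketch of that idea, assuming the author/count pairs have already been read out of the terms aggregation response: each bucket becomes one document in a `_bulk` request body. The names `BucketBulkBody` and `toBulkBody` are hypothetical, and the fields `author` and `cnt` are borrowed from the SQL above. The target index (and, on 6.x, the type) is assumed to come from the `_bulk` URL, for example `POST new_twitter/_doc/_bulk`.

```java
import java.util.Map;

/** Turns (bucket key, doc count) pairs into a newline-delimited _bulk body. */
final class BucketBulkBody {

    /** Wraps a value in double quotes and escapes what JSON requires inside a string. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Emits one action line plus one source line per bucket, each ending in a newline. */
    static String toBulkBody(Map<String, Long> counts) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            // The bucket key doubles as the document _id (an assumption of this sketch).
            body.append("{\"index\":{\"_id\":").append(quote(e.getKey())).append("}}\n");
            body.append("{\"author\":").append(quote(e.getKey()))
                .append(",\"cnt\":").append(e.getValue()).append("}\n");
        }
        return body.toString();
    }
}
```

Using the bucket key as `_id` means that running the export again overwrites the existing documents instead of adding duplicates.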


(Mr.xian) #6

Because the aggregation returns a very large number of buckets. I can't get them all unless I set a very large size on the agg, and that can consume a lot of memory. If I set a small size, I can't get all the agg results. For example, with the following code only 100 buckets are returned:

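import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsAggregationBuilder;

// Terms aggregation on the city name; size(100) caps the response at the top 100 buckets.
TermsAggregationBuilder cityAggs = AggregationBuilders.terms("cityAggs").field("cityName.keyword").size(100);
// esClient and query are defined elsewhere; setSize(300) limits the hits per scroll page, not the buckets.
SearchResponse scrollResp = esClient.prepareSearch("community")
	.setScroll(new TimeValue(60000))
	.setQuery(query)
	.addAggregation(cityAggs)
	.setSize(300)
	.get();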

In this case the aggregation's size is 100, so at most 100 buckets are returned. In fact there are far more buckets than that, something like a million, and if I set the size to 1,000,000 I suspect I won't have enough memory.

I'm really sorry, maybe I don't understand ES well enough, and I don't know whether I can clearly express my question :frowning:


(Mr.xian) #7
If the request specifies aggregations, only the initial search response will contain the aggregations results.

This is what I found in the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/6.4/search-request-scroll.html
As described there, scroll can only page through the hits of the query, not the results of the agg.


(David Pilato) #8

Maybe this can help: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#search-aggregations-bucket-terms-aggregation-size

If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the Composite aggregation which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination.

But:

I'm really sorry, maybe I don't understand ES well enough, and I don't know whether I can clearly express my question

Maybe you should explain why you want to "export" this to another index?


(Mr.xian) #9

Thank you very much. The Composite aggregation solves my problem: I can use the "after" parameter to page through the results of the aggregation and manually save all the aggregated results to another index.

This aggregation provides a way to stream all buckets of a specific aggregation similarly to what scroll does for documents.
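
The paging loop can be sketched roughly as follows. This is only an illustration and not the Elasticsearch Java client API: `search` is a hypothetical function that sends the JSON body to something like `POST collected_buildings/_search` and returns the parsed response as nested maps, and `by_name` / `name` are made-up aggregation and source names. It assumes the response carries an `after_key` for each page; if it does not, the key of the last bucket can be passed as "after" instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

/** Pages through a composite aggregation by feeding each page's after_key back as "after". */
final class CompositePager {
    static final String AGG = "by_name";  // hypothetical aggregation name
    static final String SOURCE = "name";  // hypothetical source name inside each bucket key

    private static Map<String, Object> entry(String key, Object value) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put(key, value);
        return m;
    }

    /** Builds the search body for one page; "after" is left out on the first request. */
    static Map<String, Object> buildRequest(String field, int pageSize, Map<String, Object> after) {
        List<Object> sources = new ArrayList<>();
        sources.add(entry(SOURCE, entry("terms", entry("field", field))));
        Map<String, Object> composite = new LinkedHashMap<>();
        composite.put("size", pageSize);
        composite.put("sources", sources);
        if (after != null) {
            composite.put("after", after);
        }
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("size", 0); // only buckets are needed, no hits
        body.put("aggs", entry(AGG, entry("composite", composite)));
        return body;
    }

    /** Hands every page of buckets to onPage and returns the total number of buckets seen. */
    @SuppressWarnings("unchecked")
    static long forEachPage(Function<Map<String, Object>, Map<String, Object>> search,
                            String field, int pageSize,
                            Consumer<List<Map<String, Object>>> onPage) {
        long total = 0;
        Map<String, Object> after = null;
        while (true) {
            Map<String, Object> response = search.apply(buildRequest(field, pageSize, after));
            Map<String, Object> aggs = (Map<String, Object>) response.get("aggregations");
            Map<String, Object> agg = (Map<String, Object>) aggs.get(AGG);
            List<Map<String, Object>> buckets = (List<Map<String, Object>>) agg.get("buckets");
            if (buckets == null || buckets.isEmpty()) {
                return total; // an empty page means every bucket has been consumed
            }
            onPage.accept(buckets); // e.g. index this page into the target index
            total += buckets.size();
            after = (Map<String, Object>) agg.get("after_key");
            if (after == null) {
                return total;
            }
        }
    }
}
```

Because each page is handed to `onPage` and then dropped, memory use is bounded by the page size rather than by the total number of buckets.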

I'm doing this because of our requirements.

We collected building data from multiple websites, and the same building may appear on more than one of them. I want to aggregate the buildings, as in this SQL:

select buildingName from collected_buildings group by buildingName

Then I need to save these aggregated buildings, as in this SQL:

insert into buildings select buildingName from collected_buildings group by buildingName

Then I can use the "buildings" index to search for a building without getting duplicates.


(David Pilato) #10

I can't comment further, but often the way to solve problems in Elasticsearch is to solve them at index time rather than at search time.

