Well, I read the rollup documentation closely and concluded that it would not solve my problem, because rollup is really a compression over time. So I tried to see whether I could use scroll to process my data manually. Unfortunately that doesn't work, because scroll can only page through the hits of a query, not the results of an aggregation. Now I don't know what to do.
The problem is that the aggregation returns a very large number of results. I could set a very large size on the aggregation, but that can consume a lot of memory. If I set a small size, I can't get all the aggregation results. For example, the following is my code, with which only 100 (or some other limited number of) aggregation results can be returned
In this case, setSize=100 limits the response to only 100 aggregation results. In fact, I have a lot of aggregation results, something like a million. If I use setSize=1,000,000, I guess I might not have enough memory.
I'm really sorry, maybe I don't understand ES well enough, and I don't know whether I have expressed my question clearly.
If you want to retrieve all terms, or all combinations of terms in a nested terms aggregation, you should use the Composite aggregation, which lets you paginate over all possible terms, rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not support pagination.
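As a rough illustration of that suggestion, a composite aggregation request body might be built as below. This is only a sketch: the aggregation name `buildings`, the field `buildingName`, the sample value `Tower A`, and the helper `build_composite_request` are all assumed for the example and are not part of any client library.

```python
from typing import Any, Dict, Optional


def build_composite_request(
    field: str,
    page_size: int = 1000,
    after_key: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a search body with one composite aggregation over `field`.

    `page_size` bounds the number of buckets per response, so memory use
    stays small no matter how many distinct terms exist.
    """
    composite: Dict[str, Any] = {
        "size": page_size,
        # One "terms" value source; add more sources to page over combinations.
        "sources": [{field: {"terms": {"field": field}}}],
    }
    if after_key is not None:
        # Resume after the last bucket key seen on the previous page.
        composite["after"] = after_key
    # "size": 0 suppresses the document hits; only buckets are wanted.
    return {"size": 0, "aggs": {"buildings": {"composite": composite}}}


first_page = build_composite_request("buildingName", page_size=100)
next_page = build_composite_request(
    "buildingName", page_size=100, after_key={"buildingName": "Tower A"}
)
```

The point of the design is that the per-request size stays at 100 while the `after` key moves forward, instead of raising the size to cover every term at once.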
But you wrote: "I'm really sorry, maybe I don't understand ES well enough, and I don't know whether I can clearly express my question."
Maybe you could explain why you want to "export" this to another index?
Thank you very much, the Composite aggregation solves my problem. I can use the "after" parameter to page through the results of the aggregation, so that I can manually save all the aggregated results to another index.
This aggregation provides a way to stream all buckets of a specific aggregation similarly to what scroll does for documents.
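The paging described here could look roughly like the loop below. It is a sketch under assumptions: `search` is any callable that takes a request body and returns the response as a dict, `by_field` is an arbitrary aggregation name, and `fake_search` is a stand-in for a real cluster so the loop can be run as is.

```python
from typing import Any, Callable, Dict, Iterator

SearchFn = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_composite_buckets(
    search: SearchFn, field: str, page_size: int = 1000
) -> Iterator[Dict[str, Any]]:
    """Yield every bucket of a composite terms aggregation, one page at a time."""
    after_key = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{field: {"terms": {"field": field}}}],
        }
        if after_key is not None:
            composite["after"] = after_key
        response = search({"size": 0, "aggs": {"by_field": {"composite": composite}}})
        agg = response["aggregations"]["by_field"]
        yield from agg["buckets"]
        # The response carries "after_key" while there may be more buckets.
        after_key = agg.get("after_key")
        if not agg["buckets"] or after_key is None:
            break


def fake_search(body: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a cluster holding three distinct building names."""
    names = ["A", "B", "C"]
    composite = body["aggs"]["by_field"]["composite"]
    after = composite.get("after")
    start = names.index(after["buildingName"]) + 1 if after else 0
    page = names[start:start + composite["size"]]
    agg: Dict[str, Any] = {
        "buckets": [{"key": {"buildingName": n}, "doc_count": 1} for n in page]
    }
    if page:
        agg["after_key"] = {"buildingName": page[-1]}
    return {"aggregations": {"by_field": agg}}


all_names = [
    b["key"]["buildingName"]
    for b in iter_composite_buckets(fake_search, "buildingName", page_size=2)
]
```

With a real client, `search` would presumably wrap the client's search call against the source index. Only one page of buckets is held in memory at any time.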
I'm doing this because of our requirements.
We collect building data from multiple websites, and the same building may appear on several of them. Now I want to aggregate the buildings, just like this SQL:
select buildingName from collected_buildings group by buildingName.
Then I need to save these aggregated buildings, just like this SQL:
insert into buildings select buildingName from collected_buildings group by buildingName
Then I can use the index "buildings" to search for a building without getting duplicates.
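For the "insert into buildings" step, one possible approach is to turn each composite bucket into a bulk index action. The sketch below assumes the bucket shape returned by a composite aggregation with a single `buildingName` source. The helper `buckets_to_actions` and the `source_count` field are hypothetical names chosen for the example.

```python
from typing import Any, Dict, Iterable, Iterator


def buckets_to_actions(
    buckets: Iterable[Dict[str, Any]],
    field: str,
    target_index: str = "buildings",
) -> Iterator[Dict[str, Any]]:
    """Convert composite buckets into bulk index actions, one per distinct value."""
    for bucket in buckets:
        name = bucket["key"][field]
        yield {
            "_index": target_index,
            # Using the building name as the document id means that
            # re-running the export overwrites instead of duplicating.
            "_id": str(name),
            # "source_count" is a hypothetical field: how many collected
            # documents were grouped into this building.
            "_source": {field: name, "source_count": bucket["doc_count"]},
        }


sample_buckets = [
    {"key": {"buildingName": "Tower A"}, "doc_count": 3},
    {"key": {"buildingName": "Tower B"}, "doc_count": 1},
]
actions = list(buckets_to_actions(sample_buckets, "buildingName"))
```

Actions in this shape are what the `bulk` helper in the Python client's `elasticsearch.helpers` module accepts, so the generator could be fed to it page by page.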