I found that when running aggregations, Elasticsearch sometimes creates millions of empty buckets, and the service then crashes with an out-of-memory error.
Obviously, this is a problem on the user's side. I'm now trying to find the underlying cause.
However, I suggest that Elastic should guard against this case: for example, when the number of empty buckets exceeds 1 million, stop the aggregation and throw an exception.
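To make the suggestion concrete, here is a minimal sketch in Python of such a guard. `BucketLimiter`, `TooManyBucketsError`, and `fill_empty_buckets` are hypothetical names for illustration only, not Elasticsearch APIs, and the default of 1 million simply mirrors the threshold proposed above. The idea is to work out how many buckets a key range implies and fail before allocating any of them.

```python
class TooManyBucketsError(Exception):
    """Hypothetical error raised when an aggregation would exceed the bucket limit."""


class BucketLimiter:
    """Hypothetical guard that counts buckets (empty or not) before they are created."""

    def __init__(self, max_buckets: int = 1_000_000) -> None:
        self.max_buckets = max_buckets
        self.count = 0

    def add(self, n: int = 1) -> None:
        # Record the requested buckets and fail fast once the limit is crossed,
        # instead of allocating until the node runs out of memory.
        self.count += n
        if self.count > self.max_buckets:
            raise TooManyBucketsError(
                f"aggregation requested {self.count} buckets; "
                f"limit is {self.max_buckets}"
            )


def fill_empty_buckets(min_key: int, max_key: int, interval: int,
                       limiter: BucketLimiter) -> list[int]:
    """Return one bucket key per interval from min_key to max_key,
    as a histogram that also emits empty buckets would (hypothetical helper)."""
    if interval <= 0 or max_key < min_key:
        raise ValueError("need interval > 0 and max_key >= min_key")
    # Number of buckets implied by the key range, computed before any allocation.
    n_buckets = (max_key - min_key) // interval + 1
    limiter.add(n_buckets)
    return list(range(min_key, max_key + 1, interval))
```

Because the count is checked up front, an unexpectedly wide key range is rejected immediately with a clear error, rather than after the heap is already exhausted.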
This problem has brought down my cluster more than five times in the past few days. The same aggregation worked well for the past several months, so I think some unusual data in the index is triggering it, but I haven't found that data yet.