Average value changes when filtering or changing size

This sounds like the effect of merging results from independent shards into a single response. That merge is generally considered an approximation, and when the requested result size is small compared to the cardinality of the field, it can sometimes produce inaccurate results. See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_shard_size_3
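To see why the merge is only approximate, here is a toy simulation in plain Python. It is not Elasticsearch code: the function name `merged_top_terms` and the per-shard counts are made up for illustration. Each "shard" reports only its own top terms, and the coordinating step can only add up what it was given.

```python
from collections import Counter

def merged_top_terms(shards, size, shard_size):
    """Merge each shard's top `shard_size` term counts, then keep the top `size`."""
    merged = Counter()
    for shard in shards:
        # A shard only reports its own top `shard_size` terms;
        # anything below that cutoff is invisible to the merge.
        for term, count in Counter(shard).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)

# Hypothetical data: term -> document count on each of two shards.
shards = [
    {"a": 10, "b": 9, "c": 8},
    {"d": 7, "c": 6, "a": 1},
]

# True totals are a=11, b=9, c=14, d=7, so "c" is the real top term.
print(merged_top_terms(shards, size=1, shard_size=1))  # [('a', 10)]
print(merged_top_terms(shards, size=1, shard_size=3))  # [('c', 14)]
```

With a small per-shard cutoff the wrong term wins and its count is too low. As far as I understand it, a metric such as an average attached to each bucket is affected in the same way, because it is only computed from the shards that actually reported that term.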

I suggest setting the shard_size parameter in your terms aggregation so that each shard returns enough terms to give a result that is accurate enough for your use case. The right value depends on the cardinality of your field and on how the data is sharded, but it seems you have found a good number to start with.
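As a rough sketch, a request body with shard_size set could look like the following, built here as a Python dict. The field names `category` and `price`, the aggregation names `by_term` and `avg_value`, and the numbers are placeholders, not values from your setup; adjust them to your mapping.

```python
import json

def build_terms_request(field, metric_field, size, shard_size):
    """Build a search body with a terms aggregation and an avg sub-aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation results
        "aggs": {
            "by_term": {
                "terms": {
                    "field": field,
                    "size": size,              # buckets returned to the client
                    "shard_size": shard_size,  # buckets each shard sends to the merge
                },
                "aggs": {
                    "avg_value": {"avg": {"field": metric_field}},
                },
            }
        },
    }

# Placeholder field names and sizes, for illustration only.
body = build_terms_request("category", "price", size=10, shard_size=500)
print(json.dumps(body, indent=2))
```

A larger shard_size costs more memory and network traffic per request, so it is worth raising it step by step until the averages stop changing between runs.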