Getting list of terms ranked by their document frequency?

I'm using Elasticsearch to index images using the bag-of-visual-words approach (the words are stored in a "visual_words" field of each image document). Unfortunately, the inverted index doesn't seem to be speeding up my searches much. In my test, I have indexed 25000 images, and a typical query returns >24000 results. I'm guessing this is because a few of my dictionary words are present in almost all documents. Is there anything else that would cause Elasticsearch to return >24000 results out of 25000? Also, is there a way to get a list of terms ordered by their document frequency, so I can check whether my hypothesis is right and whether this is the case for only a few words or for lots of them?

Thanks!
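
For reference, one way to get at this is a terms aggregation on the field, since its buckets come back ordered by document count (descending) by default. The sketch below only builds the request body and reads a response; it assumes "visual_words" can be aggregated on (a keyword field, or a text field with fielddata enabled), and the aggregation name `terms_by_doc_freq` and both helper functions are made up for illustration. Counts can be approximate on a multi-shard index.

```python
import json


def build_doc_freq_query(field="visual_words", size=50):
    """Build a search body asking for the `size` terms found in the most documents."""
    return {
        "size": 0,  # skip the hits; only the aggregation is needed
        "aggs": {
            # "terms_by_doc_freq" is an arbitrary, hypothetical aggregation name
            "terms_by_doc_freq": {
                "terms": {"field": field, "size": size}
            }
        },
    }


def terms_by_doc_freq(response):
    """Extract (term, doc_count) pairs from a parsed search response."""
    buckets = response["aggregations"]["terms_by_doc_freq"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


if __name__ == "__main__":
    # Print the JSON body; it would be POSTed to the index's _search endpoint.
    print(json.dumps(build_doc_freq_query(), indent=2))
```

If the top few buckets have a doc_count close to 25000, that would support the hypothesis that a handful of words appear in almost every image.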

So I did some math and the issue really is the size of my dictionary:

Here's my math (in Python): https://gist.github.com/0c8589a67fcaaa32e4ccdbde1bfd6d3d

It might be wrong, but it seems to match the results I'm getting experimentally.
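
For anyone who doesn't want to open the gist, here is a rough sketch of this kind of estimate. It is not necessarily the same calculation as the gist. It assumes each image contains a fixed number of distinct words drawn uniformly at random from the dictionary, that a query is also a set of distinct words, and that a document matches if it shares at least one word with the query (OR semantics). The function name and the example numbers are hypothetical. Needs Python 3.8+ for `math.comb`.

```python
from math import comb


def expected_match_fraction(dict_size, words_per_doc, words_per_query):
    """Estimate the fraction of documents sharing at least one word with a query.

    Assumes uniformly random, distinct words in both the document and the query.
    """
    # Probability that none of the query words fall among the document's words
    # (hypergeometric: choose all query words from the words the doc lacks).
    no_overlap = (
        comb(dict_size - words_per_doc, words_per_query)
        / comb(dict_size, words_per_query)
    )
    return 1.0 - no_overlap


if __name__ == "__main__":
    # Hypothetical sizes, just to show how the fraction changes with the dictionary.
    for dict_size in (1_000, 10_000, 100_000):
        fraction = expected_match_fraction(dict_size, 100, 100)
        print(dict_size, fraction)
```

Under these assumptions, the fraction of matching documents stays close to 1 unless the dictionary is much larger than the number of words per image times the number of words per query.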