Unexpected scoring with nested objects

I have a system containing documents, each of which has a set of tags. The tags are stored as nested objects and I am querying for documents based on the tags. I would like documents that match more tags to be scored higher. This is mostly working fine, but I have seen some cases where the scoring is not as expected.

In this gist (nestedTestScore.sh) I create an index and insert four documents, each with a single tag: "aaaa", "bbbb", "cccc" and "dddd". I then query for documents with tags "aaaa" or "bbbb". The first two documents are returned, as expected, but with quite different scores.

If I change the script to insert only the first two documents, they are returned with the same score. I have tried this against Elasticsearch 2.1.1 and 2.3.3 with the same results.

Can anyone explain what is going on here?

Thanks

Chris

I don't really understand what is happening, but I have worked around the problem. What I really care about is the number of matching tags, so TF/IDF is irrelevant. Following the details in "Ignoring TF/IDF", I wrapped each of the term queries in a constant_score query. With that change the scoring works as expected.
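
For anyone who lands here later, below is a minimal sketch of what that workaround can look like. It is not the script from the gist. The nested path `tags`, the field `tags.name`, the helper names, and the use of `score_mode: sum` are all assumptions made for illustration. The sketch only builds and prints the query body, so it needs no running cluster.

```python
import json


def tag_clause(field, tag, ignore_tfidf=True):
    """Build one clause for a single tag (hypothetical helper)."""
    term = {"term": {field: tag}}
    if ignore_tfidf:
        # constant_score gives every match the same score (the boost),
        # so term frequency and document frequency no longer matter.
        return {"constant_score": {"filter": term, "boost": 1.0}}
    return term


def build_tag_query(tags, path="tags", field="tags.name"):
    """Build a nested query whose score grows with the number of matching tags.

    Assumption: each tag is its own nested object under `path`, and
    `field` holds the tag value. Adjust both to your mapping.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                # Assumption: sum the scores of the matching nested objects,
                # so two matching tags score higher than one.
                "score_mode": "sum",
                "query": {
                    "bool": {
                        "should": [tag_clause(field, t) for t in tags]
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_tag_query(["aaaa", "bbbb"]), indent=2))
```

The printed JSON can be sent as the body of a `_search` request. Passing `ignore_tfidf=False` to `tag_clause` gives the plain term clause, which is handy for comparing scores before and after the change.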