More terms but lower score

Hello,
I've created 3 documents with a multi-valued field:
{ "query": {"match_all": {}}}
Result is here: https://gist.github.com/strokine/f336d598a0781070154f

Now I want to search by some of those terms, and I expect that the more times a term is repeated in a document, the higher the score will be, like here for example:
{
"query": {"term": {
"tags": {
"value": "ww"
}
}}
}

Result is here: https://gist.github.com/strokine/4e3aca9980076016166f
1st and 2nd documents have the same number of "ww" tags, so they have the same score.

But here:
{
"query": {"term": {
"tags": {
"value": "qq"
}
}}
}

Result: https://gist.github.com/strokine/7665fbaa53bff3b1dcbf

This shows the document that has 2 "qq" tags last, with the lowest score.

It looks like the total number of tags affected the result dramatically.

First of all, is my assumption right that the more identical tags a document has, the higher its score will be (I guess given an equal total number of tags between documents)?
And if so, how can I prevent the total number of tags from affecting the score that much?

Thanks,
Eugene

Unfortunately, it's not quite that simple. What you're describing is roughly what's known as "term frequency": documents containing more occurrences of your term should get a higher relevance score. But the default relevance scoring also includes other factors, such as inverse document frequency, which is basically how rare a term is: rarer terms get higher scores when they match. There are also fieldnorms, which bias scoring towards shorter fields, so a document with many tags is penalized relative to one with few. Finally, there's query normalization, which scales this query based on how relatively rare the terms are (i.e. based on IDF).
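To make the interaction between these factors concrete, here is a minimal Python sketch. It assumes the classic Lucene TF-IDF similarity (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1)), norm = 1 / sqrt(number of terms in the field)) and ignores the lossy one-byte encoding of norms and any per-shard differences in statistics. The tag counts are hypothetical and are not taken from the gists above; the function names are made up for illustration.

```python
import math

def tf(freq):
    # Term frequency factor: grows with the square root of the occurrence count.
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    # Inverse document frequency: rarer terms get a larger value.
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def field_norm(field_length):
    # Field-length norm: fields with more terms get a smaller value.
    return 1.0 / math.sqrt(field_length)

def classic_score(freq, field_length, doc_freq, num_docs):
    # For a single-term query, queryNorm is 1/idf and cancels one of the
    # two idf factors, leaving tf * idf * norm.
    return tf(freq) * idf(doc_freq, num_docs) * field_norm(field_length)

# Hypothetical index of 3 documents that all contain the tag "qq".
# One document has "qq" twice among 8 tags, another has it once among 2 tags.
many_tags = classic_score(freq=2, field_length=8, doc_freq=3, num_docs=3)
few_tags = classic_score(freq=1, field_length=2, doc_freq=3, num_docs=3)

print("2 x qq in 8 tags:", many_tags)
print("1 x qq in 2 tags:", few_tags)
```

Under these assumptions the document with two "qq" tags scores lower, because the field-length norm shrinks faster than the term-frequency factor grows. With an equal total number of tags in both documents, the one with more "qq" tags would score higher.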

You can read more about the math involved here. We also cover this in extensive detail in chapter 3 of our book, Relevant Search.