Query-time boosting produces results in an unexpected order

The ES search result for the search keywords one two three seems to be wrong after applying a per-keyword boost. Please help me modify my "faulty" query so that it produces the "expected result" described below. I'm on ES 1.7.4 with Lucene 4.10.4.

The original question is here on Stack Overflow.

Boosting criteria

word - boost
----   -----
one    1
two    2
three  3

ES index content - just showing MySQL dump to make the post shorter

mysql> SELECT id, title FROM post;
+----+-------------------+
| id | title             |
+----+-------------------+
|  1 | one               |
|  2 | two               |
|  3 | three             |
|  4 | one two           |
|  5 | one three         |
|  6 | one two three     |
|  7 | two three         |
|  8 | none              |
|  9 | one abc           |
| 10 | two abc           |
| 11 | three abc         |
| 12 | one two abc       |
| 13 | one two three abc |
| 14 | two three abc     |
+----+-------------------+
14 rows in set (0.00 sec)

Expected ES query result - The user is searching for one two three. I'm not fussed about the order of equally scored records: if records 6 and 13 switch places, I don't mind.

+----+-------------------+
| id | title             | my scores for demonstration purposes
+----+-------------------+
|  6 | one two three     | (1+2+3 = 6)
| 13 | one two three abc | (1+2+3 = 6)
|  7 | two three         | (2+3 = 5)
| 14 | two three abc     | (2+3 = 5)
|  5 | one three         | (1+3 = 4)
|  4 | one two           | (1+2 = 3)
| 12 | one two abc       | (1+2 = 3)
|  3 | three             | (3 = 3)
| 11 | three abc         | (3 = 3)
|  2 | two               | (2 = 2)
| 10 | two abc           | (2 = 2)
|  1 | one               | (1 = 1)
|  9 | one abc           | (1 = 1)
|  8 | none              | <- This shouldn't appear
+----+-------------------+
14 rows in set (0.00 sec)
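The scoring rule behind the expected table can be written down as a small sketch. This only restates the rule from the post (add up the points of each search word that occurs in the title); it says nothing about how Elasticsearch computes `_score`. The names `BOOSTS`, `POSTS`, `expected_score` and `expected_ranking` are made up for this illustration, and ties are broken by id, which the post says is irrelevant.

```python
# Per-word points from the "Boosting criteria" table.
BOOSTS = {"one": 1, "two": 2, "three": 3}

# (id, title) rows copied from the MySQL dump.
POSTS = [
    (1, "one"), (2, "two"), (3, "three"), (4, "one two"),
    (5, "one three"), (6, "one two three"), (7, "two three"),
    (8, "none"), (9, "one abc"), (10, "two abc"), (11, "three abc"),
    (12, "one two abc"), (13, "one two three abc"), (14, "two three abc"),
]


def expected_score(title, boosts=BOOSTS):
    """Sum the points of every search word that appears as a whole word in the title."""
    words = set(title.split())
    return sum(points for word, points in boosts.items() if word in words)


def expected_ranking(posts=POSTS):
    """Return (score, id, title) rows, best first, without non-matching posts."""
    scored = [(expected_score(title), post_id, title) for post_id, title in posts]
    hits = [row for row in scored if row[0] > 0]
    # Highest score first; equal scores are ordered by id (arbitrary choice).
    return sorted(hits, key=lambda row: (-row[0], row[1]))


if __name__ == "__main__":
    for score, post_id, title in expected_ranking():
        print(f"{post_id:>2} | {title:<17} | {score}")
```

Record 8 ("none") drops out because whole words are compared, so "none" does not count as a match for "one".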

Unexpected ES query result - Unfortunately, this is what I get.

+----+-------------------+
| id | title             | _score
+----+-------------------+
|  6 | one two three     | 1.0013864
| 13 | one two three abc | 1.0013864
|  4 | one two           | 0.57794875
|  3 | three             | 0.5310148
|  7 | two three         | 0.50929534
|  5 | one three         | 0.503356
| 14 | two three abc     | 0.4074363
| 11 | three abc         | 0.36586377
| 12 | one two abc       | 0.30806428
| 10 | two abc           | 0.23231897
|  2 | two               | 0.12812772
|  1 | one               | 0.084527075
|  9 | one abc           | 0.07408653
+----+-------------------+

ES query

curl -XPOST "http://127.0.0.1:9200/_search?post_dev" -d'
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "title": {
            "query": "one two three"
          }
        }
      },
      "should": [
        {
          "match": {
            "title": {
              "query": "one",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "two",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "title": {
              "query": "three",
              "boost": 3
            }
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ],
  "from": "0",
  "size": "100"
}'

Hi @BentCoder,

You give too little information to reproduce this, but scoring takes more than your boost values into account. From the reference documentation:

The default relevance calculation takes field length into account, so a short [...] field will have a higher natural boost than a long [...] field.
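To get a feel for the size of this effect, here is a rough sketch of the field-length norm used by Lucene's classic TF/IDF similarity, which is 1 / sqrt(number of terms in the field). It assumes the default similarity is in use and leaves out every other factor (term frequency, inverse document frequency, query normalization, coordination). The helper name `length_norm` is made up for this illustration.

```python
import math


def length_norm(num_terms):
    """Classic TF/IDF field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


# Titles of increasing length taken from the example index.
TITLES = ["three", "two three", "one two three", "one two three abc"]

for title in TITLES:
    n = len(title.split())
    print(f"{title!r}: {n} term(s), norm ~ {length_norm(n):.3f}")
```

Lucene stores this norm in a lossy single-byte encoding, so fields of similar length can end up with the same stored value. That may be why records 6 and 13 have identical scores, but only the explain output can confirm it.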

You can use explain to get more details on how Elasticsearch computed the final score.

Suppose your original query is the following (I'm not using your query but a much shorter one, so the difference is easier to see):

GET /myindex/_search
{
    "query": {
        "match_all": {}
    }
}

then you can get an explanation of the score calculation by adding "explain": true in the body:

GET /myindex/_search
{
    "explain": true, 
    "query": {
        "match_all": {}
    }
}
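With "explain": true, each hit in the response carries an `_explanation` object, a tree of nodes with `value`, `description` and `details`. That tree gets hard to read for a bool query with several clauses, so a small helper to print it as an indented outline can help. The sketch below assumes that node shape; `format_explanation` is a made-up helper, and `sample` is a hand-written stand-in with invented numbers, not real output from this index.

```python
def format_explanation(node, depth=0):
    """Flatten an explanation tree into indented 'value  description' lines."""
    lines = ["{}{:.6g}  {}".format("  " * depth, node["value"], node["description"])]
    # "details" may be missing or empty on leaf nodes.
    for child in node.get("details") or []:
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Hypothetical sample shaped like one hit's "_explanation"; the values are invented.
sample = {
    "value": 1.0,
    "description": "sum of:",
    "details": [
        {"value": 0.25, "description": "weight(title:one)", "details": []},
        {"value": 0.75, "description": "weight(title:three)"},
    ],
}

if __name__ == "__main__":
    print("\n".join(format_explanation(sample)))
```

In practice you would pass `hit["_explanation"]` for each entry of `hits.hits` in the parsed JSON response.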

Daniel

Hi @danielmitterdorfer ,

Thank you for the answer. Later on I'll attach explain details to provide more data (I don't have the test case with me at the moment).

Based on the length calculation I understand the order a bit better now, but what I don't understand is why 7, 5 and 14 come after 4 and 3, as shown below.

+----+-------------------+
| id | title             | _score      - my points
+----+-------------------+
|  6 | one two three     | 1.0013864   - 6
| 13 | one two three abc | 1.0013864   - 6
|  4 | one two           | 0.57794875  - 3
|  3 | three             | 0.5310148   - 3
|  7 | two three         | 0.50929534  - 5
|  5 | one three         | 0.503356    - 4
| 14 | two three abc     | 0.4074363   - 5
| 11 | three abc         | 0.36586377  - 3
| 12 | one two abc       | 0.30806428  - 3
| 10 | two abc           | 0.23231897  - 2
|  2 | two               | 0.12812772  - 2
|  1 | one               | 0.084527075 - 1
|  9 | one abc           | 0.07408653  - 1
+----+-------------------+
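One direction that might make the ranking follow only the per-word points is to take TF/IDF and field length out of the picture by wrapping each word in a `constant_score` clause. The sketch below only builds such a request body. It is an untested assumption, not the solution from the Stack Overflow thread, and it has not been run against ES 1.7.4. It relies on `constant_score` (with a `query` and a `boost`) and the bool option `disable_coord` being available in the 1.x query DSL. The function name `build_boost_only_query` is made up.

```python
import json


def build_boost_only_query(field, boosts):
    """Build a bool query in which each word contributes only its fixed boost."""
    should = [
        # constant_score ignores TF/IDF and field length for the wrapped match.
        {"constant_score": {"query": {"match": {field: word}}, "boost": boost}}
        for word, boost in boosts.items()
    ]
    return {
        "query": {
            "bool": {
                "should": should,
                # At least one word has to match, so "none" is not returned.
                "minimum_should_match": 1,
                # Assumed 1.x option: do not scale by the share of matching clauses.
                "disable_coord": True,
            }
        },
        "from": 0,
        "size": 100,
    }


body = build_boost_only_query("title", {"one": 1, "two": 2, "three": 3})

if __name__ == "__main__":
    print(json.dumps(body, indent=2))
```

Even if this works as assumed, query normalization would still scale all scores by a common factor, so the absolute values would not be 6, 5, 4 and so on; only the relative order should follow the points.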

Some more test queries:

  • This query doesn't produce any results.
  • This query doesn't order correctly, as seen here.

Thank you very much

It has been solved now. The solution is in the Stack Overflow link above.