Documents with child receiving higher score than documents without


(Jasper Quirynen) #1

Hi,

Something odd occurred and I'd like to understand why, since I can't find any documentation on it.

The mapping:

PUT vp-test
{
	"mappings": {
		"_default_": {
			"_all": {
				"enabled": false
			}
		},
		"fiche": {},
		"bijlage": {
			"_parent": {
				"type": "fiche"
			},
			"_routing": {
				"required": true
			}
		}
	}
}

Three parent documents, one of which has a child:

PUT vp-test/fiche/11 
{
  "text": "parent"
}

PUT vp-test/fiche/12
{
  "text": "parent"
}

PUT vp-test/fiche/13
{
  "text": "parent"
}

PUT vp-test/bijlage/13?parent=11 
{
  "text": "child"
}

The query:

GET vp-test/fiche/_search
{
  "query": {
    "match": {
      "text": "parent"
    }
  }
}

The response:

{
	"hits": [{
			"_index": "vp-test",
			"_type": "fiche",
			"_id": "11",
			"_score": 0.6931472,
			"_source": {
				"text": "parent"
			}
		},
		{
			"_index": "vp-test",
			"_type": "fiche",
			"_id": "12",
			"_score": 0.2876821,
			"_source": {
				"text": "parent"
			}
		},
		{
			"_index": "vp-test",
			"_type": "fiche",
			"_id": "13",
			"_score": 0.2876821,
			"_source": {
				"text": "parent"
			}
		}
	]
}

As you can see, the score of the document with ID 11 is much higher than the other two, even though the query does not score on its children.
Does anyone know why, and how to prevent it? Thanks!


(Colin Goodheart-Smithe) #2

If I had to guess, I would say it's because the three parent documents are on different shards, or at least document 11 is on a different shard from the other two. Scores are computed from per-shard statistics, and the child document sits on the same shard as document 11, so the statistics gathered there differ from those on the other shards. This affects inverse document frequency: on the shard with the child, the term parent is considered rarer than on the shard(s) where every document contains parent. Two things you can do to debug this further:

  1. Repeat the test but set the index to only have 1 shard
  2. Put "explain": true in your search request body to see an explanation of how the score was calculated for each hit.
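
To illustrate the arithmetic, here is a minimal Python sketch. It assumes the index uses the BM25 similarity with the Lucene-style IDF formula, that document 11 shares a shard with its child, and that documents 12 and 13 each sit alone on a shard. None of this is confirmed in the thread beyond the scores themselves, and the function name bm25_idf is made up for the example. Under those assumptions the term-frequency part of the score should come out as 1, because every text field holds a single term, so the score equals the IDF.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene-style BM25 inverse document frequency, computed per shard.

    doc_count: documents on the shard that have the field
    doc_freq:  documents on the shard whose field contains the term
    """
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Assumed shard layout: fiche 11 plus its child, so 2 documents have a
# "text" field but only 1 of them contains the term "parent".
with_child = bm25_idf(doc_count=2, doc_freq=1)

# Assumed shard layout: a single fiche, which contains "parent".
alone = bm25_idf(doc_count=1, doc_freq=1)

print(round(with_child, 7), round(alone, 7))
```

The two values line up with the scores in the response above (0.6931472 for document 11, 0.2876821 for the others), which is consistent with the per-shard statistics explanation.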

(Jasper Quirynen) #3

Hi Colin,

You're 100% correct. I had not considered that per-shard document counts would alter a document's score.
Adding "explain": true did indeed make the issue visible.

Thanks!

