What does "docCount" mean in the Explain API?


(Sang Gon Lee) #1

This is a portion of the response that I got from an Explain API request:

"_explanation": {
	"value": 15.811229,
	"description": "sum of:",
	"details": [
		{
			"value": 1,
			"description": "ConstantScore(NormsFieldExistsQuery [field=body])",
			"details": []
		},
		{
			"value": 14.811229,
			"description": "weight(body:unli in 151924) [PerFieldSimilarity], result of:",
			"details": [
				{
					"value": 14.811229,
					"description": "score(doc=151924,freq=1.0 = termFreq=1.0\n), product of:",
					"details": [
						{
							"value": 12.410724,
							"description": "idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:",
							"details": [
								{
									"value": 1,
									"description": "docFreq",
									"details": []
								},
								{
									"value": 368128,
									"description": "docCount",
									"details": []
								}
							]
						},

The "docFreq" is explained in the documentation and that makes sense.
I don't understand what "docCount" means, though. At first I thought it might be the number of documents matching this query, but the number of hits returned is 16, nowhere near 370k. Also, according to the formula, docCount contributes positively to the score, and it doesn't make sense to me that the number of matches would boost the score of this single document.
We're using v6.3.0.
Thanks in advance.


(Mayya Sharipova) #2

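The docCount here is the total number of documents in the index (more precisely, the number of documents that have a value for the field being scored, counted per shard by default).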
A simplified version of the formula for idf(term) is totalNumberOfDocs/(number of documents containing term).
That is why it is called inverse document frequency. Rare terms that only a few documents contain will have a higher idf value, while popular terms that are contained in many documents will have a lower idf value.
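
To make this concrete, here is a minimal Python sketch that plugs the numbers from the explain output into the formula printed in the idf description. The function name bm25_idf and the second call with docFreq=1000 are only illustrative assumptions, not part of any Elasticsearch API, and the sketch assumes the log in the formula is the natural logarithm.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """idf as printed in the explain output:
    log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)), natural log assumed."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Values taken from the explain output in the question.
rare = bm25_idf(doc_count=368128, doc_freq=1)
print(round(rare, 4))  # approximately 12.4107

# Hypothetical comparison: same docCount, but a term found in 1000 documents.
common = bm25_idf(doc_count=368128, doc_freq=1000)
print(rare > common)  # True: the rarer term gets the higher idf
```

The first result should line up with the 12.410724 reported for idf above, which suggests docCount is acting as the corpus size in the formula rather than as the number of hits.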

