Wrong doc_count in term_vectors?

The doc_count field statistic is incremented each time the same document is indexed again.

To reproduce, execute the following request twice:

PUT /test_index/_doc/1
{
  "somefield": "somedata"
}

There is 1 document in the index, but the term vectors response reports doc_count=2:

{
  "_index" : "test_index",
  "_type" : "_doc",
  "_id" : "1",
  "_version" : 2,
  "found" : true,
  "took" : 0,
  "term_vectors" : {
    "somefield" : {
      "field_statistics" : {
        "sum_doc_freq" : 2,
        "doc_count" : 2,
        "sum_ttf" : 2
      },
      "terms" : {
        "somedata" : {
          "term_freq" : 1,
          "tokens" : [
            {
              "position" : 0,
              "start_offset" : 0,
              "end_offset" : 8
            }
          ]
        }
      }
    }
  }
}

Is that expected?
For instance, if I compute TF-IDF on the terms of a document that was indexed several times (for whatever reason), the result will be wrong.
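
To make the concern concrete, here is a minimal Python sketch. It assumes the plain formula idf = ln(N / df), which is not necessarily what Elasticsearch uses for scoring, and the counts are hypothetical: an index of 3 distinct documents where the term occurs in 1 of them. It further assumes that re-indexing that one document inflates both the document count and the term's document frequency by 1, as the statistics in the response above suggest.

```python
import math


def idf(doc_count: int, doc_freq: int) -> float:
    """Plain inverse document frequency: ln(N / df). Assumed formula."""
    return math.log(doc_count / doc_freq)


# Hypothetical index: 3 distinct documents, the term occurs in 1 of them.
true_idf = idf(doc_count=3, doc_freq=1)

# Assumption: re-indexing that one document inflates both counts by 1.
inflated_idf = idf(doc_count=4, doc_freq=2)

tf = 1  # term_freq as reported in the response above

print(f"TF-IDF with true counts:     {tf * true_idf:.3f}")
print(f"TF-IDF with inflated counts: {tf * inflated_idf:.3f}")
```

Under these assumptions the two values differ, so any TF-IDF computed from the reported statistics would depend on how many times a document happened to be re-indexed.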
