Marvel doc count higher than count api


(Pat Humphreys) #1

I have an index with roughly 231M docs in it. Based on the response from the _count API, this is about the number I expect.

When I look at the index in Marvel, it says I have 3.7B docs in the index, about 16 times more than I should have.

The output from the _stats API also shows a count of about 3.7B:
{"primaries":{"docs":{"count":3706439463,"deleted":454714116}

So what is the difference between these two counts? I have other indices where these numbers match.

I am also using type nested in the mapping for the doc, so I am not sure whether that causes each doc to be stored as multiple docs.


(Colin Goodheart-Smithe) #2

Yes, nested objects are stored as multiple documents in your index. The _stats api returns the number of individual Lucene documents in the index which will include all the nested documents as well as the root documents. See this section of the Definitive Guide book for more information on how nested documents work: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
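
To make the arithmetic concrete, here is a minimal Python sketch of that counting rule. It does not call Elasticsearch; the function name lucene_doc_count, the sample post document, and the NESTED_FIELDS description of the mapping are all hypothetical and only illustrate the idea that each nested object becomes one extra Lucene document next to its root document.

```python
def lucene_doc_count(source, nested_fields):
    """Estimate how many Lucene documents one source document produces.

    nested_fields is a hypothetical description of the mapping: a dict
    whose keys are the fields mapped as type nested, and whose values are
    dicts of the same shape for nested fields inside those objects.
    """
    total = 1  # the document itself (root, or the current nested object)
    for field, children in nested_fields.items():
        value = source.get(field)
        if value is None:
            continue
        # A nested field may hold a single object or a list of objects.
        objects = value if isinstance(value, list) else [value]
        for obj in objects:
            total += lucene_doc_count(obj, children)
    return total


# Hypothetical example: one root document with 15 nested comments.
NESTED_FIELDS = {"comments": {}}
post = {
    "title": "example",
    "comments": [{"text": f"comment {i}"} for i in range(15)],
}

root_docs = 1  # what _count would report for this single document
lucene_docs = lucene_doc_count(post, NESTED_FIELDS)
print(root_docs, lucene_docs)
```

Under these assumptions, one root document with 15 nested objects accounts for 16 Lucene documents. An average of around 15 nested objects per root document would therefore be consistent with the roughly 16x gap between _count and _stats reported above, although the actual distribution in your index may look quite different.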


(Christoph) #3

Also, for future reference, there is a similar discussion here: Why does _stats doc count differ from _search/_count doc count for an index?


(Pat Humphreys) #4

Thanks for clarifying that. So is it more expensive to load/query the docs this way? That is, should I strip out the nested objects? I am getting OOM errors when doing bulk data loads, and I am not sure whether they are related to this or not.

