I have an index that has approximately 231M docs in it; based on the response of the _count API, this is around the number I expect.
When I look at the index in Marvel, it says I have 3.7B docs in the index, 16 times more than I should have.
I can see from the output of the _stats API that the count is also approximately 3.7B.
So what is the difference between these two counts? I have other indices where these numbers match.
I am also using type nested in the mapping for the doc, so I am not sure if that is storing it as multiple docs?
Yes, nested objects are stored as multiple documents in your index. The _stats API returns the number of individual Lucene documents in the index, which includes all the nested documents as well as the root documents, whereas the _count API counts only the root documents. That is why the two numbers match on indices without nested fields and diverge on this one. See this section of the Definitive Guide book for more information on how nested documents work: https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
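To make the arithmetic concrete, here is a rough Python sketch of how the two counts relate. The function names are made up for illustration, and the inputs are the approximate figures from the question (231M from _count, 3.7B from _stats), so treat the result as an estimate rather than an exact figure.

```python
def lucene_doc_count(root_docs: int, nested_per_root: float) -> int:
    """Estimate the Lucene-level doc count: each root doc plus its nested docs.

    nested_per_root is the average number of nested objects per root
    document, summed over all nested fields (an assumed input).
    """
    return round(root_docs * (1 + nested_per_root))


def avg_nested_per_root(root_docs: int, lucene_docs: int) -> float:
    """Invert the relation above: average nested objects per root document."""
    return lucene_docs / root_docs - 1


# Approximate figures from the question.
root_docs = 231_000_000      # as reported by _count
lucene_docs = 3_700_000_000  # as reported by _stats / Marvel

# Works out to roughly 15 nested objects per root document on average.
print(avg_nested_per_root(root_docs, lucene_docs))
```

So a 16x ratio would be consistent with each document carrying around 15 nested objects on average, if nested fields are the only source of the extra Lucene documents.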
Thanks for clarifying that. So is it more expensive to load/query the doc this way? I.e., should I strip out the nested objects? I am getting OOM errors when doing bulk data loads, and I am not sure if it's related to this problem or not.