Number of documents in an index


(Herbert) #1

Hi,

I'm doing some load testing and I found something strange in Elasticsearch (2.2.0).

I fed a single index with 104,690,000 documents. This index has only 2 shards.

When I check the output of /_cat/indices, I get this:

green open customer-data-100m-v1 2 1 104690000 0 171.2gb 88.3gb

But when I do a GET /customer-data-100m-v1/customer/_count, I get half the number of documents:

{ "count": 52345000, "_shards": { "total": 2, "successful": 2, "failed": 0 } }

Is this somehow related to the number of shards? Shouldn't I get the same count?
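
For reference, the two outputs above can be compared mechanically. This is only a rough sketch: it assumes the default 2.x column order for /_cat/indices (health, status, index, pri, rep, docs.count, docs.deleted, store.size, pri.store.size), and the helper name parse_cat_indices_row is hypothetical, not part of any client library.

```python
import json

# Column order assumed to be the Elasticsearch 2.x default for /_cat/indices.
CAT_COLUMNS = ["health", "status", "index", "pri", "rep",
               "docs.count", "docs.deleted", "store.size", "pri.store.size"]


def parse_cat_indices_row(row):
    """Split one whitespace-separated /_cat/indices row into a dict (hypothetical helper)."""
    fields = dict(zip(CAT_COLUMNS, row.split()))
    # Convert the numeric columns so they can be compared with the _count result.
    for key in ("pri", "rep", "docs.count", "docs.deleted"):
        fields[key] = int(fields[key])
    return fields


# Both strings are copied from the outputs quoted in this post.
cat_row = "green open customer-data-100m-v1 2 1 104690000 0 171.2gb 88.3gb"
count_body = '{ "count": 52345000, "_shards": { "total": 2, "successful": 2, "failed": 0 } }'

cat = parse_cat_indices_row(cat_row)
count = json.loads(count_body)["count"]
ratio = cat["docs.count"] / count
print(cat["index"], cat["docs.count"], count, ratio)
```

The ratio comes out as exactly 2, so the two numbers are not merely close to a factor of two apart.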


(Evan Volgas) #2

What happens if you GET /customer-data-100m-v1/_count instead of GET /customer-data-100m-v1/customer/_count?


(Herbert) #3

I get the same output.


(Herbert) #4

So it looks like it's not related to the number of shards: I loaded another index with 100M+ docs, this time with 4 shards, and I also get half the count.

green open customer-data-100m-4shards-v1 4 1 107180000 0 171gb 85.5gb

{ "count": 53590000, "_shards": { "total": 4, "successful": 4, "failed": 0 } }


(Anh) #5

How about

GET /customer-data-100m-v1/_stats

(Herbert) #6

I deleted a few docs to try something else, so here are the updated results, including part of the output of /_stats:

green open customer-data-100m-v1 2 1 104677426 12574 170.9gb 88gb
{
   "count": 52338713,
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   }
}

The output of /customer-data-100m-v1/_stats is at http://pastebin.com/raw/YpZn06jK


(Anh) #7

_stats gave you the number you expected

"indices": {
      "customer-data-100m-v1": {
         "primaries": {
            "docs": {
               "count": 104677426,
               "deleted": 12574
            },

But it's strange that _count did not give you the same number. I checked many of my indices, and all of them gave the same number for both _count and _stats.
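
A small sketch of how the two numbers can be lined up, assuming the _stats response has been loaded into a Python dict. Only the fragment quoted above is reproduced here; the function name primary_doc_counts is made up for illustration.

```python
def primary_doc_counts(stats, index):
    """Return (count, deleted) from the primaries section of a _stats response."""
    docs = stats["indices"][index]["primaries"]["docs"]
    return docs["count"], docs["deleted"]


# Only the part of the _stats output quoted in this thread.
stats_fragment = {
    "indices": {
        "customer-data-100m-v1": {
            "primaries": {"docs": {"count": 104677426, "deleted": 12574}}
        }
    }
}

# Value returned by _count in post #6.
count_api = 52338713

stats_count, stats_deleted = primary_doc_counts(stats_fragment, "customer-data-100m-v1")
difference = stats_count - count_api
print(stats_count, stats_deleted, difference)
```

The difference equals the _count value itself, so the primaries figure in _stats is exactly twice what _count reports.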


(Herbert) #8

Yes, but what is count then?

{
   "count": 52338713,

(Anh) #9

Your previous _count showed a total of 2 shards, while _stats showed 4 shards. Could you try:

GET _cat/count/customer-data-100m-v1

(Mark Walkom) #10

As an educated guess, _cat includes replica documents while _stats doesn't.


(Herbert) #11

Hi,

here:

GET /_cat/count/customer-data-100m-v1

1458576068 17:01:08 52338713
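
In case it helps to compare this with the earlier _count result, here is a minimal sketch that parses the line above. It assumes the three columns are epoch, timestamp and count (the default /_cat/count layout); parse_cat_count is a hypothetical helper name.

```python
def parse_cat_count(line):
    """Parse one /_cat/count row, assumed to be 'epoch timestamp count'."""
    epoch, timestamp, count = line.split()
    return {"epoch": int(epoch), "timestamp": timestamp, "count": int(count)}


cat_count = parse_cat_count("1458576068 17:01:08 52338713")

# Value returned by _count in post #6.
count_api = 52338713

same_as_count_api = cat_count["count"] == count_api
print(cat_count["count"], same_as_count_api)
```

So /_cat/count agrees with _count, and both are half of what /_cat/indices and _stats report.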
