hgfxng (Herbert), March 16, 2016, 2:13pm (post 1)

Hi,
I'm doing some load testing and found something strange in Elasticsearch (2.2.0).
I loaded a single index with 104,690,000 documents. This index has only 2 shards.
When I check the output of /_cat/indices, I get this:
green open customer-data-100m-v1 2 1 104690000 0 171.2gb 88.3gb
But when I do a GET /customer-data-100m-v1/customer/_count, I get half the number of documents:
{ "count": 52345000, "_shards": { "total": 2, "successful": 2, "failed": 0 } }
Is this somehow related to the number of shards? Shouldn't I get the same count?
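For reference, the fields in the /_cat/indices line above can be pulled apart as follows. This is a minimal Python sketch, assuming the default column order health, status, index, pri, rep, docs.count, docs.deleted, store.size, pri.store.size (which matches the nine fields shown); the function name parse_cat_indices_line is hypothetical and not part of any Elasticsearch client.

```python
from typing import Dict, Union

# Assumed default column order for /_cat/indices on this version.
COLUMNS = [
    "health", "status", "index", "pri", "rep",
    "docs.count", "docs.deleted", "store.size", "pri.store.size",
]
INT_COLUMNS = {"pri", "rep", "docs.count", "docs.deleted"}


def parse_cat_indices_line(line: str) -> Dict[str, Union[str, int]]:
    """Split one whitespace-separated /_cat/indices row into named fields."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row: Dict[str, Union[str, int]] = dict(zip(COLUMNS, fields))
    for name in INT_COLUMNS:
        row[name] = int(row[name])  # numeric columns become integers
    return row


row = parse_cat_indices_line(
    "green open customer-data-100m-v1 2 1 104690000 0 171.2gb 88.3gb"
)
print(row["docs.count"], row["pri"], row["rep"])
```

Read this way, the 104690000 figure is the docs.count column, which is the value being compared against the _count response.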
              
evolgas (Evan Volgas), March 16, 2016, 3:10pm (post 2)

What happens if you GET /customer-data-100m-v1/_count instead of GET /customer-data-100m-v1/customer/_count?
              
hgfxng (Herbert), March 16, 2016, 4:33pm (post 4)

So it looks like this is not related to the number of shards: I loaded another index with 100+ million docs, this time with 4 shards, and I still get half the count.
green open customer-data-100m-4shards-v1 4 1 107180000 0 171gb 85.5gb
{ "count": 53590000, "_shards": { "total": 4, "successful": 4, "failed": 0 } }
              
anhlqn (Anh), March 16, 2016, 5:36pm (post 5)

How about
GET /customer-data-100m-v1/_stats
              
hgfxng (Herbert), March 16, 2016, 5:43pm (post 6)

I deleted a few docs to try something else, so here are the updated results, including part of the /_stats output:
green open customer-data-100m-v1         2 1 104677426 12574 170.9gb    88gb 
 
{
   "count": 52338713,
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   }
}
 
The output of /customer-data-100m-v1/_stats is at http://pastebin.com/raw/YpZn06jK
              
anhlqn (Anh), March 16, 2016, 5:48pm (post 7)

_stats gave you the number you expected:
"indices": {
      "customer-data-100m-v1": {
         "primaries": {
            "docs": {
               "count": 104677426,
               "deleted": 12574
            },
 
But it's strange that _count did not give you the same number. I checked many of my indices, and all of them gave the same number for both _count and _stats.
              
hgfxng (Herbert), March 16, 2016, 5:51pm (post 8)

Yes, but what is count then?
{
   "count": 52338713,
              
anhlqn (Anh), March 16, 2016, 5:57pm (post 9)

Your previous _count showed a total of two shards, while _stats showed 4 shards. Could you try:
GET _cat/count/customer-data-100m-v1
              
warkolm (Mark Walkom), March 16, 2016, 8:17pm (post 10)

At an educated guess, _cat includes replica documents while _stats doesn't.
              
hgfxng (Herbert), March 21, 2016, 4:03pm (post 11)

Hi,
here:
GET /_cat/count/customer-data-100m-v1
1458576068 17:01:08 52338713
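
Putting the figures quoted in this thread side by side may help. The sketch below only does arithmetic on the numbers already posted above (docs.count from /_cat/indices and _stats versus the count from _count and /_cat/count); the variable and function names are hypothetical, and the sketch does not by itself say anything about the cause.

```python
from typing import List, Tuple

# (label, docs.count from /_cat/indices or _stats, count from _count or /_cat/count)
observations: List[Tuple[str, int, int]] = [
    ("customer-data-100m-v1", 104690000, 52345000),
    ("customer-data-100m-4shards-v1", 107180000, 53590000),
    ("customer-data-100m-v1 (after deletes)", 104677426, 52338713),
]


def ratio_report(rows: List[Tuple[str, int, int]]) -> List[Tuple[str, int, int]]:
    """Return (label, quotient, remainder) of docs.count divided by count."""
    return [(label, *divmod(docs_count, count)) for label, docs_count, count in rows]


report = ratio_report(observations)
for label, quotient, remainder in report:
    print(f"{label}: docs.count = {quotient} x count, remainder {remainder}")
```

In all three cases the quotient is 2 with remainder 0, so docs.count is exactly twice the count regardless of the shard count (2 or 4) and both before and after the deletes.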