hgfxng
(Herbert)
March 16, 2016, 2:13pm
1
Hi,
I'm doing some load testing and I found something strange in Elasticsearch (2.2.0).
I fed a single index with 104,690,000 documents. This index has only 2 shards.
When I check the output of /_cat/indices, I get this:
green open customer-data-100m-v1 2 1 104690000 0 171.2gb 88.3gb
But when I do a GET /customer-data-100m-v1/customer/_count, I get half the number of documents:
{ "count": 52345000, "_shards": { "total": 2, "successful": 2, "failed": 0 } }
Is this somehow related to the number of shards? Shouldn't I get the same count?
evolgas
(Evan Volgas)
March 16, 2016, 3:10pm
2
What happens if you GET /customer-data-100m-v1/_count instead of GET /customer-data-100m-v1/customer/_count?
hgfxng
(Herbert)
March 16, 2016, 4:33pm
4
So, it looks like it's not related to the number of shards, because I loaded another index with 100M+ docs, this time with 4 shards, and I also get half the count.
green open customer-data-100m-4shards-v1 4 1 107180000 0 171gb 85.5gb
{ "count": 53590000, "_shards": { "total": 4, "successful": 4, "failed": 0 } }
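For anyone who wants to check the arithmetic, here is a minimal Python sketch that parses the two /_cat/indices rows quoted in this thread and divides docs.count by the value _count returned. The helper name `parse_cat_indices_line` is made up for illustration, and the column order (health, status, index, pri, rep, docs.count, docs.deleted, store.size, pri.store.size) is assumed from the rows shown above, not taken from any client library.

```python
def parse_cat_indices_line(line):
    """Split one /_cat/indices row (column order assumed from this thread)."""
    health, status, index, pri, rep, docs, deleted, store, pri_store = line.split()
    return {
        "index": index,
        "pri": int(pri),            # number of primary shards
        "rep": int(rep),            # number of replicas per primary
        "docs_count": int(docs),    # docs.count as reported by _cat/indices
        "docs_deleted": int(deleted),
    }


# (row from /_cat/indices, "count" returned by the _count API) as posted above
observations = [
    ("green open customer-data-100m-v1 2 1 104690000 0 171.2gb 88.3gb", 52345000),
    ("green open customer-data-100m-4shards-v1 4 1 107180000 0 171gb 85.5gb", 53590000),
]

ratios = {}
for line, api_count in observations:
    info = parse_cat_indices_line(line)
    ratios[info["index"]] = info["docs_count"] / api_count
    print(info["index"], "shards:", info["pri"], "ratio:", ratios[info["index"]])
```

Both indices come out at a ratio of exactly 2.0, with 2 shards and with 4, which fits the observation that the shard count is not what drives the difference.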
anhlqn
(Anh)
March 16, 2016, 5:36pm
5
How about
GET /customer-data-100m-v1/_stats
hgfxng
(Herbert)
March 16, 2016, 5:43pm
6
I deleted a few docs to try something else, so here are the updated results, including part of the output of /_stats:
green open customer-data-100m-v1 2 1 104677426 12574 170.9gb 88gb
{
"count": 52338713,
"_shards": {
"total": 2,
"successful": 2,
"failed": 0
}
}
The output of /customer-data-100m-v1/_stats is at http://pastebin.com/raw/YpZn06jK
anhlqn
(Anh)
March 16, 2016, 5:48pm
7
_stats gave you the number you expected
"indices": {
"customer-data-100m-v1": {
"primaries": {
"docs": {
"count": 104677426,
"deleted": 12574
},
but it's strange that _count did not give you the same number. I checked many of my indices, and all gave the same number for both _count and _stats
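To make the comparison repeatable, a small Python sketch along these lines could pull primaries.docs.count out of a parsed _stats response and set it next to the _count result. The dictionaries below only contain the fields quoted in this thread, and `primary_doc_count` is a hypothetical helper name, not part of any Elasticsearch client.

```python
def primary_doc_count(stats_response, index):
    """Return primaries.docs.count for one index from a parsed _stats body."""
    return stats_response["indices"][index]["primaries"]["docs"]["count"]


# Trimmed copies of the responses posted in this thread
stats_response = {
    "indices": {
        "customer-data-100m-v1": {
            "primaries": {"docs": {"count": 104677426, "deleted": 12574}}
        }
    }
}
count_response = {
    "count": 52338713,
    "_shards": {"total": 2, "successful": 2, "failed": 0},
}

stats_docs = primary_doc_count(stats_response, "customer-data-100m-v1")
api_docs = count_response["count"]

# Difference between what _stats reports for primaries and what _count returns
difference = stats_docs - api_docs
print("stats:", stats_docs, "count:", api_docs, "difference:", difference)
```

With the numbers from this thread the difference equals the _count value itself, so the _stats figure is exactly double, even after the deletes.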
hgfxng
(Herbert)
March 16, 2016, 5:51pm
8
Yes, but what is count then?
{
"count": 52338713,
anhlqn
(Anh)
March 16, 2016, 5:57pm
9
Your previous _count showed a total of two shards while the _stats showed 4 shards. Could you try
GET _cat/count/customer-data-100m-v1
warkolm
(Mark Walkom)
March 16, 2016, 8:17pm
10
At an educated guess, _cat includes replica documents while _stats doesn't.
hgfxng
(Herbert)
March 21, 2016, 4:03pm
11
Hi,
here:
GET /_cat/count/customer-data-100m-v1
1458576068 17:01:08 52338713
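For reference, the three columns of that line are epoch, timestamp and count. A short Python sketch (the function name `parse_cat_count_line` is hypothetical, and it assumes the default headerless output shown above) makes it easy to compare with the earlier _count result:

```python
def parse_cat_count_line(line):
    """Split one /_cat/count row into epoch, timestamp and count."""
    epoch, timestamp, count = line.split()
    return {"epoch": int(epoch), "timestamp": timestamp, "count": int(count)}


cat_count = parse_cat_count_line("1458576068 17:01:08 52338713")

# Value returned earlier by GET /customer-data-100m-v1/customer/_count
count_api_value = 52338713

same_as_count_api = cat_count["count"] == count_api_value
print("_cat/count:", cat_count["count"], "matches _count:", same_as_count_api)
```

So _cat/count agrees with _count here; only the docs.count column of /_cat/indices and the _stats figure show the doubled number.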