0 hits, 14693688 total on index with 700 documents

Hi, I'm having the following problem on an Elasticsearch 5.4.3 cluster. When I run a query for which some shards return no hits, those shards seem to report an incorrect total. This happens only on some nodes, and restarting them fixes it temporarily.

/_search?q=status:any

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 32,
    "successful": 32,
    "failed": 0
  },
  "hits": {
    "total": 14693688,
    "max_score": 0.0,
    "hits": [
  
    ]
  }
}

The same query might return total: 0 if it doesn't go to any of the problematic nodes. I've tried hitting specific shards with preference=_shards:n|_local, and the problem seems to be at the node level. It also happens in every index in the cluster.
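A minimal sketch of the per-shard probing described above, in Python. The helper names (`shard_probe_paths`, `suspicious_shards`) are hypothetical, and the check assumes the default `from`/`size` and the 5.x response format, where `hits.total` is a plain integer. It only builds request paths and inspects response bodies that have already been fetched; sending the requests is left to whatever HTTP client is at hand.

```python
from urllib.parse import urlencode


def shard_probe_paths(index, shard_count, query, local_only=False):
    """Return one _search path per shard, each pinned with preference=_shards:<n>."""
    paths = []
    for shard in range(shard_count):
        preference = f"_shards:{shard}"
        if local_only:
            # Same form as in the post: prefer copies on the node receiving the request.
            preference += "|_local"
        params = urlencode({"q": query, "preference": preference})
        paths.append(f"/{index}/_search?{params}")
    return paths


def suspicious_shards(responses_by_shard):
    """Return shard numbers whose response has a non-zero total but no hits.

    Assumes default from/size, so any real match should appear in hits.hits.
    """
    return sorted(
        shard
        for shard, body in responses_by_shard.items()
        if body["hits"]["total"] > 0 and not body["hits"]["hits"]
    )
```

A shard whose response reports a non-zero total together with an empty `hits` array, as in the response above, would be flagged.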

The Count API with similar queries returns correct totals.
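One way to make that comparison repeatable is to diff the two response bodies. The sketch below is Python with a hypothetical helper name; it assumes the 5.x formats, where `_search` returns `hits.total` as an integer and `_count` returns a top-level `count`.

```python
def compare_search_and_count(search_body, count_body):
    """Return both totals and whether they disagree for the same query."""
    search_total = search_body["hits"]["total"]
    count_total = count_body["count"]
    return {
        "search_total": search_total,
        "count": count_total,
        "mismatch": search_total != count_total,
    }
```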

Any help is appreciated!

What does the output of `_cat/shards?v` look like?

Looks normal

21 r STARTED 24 110.6kb
21 r STARTED 24 110.6kb
21 p STARTED 24   110kb
12 p STARTED 32 100.8kb
12 r STARTED 32 100.8kb
12 r STARTED 32 100.8kb

All shards are STARTED.

Do you have an index with 21 shards?

32 actually (probably a bad idea in this case). Do you think the number of shards is related to the issue?

Having 32 shards for 700 docs is never a good idea.

I agree.

Anyhow, I don't think the problem is related to the number of shards. To test it out I created a new index with 5 shards and 1 replica in the same cluster, indexed 2 documents and sent a similar query:

curl -X POST http://localhost:9200/test-totals/_search -d '{"query": {"term": {"status": "created"}}}'
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1836712,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test-totals",
        "_type": "test",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "status": "created"
        }
      }
    ]
  }
}

Thing is, the number of docs in a shard influences relevance scoring in a search.
I would 100% recommend you try again with a single shard.
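To illustrate the point about shard size and relevance, here is a rough Python sketch of the BM25 formula, assuming the default BM25 similarity of Elasticsearch 5.x and the default `query_then_fetch` search type, which scores with shard-local statistics. The function names and the single-term field lengths are assumptions for illustration only.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """BM25 idf as used by Lucene: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_score(doc_count, doc_freq, term_freq=1,
               field_len=1, avg_field_len=1, k1=1.2, b=0.75):
    """Score of one term in one document, using the statistics of a single shard."""
    length_norm = 1 - b + b * field_len / avg_field_len
    tf_part = term_freq * (k1 + 1) / (term_freq + k1 * length_norm)
    return bm25_idf(doc_count, doc_freq) * tf_part


# Shard holding only the matching document: N = 1, n = 1.
print(round(bm25_score(doc_count=1, doc_freq=1), 7))
# Both documents on the same shard: N = 2, n = 1.
print(round(bm25_score(doc_count=2, doc_freq=1), 7))
```

The first case is consistent with the 0.2876821 score in the response above (ln(4/3)); with both documents on one shard the idf becomes ln 2, so the same document would score higher.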

But something looks weird here. How can you get that number of hits with only 2 documents indexed?

Could you share the full script to reproduce that? I feel like you are doing something wrong.

Hi, I'm afraid you will not be able to reproduce it easily. I have several clusters running Elasticsearch 5.4.3 and I only see this issue on this one. As I said in my original post, I believe it is a node-level issue: it happens on shards located on a couple of nodes and is fixed temporarily by restarting those nodes. It then starts happening on other nodes.

What I did was:

curl -X PUT http://localhost:9200/test-totals -d '{"number_of_shards": 5}'
curl -X PUT http://localhost:9200/test-totals/test/1 -d '{"status": "created"}'
curl -X PUT http://localhost:9200/test-totals/test/2 -d '{"status": "other"}'
curl -X POST http://localhost:9200/test-totals/_search -d '{"query": {"term": {"status": "created"}}}'

As Mark asked, can you run:

GET _cat/shards?v

And share all the output?

Sure: shards.txt.

It's a bit of a mess but I have an identical cluster with the same data and indices that doesn't have this behavior.

I did not see any test-totals index in the shards output. Is that by any chance an alias? If so, what is the definition of that alias?

I didn't realize I had deleted the index after the test. Here is the output of every step:

curl -X PUT  http://localhost:9200/test-totals
# {"acknowledged":true,"shards_acknowledged":true}

curl -X PUT  http://localhost:9200/test-totals/test/1 -d '{"status": "created"}'
# {"_index":"test-totals","_type":"test","_id":"1","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}

curl -X PUT  http://localhost:9200/test-totals/test/2 -d '{"status": "other"}'
# {"_index":"test-totals","_type":"test","_id":"2","_version":1,"result":"created","_shards":{"total":2,"successful":2,"failed":0},"created":true}

curl -X POST http://localhost:9200/test-totals/_search -d '{"query": {"term": {"status": "created"}}}'
# {"took":2,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1836712,"max_score":0.2876821,"hits":[{"_index":"test-totals","_type":"test","_id":"1","_score":0.2876821,"_source":{"status": "created"}}]}}

curl -X GET http://localhost:9200/_cat/shards/test-totals
# test-totals 1 r STARTED 0  130b 10.64.93.143 i-0a24bef506fa1166b
# test-totals 1 p STARTED 0  130b 10.64.95.71  i-02a3e2df248b271d5
# test-totals 2 p STARTED 1 3.2kb 10.64.92.161 i-09cab816e090dfea9
# test-totals 2 r STARTED 1 3.2kb 10.64.94.22  i-0344657cb1aecd18b
# test-totals 4 p STARTED 0  130b 10.64.92.230 i-0c9b38f6ebf2a6658
# test-totals 4 r STARTED 0  130b 10.64.93.252 i-0ac7a100527b0fde2
# test-totals 3 p STARTED 1 3.2kb 10.64.92.212 i-05122a8f025abd4e1
# test-totals 3 r STARTED 1 3.2kb 10.64.95.83  i-0fab00d6d8bee2e58
# test-totals 0 p STARTED 0  130b 10.64.94.131 i-004eab7c862e21279
# test-totals 0 r STARTED 0  130b 10.64.92.77  i-0687677d1d8b2eb11
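As a cross-check on the numbers above, the sketch below (Python, hypothetical helper name) sums the docs column over primary shards in `_cat/shards` output. It assumes the default column order of index, shard, prirep, state, docs, store, ip, node, and skips any row that is not a started primary with a numeric docs value.

```python
def primary_doc_count(cat_shards_output):
    """Sum the docs column over STARTED primary shards in _cat/shards output."""
    total = 0
    for line in cat_shards_output.splitlines():
        fields = line.split()
        # Expected columns: index shard prirep state docs store ip node
        if len(fields) < 5:
            continue
        prirep, state, docs = fields[2], fields[3], fields[4]
        if prirep == "p" and state == "STARTED" and docs.isdigit():
            total += int(docs)
    return total
```

Fed the ten lines above (without the leading `# `), it counts 2 documents across the primaries, against a reported hits.total of 1836712.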

Any chance you could share the data dir of your cluster and send it by email?

I'm super curious about that behaviour.
