Different number of hits for the same time range

I am running the same query over the same time range, but it returns a different number of hits each time I re-run it.

{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "timestamp1": {
              "gte": 1597247223502,
              "lte": 1597333623502
            }
          }
        },
        {
          "match_phrase": {
            "tags.keyword": {
              "query": "lilly"
            }
          }
        },
        {
          "match_phrase": {
            "tags.keyword": {
              "query": "syslog"
            }
          }
        }
      ]
    }
  }
}

Response 1:

{
  "took": 196,
  "timed_out": false,
  "_shards": {
    "total": 880,
    "successful": 880,
    "skipped": 804,
    "failed": 0
  },
  "hits": {
    "total": 5415455,
    "max_score": 0,
    "hits": []
  }
}

Response 2, a few seconds later:

{
  "took": 161,
  "timed_out": false,
  "_shards": {
    "total": 880,
    "successful": 880,
    "skipped": 804,
    "failed": 0
  },
  "hits": {
    "total": 5416033,
    "max_score": 0,
    "hits": []
  }
}

Is this just a big data thing?

Thanks
Norm

Hey,

I assume you are sure that no new data is being indexed, so let's try to figure out what is happening here. If your shards have replicas, each query is likely to hit different shard copies across your cluster, and those copies can be momentarily out of step with each other. To rule this out, you can try setting a search preference: with a custom value, requests that use that value keep hitting the same shard copies.

In that case, does your result count stay the same or does it still differ?

Also, what Elasticsearch version are you using?

--Alex
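
The search preference suggested above can be sketched as follows. This is only an illustration that builds the request without sending it; the host `http://localhost:9200`, the index name `my-logs`, and the preference string `norm-debug-session` are made-up placeholders, not values from this thread. `preference` is a query-string parameter of the search API, and any custom string should route repeated requests to the same shard copies.

```python
import json
from urllib.parse import urlencode


def build_search_request(base_url, index, preference, query):
    """Return (url, body) for a _search call pinned to a custom preference."""
    # The preference goes in the query string, not in the JSON body.
    params = urlencode({"preference": preference})
    url = f"{base_url}/{index}/_search?{params}"
    # size 0 asks for the hit count only, as in the original query.
    body = json.dumps({"size": 0, "query": query}).encode("utf-8")
    return url, body


# Same bool query as in the original post.
query = {
    "bool": {
        "must": [
            {"range": {"timestamp1": {"gte": 1597247223502, "lte": 1597333623502}}},
            {"match_phrase": {"tags.keyword": {"query": "lilly"}}},
            {"match_phrase": {"tags.keyword": {"query": "syslog"}}},
        ]
    }
}

# Hypothetical host, index, and preference value.
url, body = build_search_request(
    "http://localhost:9200", "my-logs", "norm-debug-session", query
)
print(url)
```

Sending the same `url` and `body` twice with any HTTP client and comparing `hits.total` would show whether the difference disappears once the shard copies are pinned.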

Thanks for the reply, Alex. Your first point is probably the key: this is my busiest index, and it looks like more records are still being indexed. However, let me try to understand this. When the index pattern was created for this index, @timestamp was chosen as the index time stamp, while timestamp1 is a field in the logs that records when the log event actually occurred. If I use @timestamp in the query, I get the same hits.total every time; with timestamp1, the total changes. Is this a consequence of designating @timestamp, the indexing time, as the index time stamp? Hope that makes sense. Too many times and stamps.

Norm
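
One possible reading of the behaviour described above, offered as a toy model rather than a confirmed diagnosis: if @timestamp is the time a document was indexed and timestamp1 is the time the event happened, then a log that arrives late can still fall inside a past timestamp1 range, while its @timestamp falls after the end of that range. The `Doc` class, the small integer times, and the field names `event_time` and `index_time` below are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    event_time: int  # stands in for timestamp1 (when the event occurred)
    index_time: int  # stands in for @timestamp (when it was indexed)


def count_in_range(docs, field, gte, lte):
    """Count docs whose given time field lies in the inclusive range."""
    return sum(1 for d in docs if gte <= getattr(d, field) <= lte)


# A fixed query range that ends at the moment the first query is issued.
GTE, LTE = 100, 200

docs = [Doc(110, 112), Doc(150, 151), Doc(190, 195)]

before_event = count_in_range(docs, "event_time", GTE, LTE)
before_index = count_in_range(docs, "index_time", GTE, LTE)

# A late arrival: the event happened inside the range (180),
# but it is indexed after the range has ended (205).
docs.append(Doc(event_time=180, index_time=205))

after_event = count_in_range(docs, "event_time", GTE, LTE)
after_index = count_in_range(docs, "index_time", GTE, LTE)

print(before_event, after_event)  # 3 4
print(before_index, after_index)  # 3 3
```

In this model the count by event time grows between the two queries while the count by index time stays fixed, which matches the pattern reported for timestamp1 and @timestamp.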
