Number of documents in Dev Tools is different from number of documents in Discover

I am using ELK 7.9.3 and I have an index which contains more than 2M documents. I used the reindex API to create a sample with the following command:

POST /_reindex?wait_for_completion=false
{
  "max_docs": 1000, 
  "source": {
    "index": "firstIndex-2020.10.27-000313",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "develop1000"
  }
}

I expected to have 1000 documents in the index named develop1000, and that is the result of GET develop1000/_count, but when I check in Discover I only have 992 hits.

What could the problem be, and how can I investigate it? I have the same problem in another index!

Thanks folks.

Can you show the number of docs? :arrow_up:

Sure, below is the total number of docs:

{
  "count" : 2652481,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}

Hmmm... Do I understand correctly that when you execute the query:
GET develop1000/_count
the result is 1000?

Perhaps some of the documents' timestamps are not correct and fall outside your time window in Discover, OR the timestamp somehow failed during the reindex and those docs don't have a timestamp.

Also, with only 1000 documents, just run the reindex in the foreground with wait_for_completion=true and see if you get any errors.

You can check the min and max timestamps:

GET develop1000/_search
{
  "size": 0,
  "aggs": {
    "min_date": {
      "min": {
        "field": "@timestamp",
        "format": "yyyy-MM-dd HH.mm.ss"
      }
    },
    "max_date": {
      "max": {
        "field": "@timestamp",
        "format": "yyyy-MM-dd HH.mm.ss"
      }
    }
  }
}

Yes exactly!

Here is the result of reindexing with wait_for_completion=true:

{
  "took" : 8916,
  "timed_out" : false,
  "total" : 1000,
  "updated" : 0,
  "created" : 1000,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

It seems that there is no error. Then I executed the search query to find the max and min timestamps:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1000,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "max_date" : {
      "value" : 1.603826335E12,
      "value_as_string" : "2020-10-27 19.18.55"
    },
    "min_date" : {
      "value" : 1.603826285E12,
      "value_as_string" : "2020-10-27 19.18.05"
    }
  }
}

What you said is interesting. How can I check whether a given document has an empty timestamp?

What is your index pattern time field?

@timestamp or the field value you show above. Discover is based on the time field defined in the index pattern
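
One way to see how many documents Discover can actually display is to count only those that have the time field. A minimal sketch, assuming the index pattern's time field is @timestamp:

```
GET develop1000/_count
{
  "query": {
    "exists": {
      "field": "@timestamp"
    }
  }
}
```

If this count is lower than the plain GET develop1000/_count, the difference is the number of documents without the time field, and a time-based index pattern will not show those in Discover.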

As for the time field, all events are from October 2020, and in Discover I set the time range to the last 2 years to be sure to get all events. But as you suggested, the problem is that the @timestamp field is missing for 8 documents. I checked that using the following query, as you pointed out in your answer:

GET develop1000/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "@timestamp"
        }
      }
    }
  }
}

The returned result is exactly 8 documents that don't have a @timestamp field:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      // details for the 8 documents ...
    ]
  }
}
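
That accounts for the gap: 1000 documents in the index minus 8 without @timestamp leaves the 992 hits shown in Discover. If the sample should contain only documents that Discover can display, one option is to filter the reindex source with an exists query. A sketch, assuming the same source index as in the original command and a hypothetical destination index named develop1000-with-timestamp:

```
POST /_reindex?wait_for_completion=true
{
  "max_docs": 1000,
  "source": {
    "index": "firstIndex-2020.10.27-000313",
    "query": {
      "exists": {
        "field": "@timestamp"
      }
    }
  },
  "dest": {
    "index": "develop1000-with-timestamp"
  }
}
```

This only changes which documents get sampled. It does not explain why some documents in the source index lack @timestamp, which would still need to be checked at ingest.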

Thank you @stephenb ! :grinning:
