Deleted documents are present in targeted search but not in Kibana

Hello,

To track stats per ID, I continuously "update" ES documents keyed by _id (about 40 million updates per day). "Update" means I run an upsert script that adds a value to a counter and sets a "last touched" date field. When a document has not been touched for one month, I delete that _id. Some _ids may live for many months and are never deleted; others last only a few minutes.
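A minimal sketch of the upsert described above. It is written in Python only to build the request body for the 6.x _update API (POST /<index>/<type>/<id>/_update). The counter field name "counter" is a hypothetical stand-in. The "lastinfodate" field and the "session" type come from later in the thread. When the _id does not exist yet, Elasticsearch indexes the "upsert" document instead of running the script.

```python
import json

# Hypothetical counter field; "lastinfodate" is the date field named later in the thread.
COUNTER_FIELD = "counter"
DATE_FIELD = "lastinfodate"


def build_upsert_body(increment, now_iso):
    """Body for the Elasticsearch 6.x _update API: add `increment` to the
    counter and refresh the "last touched" date, or create the document
    from `upsert` if the _id does not exist yet."""
    return {
        "script": {
            "lang": "painless",
            "source": (
                f"ctx._source.{COUNTER_FIELD} += params.inc; "
                f"ctx._source.{DATE_FIELD} = params.now"
            ),
            "params": {"inc": increment, "now": now_iso},
        },
        # Indexed as-is (without running the script) when the _id is new.
        "upsert": {COUNTER_FIELD: increment, DATE_FIELD: now_iso},
    }


def update_path(index, doc_type, doc_id):
    # 6.x endpoint: POST /<index>/<type>/<id>/_update
    return f"/{index}/{doc_type}/{doc_id}/_update"


if __name__ == "__main__":
    print(update_path("myindex", "session", "85132cf4b368bee2d4f9f5c138ba96a0"))
    print(json.dumps(build_upsert_body(1, "2018-01-15T10:00:00Z"), indent=2))
```

Every such update creates a new version of the document. That detail matters later in the thread, when deletions start to collide with these updates.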

To remove the stale _ids, I run a delete_by_query. From the Kibana point of view it appears to work as expected, since I see:

  • A sliding one-month window when I use "last touched" as the timestamp, with no data in the histogram before "now-1 month".
  • A count of 1.2 billion _ids

However, when I query Elasticsearch directly with curl, I get the following metrics:

  • _cat/indices: 40 2 2539482432 169919997 4.3tb 1.4tb (pri, rep, docs.count, docs.deleted, store.size, pri.store.size)
  • _count: 2539496376

As you can see, the document count is 2.5 billion and only 0.17 billion are marked as deleted (i.e. deleted or updated since the last merge, if I understand correctly). If I search with curl for _ids whose "last touched" date is older than one month, I get documents back! But Kibana doesn't show them!
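Attaching the column names makes the _cat/indices line easier to read. The small Python sketch below assumes the six values are the default pri, rep, docs.count, docs.deleted, store.size and pri.store.size columns (the health, status, index and uuid columns seem to have been trimmed). It illustrates one point. docs.count, like _count, counts live documents. docs.deleted only counts documents that were already deleted or overwritten and are waiting for a segment merge. A document that is stale but was never deleted is still live and lands in docs.count.

```python
# Assumed default _cat/indices column order, after the trimmed
# health/status/index/uuid columns.
CAT_COLUMNS = ["pri", "rep", "docs.count", "docs.deleted", "store.size", "pri.store.size"]


def parse_cat_row(row):
    """Map one whitespace-separated _cat/indices row to column names."""
    return dict(zip(CAT_COLUMNS, row.split()))


def deleted_ratio(stats):
    """Share of Lucene docs that are deleted but not yet merged away."""
    live = int(stats["docs.count"])
    deleted = int(stats["docs.deleted"])
    return deleted / (live + deleted)


if __name__ == "__main__":
    stats = parse_cat_row("40 2 2539482432 169919997 4.3tb 1.4tb")
    print(stats)
    print(f"deleted share: {deleted_ratio(stats):.2%}")
```

A deleted share of roughly 6% only says merges have some cleanup pending. It says nothing about how many live documents are older than one month.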

I would like to know:

  • What is going on?
  • How can I keep this index from growing out of control without taking it down?

I'm using Elasticsearch 6.0.0.

Thanks.

SOLVED | The automatic deletion (_delete_by_query) had not been able to complete for days because it aborted on version conflicts. Fixed by adding conflicts=proceed.

Hey Didier

Please share the search requests you are running, the mapping, and a document that should have been removed.

Well, first, I made a mistake I don't understand: double-checking the Kibana result, I can actually see the 2.5 billion events! Could you please confirm that my command does what it should?

curl -H "Content-Type: application/json" -s -XPOST "http://myelasticsearch/myindex/_delete_by_query" -d '{ "query": { "range" : { "lastinfodate" : { "lte": "now-1M" } } } }'

where lastinfodate is confirmed by Kibana to be a "date" field.

What is the output of:

GET myindex/_search
{ 
  "size": 0,
  "query": { "range" : { "lastinfodate" : { "lte": "now-1M" } } } 
}

Output:

{
  "took" : 1713,
  "timed_out" : false,
  "_shards" : {
    "total" : 40,
    "successful" : 40,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1273589539,
    "max_score" : 0.0,
    "hits" : [ ]
  }
}
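The numbers already posted are enough to check the discrepancy. The quick Python calculation below subtracts the stale hits from the _count reported earlier. The two figures were captured at slightly different moments, so the result is approximate. It leaves about 1.27 billion documents, which is consistent with the roughly 1.2 billion _ids Kibana shows in its one-month window. In other words, about half of the live documents are stale ones that were never deleted.

```python
# Figures copied from the thread (captured at slightly different moments).
live_docs = 2_539_496_376   # _count
stale_docs = 1_273_589_539  # hits.total for lastinfodate <= now-1M

recent_docs = live_docs - stale_docs
stale_share = stale_docs / live_docs

print(f"touched within the last month: {recent_docs:,}")  # 1,265,906,837
print(f"stale share of live documents: {stale_share:.1%}")  # 50.2%
```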

Here is the actual response from the delete_by_query I ran right after the previous search:

{
  "took" : 412014,
  "timed_out" : false,
  "total" : 1273643754,
  "deleted" : 924999,
  "batches" : 925,
  "version_conflicts" : 1,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [
    {
      "index" : "myindex",
      "type" : "session",
      "id" : "85132cf4b368bee2d4f9f5c138ba96a0",
      "cause" : {
        "type" : "version_conflict_engine_exception",
        "reason" : "[session][85132cf4b368bee2d4f9f5c138ba96a0]: version conflict, current version [3] is different than the one provided [1]",
        "index_uuid" : "L7mduIf5QnKBOIWEkCUUvA",
        "shard" : "2",
        "index" : "myindex"
      },
      "status" : 409
    }
  ]
}
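The response also gives a rough idea of throughput. A back-of-the-envelope Python calculation from the fields above gives about 2,245 deletions per second. At that rate, a single unthrottled run would need roughly six and a half days for the remaining catch-up. The same fields also suggest why the run stopped early. With the default scroll batch size of 1000, 925 batches cover 924,999 deletions plus the one conflicting document. That is consistent with the request aborting on its first version conflict, which is the default behaviour (conflicts=abort).

```python
# Fields from the _delete_by_query response above.
took_ms = 412_014
deleted = 924_999
batches = 925
total_matching = 1_273_643_754
version_conflicts = 1
SCROLL_SIZE = 1000  # default _delete_by_query batch size

rate = deleted / (took_ms / 1000)  # deletions per second
remaining = total_matching - deleted
eta_days = remaining / rate / 86_400

print(f"{rate:,.0f} deletions/s")
print(f"~{eta_days:.1f} days for the remaining {remaining:,} documents")
# Every scrolled document was either deleted or hit the conflict:
print(batches * SCROLL_SIZE == deleted + version_conflicts)
```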

I see a version conflict. I'll retry by adding conflicts=proceed to the url.
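For reference, here is the same request with conflicts=proceed as a Python sketch built on urllib from the standard library. The host and index are the placeholders from the thread. With conflicts=proceed, _delete_by_query counts version conflicts in its response and keeps going instead of aborting on the first one.

```python
import json
import urllib.request
from urllib.parse import urlencode

ES_URL = "http://myelasticsearch"  # placeholder host from the thread
INDEX = "myindex"


def stale_query(cutoff="now-1M"):
    # Same range query as the curl command: last touched at or before the cutoff.
    return {"query": {"range": {"lastinfodate": {"lte": cutoff}}}}


def delete_by_query_request(conflicts="proceed"):
    """Build (but do not send) the POST _delete_by_query request."""
    url = f"{ES_URL}/{INDEX}/_delete_by_query?{urlencode({'conflicts': conflicts})}"
    return urllib.request.Request(
        url,
        data=json.dumps(stale_query()).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    # Returns the parsed JSON response (took, deleted, version_conflicts, ...).
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling send(delete_by_query_request()) runs it. The version_conflicts field in the response then shows how many documents were skipped because they were updated in the meantime. Those documents were just touched, so skipping them is what this use case wants anyway.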

I don't know what a conflict is. Does it mean that the document was updated just as it was about to be deleted?

Most likely yes.
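The toy Python model below shows what that conflict means. It illustrates optimistic concurrency control and is not Elasticsearch code. Every write bumps a document's version. _delete_by_query works from a snapshot taken when it starts and deletes each document using the version it saw, so the delete only succeeds if that version is still current. Here the document is scanned at version 1 and then updated twice by the upsert script. The delete then fails with the same "current version [3] ... provided [1]" message as above, and the freshly updated document stays in place.

```python
class VersionConflict(Exception):
    pass


class ToyIndex:
    """Toy model of per-document versioning; not Elasticsearch code."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def upsert(self, doc_id, source):
        # Every successful write bumps the version by one.
        version = self.docs.get(doc_id, (0, None))[0] + 1
        self.docs[doc_id] = (version, source)
        return version

    def delete_if_version(self, doc_id, expected_version):
        # Delete only if nobody wrote to the document since it was read.
        current = self.docs[doc_id][0]
        if current != expected_version:
            raise VersionConflict(
                f"current version [{current}] is different than "
                f"the one provided [{expected_version}]"
            )
        del self.docs[doc_id]


if __name__ == "__main__":
    idx = ToyIndex()
    idx.upsert("85132cf4", {"counter": 1})  # version 1
    seen = idx.docs["85132cf4"][0]          # delete_by_query scans it at version 1
    idx.upsert("85132cf4", {"counter": 2})  # version 2
    idx.upsert("85132cf4", {"counter": 3})  # version 3
    try:
        idx.delete_if_version("85132cf4", seen)
    except VersionConflict as err:
        print("conflict:", err)             # the document survives
```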

I wonder whether you should change your design, though, and move a document to the current monthly index any time it is updated.
That should be easy with aliases, BTW.

Then drop the month-2 index entirely, which will be much, much more efficient than running delete by query.

On mobile now, so I can't really write more, but if needed I can explain the concept if you did not fully get it.
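Here is a sketch of the monthly-index idea in Python, under some assumptions. Indices are named myindex-YYYY.MM (a hypothetical scheme), and searches go through a hypothetical alias, myindex_recent. Writes target the current month's concrete index name, because in 6.0 an alias that points to several indices cannot be written to. The sketch leaves out moving a document on update, i.e. deleting it from an older monthly index. Dropping an index also removes it from its aliases, so each month only the new index needs to be added.

```python
from datetime import date

PREFIX = "myindex-"              # hypothetical naming scheme: myindex-YYYY.MM
SEARCH_ALIAS = "myindex_recent"  # hypothetical read alias


def month_index(d):
    return f"{PREFIX}{d.year:04d}.{d.month:02d}"


def months_back(d, k):
    # First day of the month k months before d's month.
    n = d.year * 12 + (d.month - 1) - k
    return date(n // 12, n % 12 + 1, 1)


def indices_to_drop(today, existing):
    """Monthly indices older than last month (month-2 and before).
    YYYY.MM names sort chronologically, so a string comparison is enough."""
    oldest_kept = month_index(months_back(today, 1))
    return sorted(i for i in existing if i.startswith(PREFIX) and i < oldest_kept)


def add_month_to_alias(today):
    # Body for POST /_aliases; a deleted index leaves the alias automatically.
    return {"actions": [{"add": {"index": month_index(today), "alias": SEARCH_ALIAS}}]}
```

The cleanup is then a DELETE /<index> for each name returned, which frees disk space immediately instead of leaving deleted documents for merges to reclaim. The trade-off is granularity. A document last touched at the start of the previous month survives until its index is dropped, so retention falls between one and two months rather than exactly one.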

Well, we need to preserve uniqueness, and nothing prevents an id from staying around for many months. So we actually need to track those billions of ids individually from the time they appear and only drop them after more than one month of inactivity. Deleting the stale ids every day during off-peak hours will be painless. However, the big catch-up delete_by_query right now will take a while.
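Below is a sketch of the daily off-peak job, assuming some external scheduler (cron or similar, not shown) triggers it. With wait_for_completion=false, Elasticsearch runs the deletion as a background task and returns a task id. The task's progress can then be read from the _tasks API. requests_per_second throttles the deletion so it competes less with the daily updates. The value 2000 in the example is illustrative, not a recommendation.

```python
import json
from urllib.parse import urlencode


def nightly_cleanup_request(host, index, requests_per_second=None):
    """Build the URL and JSON body for a background, conflict-tolerant
    _delete_by_query that removes ids untouched for more than a month."""
    params = {"conflicts": "proceed", "wait_for_completion": "false"}
    if requests_per_second is not None:
        # Throttle in documents per second; omit it to run unthrottled.
        params["requests_per_second"] = requests_per_second
    url = f"{host}/{index}/_delete_by_query?{urlencode(params)}"
    body = {"query": {"range": {"lastinfodate": {"lte": "now-1M"}}}}
    return url, json.dumps(body)


def task_status_url(host, task_id):
    # GET this URL to follow the background task started above.
    return f"{host}/_tasks/{task_id}"


if __name__ == "__main__":
    url, body = nightly_cleanup_request("http://myelasticsearch", "myindex", 2000)
    print("POST", url)
    print(body)
```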

Anyway, thanks for your assistance. My first message may suggest weird behaviour of ES with delete_by_query, while it was in fact a problem with my diagnosis, so you're welcome to remove the entire topic if you wish. I added a SOLVED note to the first message of the topic.

Regards.
