Document _version changes

Hi all,

Can system problems like running out of disk space, exhausted heap space, or network issues cause document _version changes? What is your experience with that?

Why am I asking? Because in our test environment I noticed that one of my indices contained documents with a _version higher than 1. Users have read-only access to this index, and only Filebeat has write access.

I'm trying to find out what happened or what is wrong. One thing I found in the Elasticsearch log is this strange warning:
Data too large, data for [<transport_request>] would be [2041274936/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2041243712/1.9gb], new bytes reserved: [31224/30.4kb], usages [request=0/0b, fielddata=36923/36kb, in_flight_requests=31224/30.4kb, accounting=64180852/61.2mb]]; ], markAsStale [true]]

Best Regards,
Dan

Document version changes only happen if there's an update or a delete; running out of heap won't change that.
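
For illustration, here is a minimal sketch (hypothetical index and _id) of what counts as an update: re-indexing a document under an existing _id.

PUT my-index/_doc/1
{ "message": "first write" }
# response: "result" : "created", "_version" : 1

PUT my-index/_doc/1
{ "message": "second write" }
# response: "result" : "updated", "_version" : 2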

Is it causing problems?

Hi @warkolm ,

Users have read-only privileges on the problematic index. Only Filebeat can write to it, and I have to find out what happened. I'm trying to work out what causes the document _version updates in my case. How can I analyze and monitor such cases?

Best Regards,
Dan

Are you setting the document ID _id, or are you letting Elasticsearch generate it for you?

@stephenb

In my case the document ID (_id) is set by Filebeat:

cat filebeat.yml
filebeat.config.inputs:
  path: ${path.config}/conf.d/*.yml
  reload.enabled: true

output.logstash:
  hosts: ["elastic01:5044", "elastic02:5044", "elastic03:5044"]

  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.certificate: "/etc/filebeat/certs/machine.crt"
  ssl.key: "/etc/filebeat/certs/machine.key"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - add_locale:
      format: abbreviation
  - add_id: ~

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
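
For what it's worth, add_id: ~ above uses the processor's defaults; spelled out (assuming the documented 7.x default target field) it is equivalent to:

  - add_id:
      target_field: "@metadata._id"   # generated once per event; this is what the Logstash output reads later

Because the ID is attached to the event when Filebeat processes it, a resent copy of the same event should carry the same _id.
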
filebeat version
filebeat version 7.7.0 (amd64), libbeat 7.7.0 [5e69e25b920e3d93bec76a09a31da3ab35a55607 built 2020-05-12 00:53:16 +0000 UTC]

Regards,
Dan

Any idea?

I do not; can you show a sample document?

It appears that something is updating the documents.

Are the _version numbers that are greater than 1 always the same number, or different numbers?

Are you using scripts on insert? Here is a little article with some insight if you are.
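
To make "scripts on insert" concrete, here is a minimal, hypothetical sketch of a scripted upsert; every execution re-runs the script against the stored document and bumps _version:

POST my-index/_update/some-id
{
  "scripted_upsert": true,
  "script": {
    "source": "ctx._source.counter = ctx._source.containsKey('counter') ? ctx._source.counter + 1 : 1"
  },
  "upsert": {}
}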

@stephenb

Here is the sample doc:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "appm-2021.11.02",
        "_type" : "_doc",
        "_id" : "gP_I4XwB2j3mkX2mqYWL",
        "_version" : 2,
        "_score" : 1.0,
        "_source" : {
          "day" : "02",
          "logtime" : "2021-11-02 18:52:57.537",
          "year" : "2021",
          "agent" : {
            "id" : "556a375b-0ecf-4c10-962b-58f3b5aa8bee",
            "ephemeral_id" : "392c04a8-7243-4b1f-a571-154321130e62",
            "version" : "7.10.0",
            "hostname" : "appsrv1.corp",
            "type" : "filebeat",
            "name" : "appsrv1.corp"
          },
          "tags" : [
            "app",
            "beats_input_codec_plain_applied"
          ],
          "app_msg" : "Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']",
          "logLevel" : "INFO",
          "event" : {
            "timezone" : "CET"
          },
          "@timestamp" : "2021-11-02T17:52:57.537Z",
          "month" : "11",
          "message" : "2021.11.02 18:52:57.537 [db-checkpoint-thread-#59] INFO  o.a.i.i.p.c.p.GridCacheDatabaseSharedManager - Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']",
          "@version" : "1",
          "time" : "18:52:57.537",
          "input" : {
            "type" : "log"
          },
          "log" : {
            "offset" : 500197,
            "file" : {
              "path" : "/apps/m/log/application.log"
            }
          },
          "ecs" : {
            "version" : "1.6.0"
          },
          "application" : "db-checkpoint-thread-#59",
          "host" : {
            "id" : "73c99579400044adb0a3fe01931c97b0",
            "os" : {
              "kernel" : "3.10.0-1160.6.1.el7.x86_64",
              "platform" : "rhel",
              "family" : "redhat",
              "version" : "7.9 (Maipo)",
              "codename" : "Maipo",
              "name" : "Red Hat Enterprise Linux Server"
            },
            "architecture" : "x86_64",
            "containerized" : false,
            "ip" : [
              "192.168.1.1"
            ],
            "mac" : [
              "00:10:36:93:b1:ef"
            ],
            "hostname" : "appsrv1.corp",
            "name" : "appsrv1.corp"
          },
          "class" : "o.a.i.i.p.c.p.GridCacheDatabaseSharedManager"
        }
      }
    ]
  }
}

and original message from log file:

2021.11.02 18:52:57.537 [db-checkpoint-thread-#59] INFO  o.a.i.i.p.c.p.GridCacheDatabaseSharedManager - Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']

Sometimes the document _version is even higher than 2; it can be 3 or 4.

As you can see from the sample, the document was created by Filebeat. Filebeat sends documents to Logstash, and Logstash sends them to Elasticsearch. In the Logstash configuration there are these rules for the tag "app":

  if "app" in [tags] {
    grok {
      match => ["message", "%{YEAR:year}.%{MONTHNUM:month}.%{MONTHDAY:day} %{TIME:time} \[%{DATA:application}] %{LOGLEVEL:logLevel} %{SPACE}*%{JAVACLASS:class} %{NOTSPACE} %{GREEDYDATA:app_msg}"]
    }
  }

and this rule for the hostname:

  else if "appsrv1" in [host][hostname] and "apps" in [log][file][path] {
    mutate {
      replace => { "[@metadata][index_prefix]" => "appm-%{+YYYY.MM.dd}" }
    }
  }

and the output section from Logstash:

output {
  elasticsearch {
    hosts => ["https://${HOSTNAME}:9200"]
    cacert => '/etc/logstash/certs/ca.crt'
    user => 'logstash_internal'
    password => '${ES_PWD}'
    ilm_enabled => false
    document_id => "%{[@metadata][_id]}"
    index => "%{[@metadata][index_prefix]}"
  }
}

Best Regards,
Dan

That would be why; it looks like Filebeat is picking up a few duplicate events.

@warkolm

I understand it differently. The document _id is retrieved from the document metadata.
This message appeared only once in the log file:

2021.11.02 18:52:57.537 [db-checkpoint-thread-#59] INFO  o.a.i.i.p.c.p.GridCacheDatabaseSharedManager - Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']

Why do you claim that the events are duplicated?

As you initially mentioned that the cluster was having problems and throwing errors, it is possible that Logstash sent a batch, did not receive a response, and therefore retried. This can cause the same batch to be sent to Elasticsearch more than once, which in your case, since you specify the document ID, might lead to an insert followed by an update.
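
If you would rather have such retried deliveries rejected than silently turned into updates, the elasticsearch output's action setting can be set to "create" (a sketch based on the output shown earlier; keep your existing connection settings). A duplicate _id then fails with a 409 version_conflict instead of bumping _version:

output {
  elasticsearch {
    # ... hosts, cacert, user, password as in your existing output ...
    document_id => "%{[@metadata][_id]}"
    action      => "create"
    index       => "%{[@metadata][index_prefix]}"
  }
}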

@Christian_Dahlqvist
Thanks for your reply. All those problems were solved a few days ago, but documents with a _version higher than 1 still appear. I'm still searching for the cause.

Both Beats and Logstash offer an at-least-once delivery guarantee, so this can happen as soon as there is any hiccup (transient or more persistent) either between Filebeat and Logstash (data may get resent to Logstash) or between Logstash and Elasticsearch (data may get resent to Elasticsearch).

OK, I understand. Maybe that's a clue. Is there any way to monitor such cases in Elasticsearch?
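
One simple way to at least surface the affected documents (a sketch; the index pattern matches the example above) is to ask the search API to return _version with each hit:

GET appm-*/_search
{
  "version": true,
  "query": { "match_all": {} }
}

Each hit then includes "_version", as in the sample document earlier in the thread; as far as I know there is no query filter on _version, so spotting values above 1 has to be done on the client side.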

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.