Document _version changes

Hi all,

Can system problems like running out of disk space, exhausted heap space, or network issues cause document _version changes? What is your experience with that?

Why am I asking? Because in our test environment I noticed that one of my indices contained documents with a _version higher than 1. Users have read-only access to this index, and only Filebeat has write access.

I'm trying to find out what happened or what is wrong. One thing I found in the Elasticsearch log is this strange warning:
Data too large, data for [<transport_request>] would be [2041274936/1.9gb], which is larger than the limit of [2040109465/1.8gb], real usage: [2041243712/1.9gb], new bytes reserved: [31224/30.4kb], usages [request=0/0b, fielddata=36923/36kb, in_flight_requests=31224/30.4kb, accounting=64180852/61.2mb]]; ], markAsStale [true]]

Best Regards,
Dan

Document version changes only happen if there's an update or a delete; running out of heap won't change that.
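
For illustration, here is a minimal sketch (hypothetical index and _id) of what counts as an update: re-indexing a document under an existing _id.

PUT my-index/_doc/1
{ "message": "first write" }
# response: "result" : "created", "_version" : 1

PUT my-index/_doc/1
{ "message": "second write" }
# response: "result" : "updated", "_version" : 2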

Is it causing problems?

Hi @warkolm ,

Users have read-only privileges on the problematic index. Only Filebeat can write to it, and I have to find out what happened. I'm trying to work out what causes the document _version updates in my case. How can I analyze and monitor such cases?

Best Regards,
Dan

Are you setting the document ID _id, or are you letting Elasticsearch generate it for you?

@stephenb

In my case the document ID (_id) is set by Filebeat:

cat filebeat.yml
filebeat.config.inputs:
  path: ${path.config}/conf.d/*.yml
  reload.enabled: true

output.logstash:
  hosts: ["elastic01:5044", "elastic02:5044", "elastic03:5044"]

  ssl.certificate_authorities: ["/etc/filebeat/certs/ca.crt"]
  ssl.certificate: "/etc/filebeat/certs/machine.crt"
  ssl.key: "/etc/filebeat/certs/machine.key"

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~
  - add_locale:
      format: abbreviation
  - add_id: ~

logging.level: debug
logging.to_files: true
logging.files:
  path: /var/log/filebeat
  name: filebeat
  keepfiles: 7
  permissions: 0644
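
For what it's worth, add_id: ~ above uses the processor's defaults; spelled out (assuming the documented 7.x default target field) it is equivalent to:

  - add_id:
      target_field: "@metadata._id"   # generated once per event; this is what the Logstash output reads later

Because the ID is attached to the event when Filebeat processes it, a resent copy of the same event should carry the same _id.
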
filebeat version
filebeat version 7.7.0 (amd64), libbeat 7.7.0 [5e69e25b920e3d93bec76a09a31da3ab35a55607 built 2020-05-12 00:53:16 +0000 UTC]

Regards,
Dan

Any idea?

I do not; can you show a sample document?

It appears that something is updating the documents.

Are the _version numbers that are greater than 1 always the same number, or different numbers?

Are you using scripts on insert? Here is a little article with some insight if you are.
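
To make "scripts on insert" concrete, here is a minimal, hypothetical sketch of a scripted upsert; every execution re-runs the script against the stored document and bumps _version:

POST my-index/_update/some-id
{
  "scripted_upsert": true,
  "script": {
    "source": "ctx._source.counter = ctx._source.containsKey('counter') ? ctx._source.counter + 1 : 1"
  },
  "upsert": {}
}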

@stephenb

Here is the sample doc:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "appm-2021.11.02",
        "_type" : "_doc",
        "_id" : "gP_I4XwB2j3mkX2mqYWL",
        "_version" : 2,
        "_score" : 1.0,
        "_source" : {
          "day" : "02",
          "logtime" : "2021-11-02 18:52:57.537",
          "year" : "2021",
          "agent" : {
            "id" : "556a375b-0ecf-4c10-962b-58f3b5aa8bee",
            "ephemeral_id" : "392c04a8-7243-4b1f-a571-154321130e62",
            "version" : "7.10.0",
            "hostname" : "appsrv1.corp",
            "type" : "filebeat",
            "name" : "appsrv1.corp"
          },
          "tags" : [
            "app",
            "beats_input_codec_plain_applied"
          ],
          "app_msg" : "Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']",
          "logLevel" : "INFO",
          "event" : {
            "timezone" : "CET"
          },
          "@timestamp" : "2021-11-02T17:52:57.537Z",
          "month" : "11",
          "message" : "2021.11.02 18:52:57.537 [db-checkpoint-thread-#59] INFO  o.a.i.i.p.c.p.GridCacheDatabaseSharedManager - Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']",
          "@version" : "1",
          "time" : "18:52:57.537",
          "input" : {
            "type" : "log"
          },
          "log" : {
            "offset" : 500197,
            "file" : {
              "path" : "/apps/m/log/application.log"
            }
          },
          "ecs" : {
            "version" : "1.6.0"
          },
          "application" : "db-checkpoint-thread-#59",
          "host" : {
            "id" : "73c99579400044adb0a3fe01931c97b0",
            "os" : {
              "kernel" : "3.10.0-1160.6.1.el7.x86_64",
              "platform" : "rhel",
              "family" : "redhat",
              "version" : "7.9 (Maipo)",
              "codename" : "Maipo",
              "name" : "Red Hat Enterprise Linux Server"
            },
            "architecture" : "x86_64",
            "containerized" : false,
            "ip" : [
              "192.168.1.1"
            ],
            "mac" : [
              "00:10:36:93:b1:ef"
            ],
            "hostname" : "appsrv1.corp",
            "name" : "appsrv1.corp"
          },
          "class" : "o.a.i.i.p.c.p.GridCacheDatabaseSharedManager"
        }
      }
    ]
  }
}

and original message from log file:

2021.11.02 18:52:57.537 [db-checkpoint-thread-#59] INFO  o.a.i.i.p.c.p.GridCacheDatabaseSharedManager - Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']

Sometimes the document _version is even higher than 2; it can be 3 or 4.

As you can see from the sample, the document was created by Filebeat. Filebeat sends documents to Logstash, and Logstash sends them to Elasticsearch. In the Logstash configuration there are these rules for the tag "app":

  if "app" in [tags] {
    grok {
      match => ["message", "%{YEAR:year}.%{MONTHNUM:month}.%{MONTHDAY:day} %{TIME:time} \[%{DATA:application}] %{LOGLEVEL:logLevel} %{SPACE}*%{JAVACLASS:class} %{NOTSPACE} %{GREEDYDATA:app_msg}"]
    }
  }

and this rule for the hostname:

  else if "appsrv1" in [host][hostname] and "apps" in [log][file][path] {
    mutate {
      replace => { "[@metadata][index_prefix]" => "appm-%{+YYYY.MM.dd}" }
    }
  }

and the output section from Logstash:

output {
  elasticsearch {
    hosts => ["https://${HOSTNAME}:9200"]
    cacert => '/etc/logstash/certs/ca.crt'
    user => 'logstash_internal'
    password => '${ES_PWD}'
    ilm_enabled => false
    document_id => "%{[@metadata][_id]}"
    index => "%{[@metadata][index_prefix]}"
  }
}

Best Regards,
Dan

That would be why; it looks like Filebeat is picking up a few duplicate events.

@warkolm

I understand it differently. The document _id is retrieved from the document metadata.
This message appeared only once in the log file:

2021.11.02 18:52:57.537 [db-checkpoint-thread-#59] INFO  o.a.i.i.p.c.p.GridCacheDatabaseSharedManager - Skipping checkpoint (no pages were modified) [checkpointBeforeLockTime=4ms, checkpointLockWait=0ms, checkpointListenersExecuteTime=4ms, checkpointLockHoldTime=5ms, reason='timeout']

Why do you claim that the events are duplicated?

As you initially mentioned that the cluster was having problems and throwing errors, it is possible that Logstash sent a batch, did not receive a response, and therefore retried. This can cause the same batch to be sent to Elasticsearch more than once, which in your case, since you specify the document ID, might lead to an insert followed by an update.
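
If you would rather have such retried deliveries rejected than silently turned into updates, the elasticsearch output's action setting can be set to "create" (a sketch based on the output shown earlier; keep your existing connection settings). A duplicate _id then fails with a 409 version_conflict instead of bumping _version:

output {
  elasticsearch {
    # ... hosts, cacert, user, password as in your existing output ...
    document_id => "%{[@metadata][_id]}"
    action      => "create"
    index       => "%{[@metadata][index_prefix]}"
  }
}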

@Christian_Dahlqvist
Thanks for your reply. All those problems were solved a few days ago, but documents with a _version higher than 1 still appear. I'm still searching for the cause.

Both Beats and Logstash offer an at-least-once delivery guarantee, so this can happen as soon as there is any hiccup (transient or more persistent) either between Filebeat and Logstash (data may get resent to Logstash) or between Logstash and Elasticsearch (data may get resent to Elasticsearch).

OK, I understand. Maybe that's a clue. Is there any way to monitor such cases in Elasticsearch?
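
One simple way to at least surface the affected documents (a sketch; the index pattern matches the example above) is to ask the search API to return _version with each hit:

GET appm-*/_search
{
  "version": true,
  "query": { "match_all": {} }
}

Each hit then includes "_version", as in the sample document earlier in the thread; as far as I know there is no query filter on _version, so spotting values above 1 has to be done on the client side.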

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.