Filebeat 6.4.2 and 6.5.1: Read line error: "parsing CRI timestamp" and "invalid CRI log format"


(Raman Gupta) #1

I am getting various CRI parsing errors, on both Filebeat 6.4.2 and 6.5.1. I have cleared all filebeat state and restarted Filebeat, but these errors always occur. Here are a couple:

2018-12-04T02:32:18.636Z	ERROR	log/harvester.go:281	Read line error: parsing CRI timestamp: parsing time "3.chunk.js" as "2006-01-02T15:04:05Z07:00": cannot parse "unk.js" as "2006"; File: /var/lib/docker/containers/2a7d435274655d581cbdba63c8477042a30583c7cb84c9b2ef5a4c5af9bd2da7/2a7d435274655d581cbdba63c8477042a30583c7cb84c9b2ef5a4c5af9bd2da7-json.log

and

2018-12-04T14:26:29.853Z	ERROR	log/harvester.go:281	Read line error: invalid CRI log format; File: /var/lib/docker/containers/4da60db6721a545e448eb42732a57372d239b3882397a472e394504f9a448537/4da60db6721a545e448eb42732a57372d239b3882397a472e394504f9a448537-json.log

All the log lines from the input files referenced appear to be valid JSON, one JSON object per line.

It looks like these are not new -- looking through my logs, it appears these have been happening for some time, and on the previous 6.4.2 version of Filebeat as well, at a rate of around 6 / minute.

I use Filebeat in Kubernetes, with autodiscover enabled. Here is my config:

apiVersion: v1
kind: ConfigMap
metadata:
  name: filebeat-config
  namespace: kube-system
  labels:
    k8s-app: filebeat
data:
  filebeat.yml: |-
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true

    processors:
      - add_cloud_metadata: ~
      # This convoluted rename/rename/drop is necessary due to
      # https://discuss.elastic.co/t/filebeats-auto-mapping-of-kubernetes-labels-causing-big-issues/154718/9
      - rename:
          when:
            has_fields: ['kubernetes.labels.app.kubernetes.io/name']
          fields:
            - from: 'kubernetes.labels.app'
              to: 'kubernetes.labels.appobject'
          ignore_missing: true
          fail_on_error: false
      - rename:
          when:
            has_fields: ['kubernetes.labels.appobject']
          fields:
            - from: 'kubernetes.labels.appobject.kubernetes.io/name'
              to: 'kubernetes.labels.app'
            - from: 'kubernetes.labels.appobject.kubernetes.io/part-of'
              to: 'kubernetes.labels.part-of'
          ignore_missing: true
          fail_on_error: false
      - drop_fields:
          when:
            has_fields: ['kubernetes.labels.appobject']
          fields:
            - 'kubernetes.labels.appobject'

    output.elasticsearch:
      hosts: ['${ELASTICSEARCH_HOST:eslogging.default}:${ELASTICSEARCH_PORT:9200}']

(Raman Gupta) #2

Bump...


(Shaunak Kashyap) #3

You might be running into this bug: https://github.com/elastic/beats/issues/8203. Can you try the steps outlined in https://github.com/elastic/beats/issues/8203#issuecomment-422788000 and see if you still experience this issue?


(Raman Gupta) #4

Thanks for your reply. As per my OP, I noted : "I have cleared all filebeat state and restarted Filebeat". I still experience the issue.


(Raman Gupta) #5

Bump


(Michael Eves) #6

Hi,

We've also been experiencing a similar error on 6.5.3. We had previously not encountered the issue on our previous version (6.4.1), however we have made configuration changes since. The errors we see are below:

filebeat-prod-66kr2 filebeat 2018-12-19T16:01:12.541135407Z 2018-12-19T16:01:12.540Z	ERROR	log/harvester.go:281	Read line error: parsing CRI timestamp: parsing time "g":"+" as "2006-01-02T15:04:05Z07:00": cannot parse "g":"+" as "2006"; File: /var/lib/docker/containers/361ee33ba6132f114221add47e1f2e47f32115d49384646d06d50f69bc55f34f/361ee33ba6132f114221add47e1f2e47f32115d49384646d06d50f69bc55f34f-json.log

From the logs I've observed it always seems to be the same couple of logs that it complains about. A sample of the two culprits is below:

{"log":"[2018-12-19 16:14:32.031][25][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:512] warming cluster outbound|80||app.e2e-420-project-logs-previous.svc.cluster.local complete\n","stream":"stderr","time":"2018-12-19T16:14:32.031734973Z"}
{"log":"[2018-12-19 16:14:37.298][25][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:553] removing cluster outbound|80||app.e2e-420-deploy-purge-and-recreate.svc.cluster.local\n","stream":"stderr","time":"2018-12-19T16:14:37.299056576Z"}
{"log":"[2018-12-19 16:16:54.867][25][info][upstream] external/envoy/source/common/upstream/cluster_manager_impl.cc:553] removing cluster outbound|80||app.e2e-420-project-logs-previous.svc.cluster.local\n","stream":"stderr","time":"2018-12-19T16:16:54.867474246Z"}
{"log":"-A ISTIO_OUTPUT -d 10.192.0.1/32 -j RETURN\n","stream":"stdout","time":"2018-12-19T16:07:58.80186692Z"}
{"log":"-A ISTIO_OUTPUT -d 10.206.0.0/19 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.801869575Z"}
{"log":"-A ISTIO_OUTPUT -d 10.192.96.0/22 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.801872266Z"}
{"log":"-A ISTIO_OUTPUT -d 10.202.0.0/16 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.80187497Z"}
{"log":"-A ISTIO_OUTPUT -d 10.192.64.0/20 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.801877628Z"}
{"log":"-A ISTIO_OUTPUT -d 10.198.0.0/16 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.801880315Z"}
{"log":"-A ISTIO_OUTPUT -d 10.192.32.0/20 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.801899722Z"}
{"log":"-A ISTIO_OUTPUT -d 10.194.0.0/16 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.8019026Z"}
{"log":"-A ISTIO_OUTPUT -d 10.192.0.0/20 -j ISTIO_REDIRECT\n","stream":"stdout","time":"2018-12-19T16:07:58.80190537Z"}
{"log":"-A ISTIO_OUTPUT -j RETURN\n","stream":"stdout","time":"2018-12-19T16:07:58.801908081Z"}
{"log":"-A ISTIO_REDIRECT -p tcp -j REDIRECT --to-ports 15001\n","stream":"stdout","time":"2018-12-19T16:07:58.801910678Z"}
{"log":"COMMIT\n","stream":"stdout","time":"2018-12-19T16:07:58.801913412Z"}
{"log":"# Completed on Wed Dec 19 16:07:58 2018\n","stream":"stdout","time":"2018-12-19T16:07:58.801916208Z"}
  • Despite the errors, other logs appear to be processed OK for the lifetime of that pod.
  • When the pod was restarted however, I observed the replacement pod seemed to not process any logs at all.
  • Deleting the pod & removing the registry file meant the subsequent pod came up and started processing logs, but again the issue can/does pop up shortly after.
  • I think on occasions where the log file has been deleted that Filebeat is stuck processing, when the pod is restarted since it loses the file handle it moves on, and the errors go away.

Our filebeat yaml is as followed:

logging.level: error

filebeat.registry_flush: 2s

filebeat.config:
  prospectors:
    # Mounted `filebeat-prospectors` configmap:
    path: ${path.config}/prospectors.d/*.yml
    # Reload prospectors configs as they change:
    reload.enabled: false
  modules:
    path: ${path.config}/modules.d/*.yml
    # Reload module configs as they change:
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: kubernetes
      hints.enabled: true
      include_annotations: ['autotrader/logging']

processors:
- add_cloud_metadata: ~
- drop_event:
    when:
      not:
        equals:
          kubernetes.annotations.autotrader/logging: 'true'
- drop_event:
    when:
      equals:
        kubernetes.container.name: 'istio-proxy'
- decode_json_fields:
    fields: ["message"]
    process_array: false
    max_depth: 1
    target: ""
    overwrite_keys: true

output.elasticsearch:
  hosts: ["<REDACTED>]
  protocol: 'https'
  username: '<REDACTED>'
  password: "${ELASTICSEARCH_PASSWORD}"
  index: "kubernetes-%{+yyyy.MM.dd}"
  bulk_max_size: 250

setup.template.enabled: false
setup.template.name: 'custom-beats'
setup.template.pattern: "beats-*"
setup.template.overwrite: true
setup.template.settings:
  index:
    number_of_shards: 2
    number_of_replicas: 1
    codec: best_compression

xpack.monitoring.enabled: true

We have a couple of custom modules that have been around for a long time.

For reference, the containers that have (What I suspect to be the culprit logs) are istio-init & istio-proxy. These have always existed and are always being harvested, albeit istio-proxy is dropped by the processor:

- drop_event:
    when:
      equals:
        kubernetes.container.name: 'istio-proxy'

The issue only appears to be present when adding the below to our nginx deployment (Which has istio-init init container, and istio-proxy sidecar):

co.elastic.logs/fileset.stderr: error
co.elastic.logs/fileset.stdout: access
co.elastic.logs/module: nginx

(This is using the built in nginx module).

I'm unsure why this makes a difference, since all container logs are harvested and processed by default, and are then dropped on the lack of an annotation, and this particular error looks to be at the initial harvesting stage.

I believe I can reproduce the issue so I'm happy to aid in any debugging.

Cheers


(Raman Gupta) #7

@Evesy, thanks for that investigation. I can confirm that in my case, I also have the error on my nginx ingress containers, and those containers also have the co.elastic.logs/fileset.stdout: access annotation. We do not have an istio sidecar, so at least in my case, the issue is not related to that.

I can also confirm that removing this annotation from those containers solves the problem -- no more CRI errors.

I believe the problem, therefore, is that that annotation is actually telling filebeat to read the entire log from that container as an access_log i.e. filebeat is not understanding that the log is still a Docker log that contains JSON, and that only the message log part of the JSON should be parsed as an access log.

With this additional information, I feel comfortable in creating an issue, and here it is: https://github.com/elastic/beats/issues/9768.


(Michael Eves) #8

@rocketraman

FWIW this also appears to be happening on nodes for us that don't have any pods with module/fileset annotations, so I'm not sure the issue is strictly related to using the fileset hints. I saw @exekias picked up the issue on GitHub so perhaps the root cause is already known at this point?

This is running on 6.4.0 (Downgrading from 6.5.3 appeared to initially resolve this issue).


(Alex Scoble) #9

We're also having this issue with 6.5.1 and 6.5.4


(system) closed #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.