Hello! I was hoping y'all could help me out.
The essence of my problem is that the Filebeat S3 input plugin cannot process an S3 object whose `Content-Type` is `application/json` AND whose content is a separate JSON object per line (i.e. JSONL). Processing such an object used to be possible until v7.7.0, when the S3 input plugin started enforcing JSON parsing whenever it saw a `Content-Type` of `application/json`: https://github.com/elastic/beats/blob/5e69e25b920e3d93bec76a09a31da3ab35a55607/x-pack/filebeat/input/s3/input.go#L432
Before that, JSON processing was controlled ONLY by the `expand_event_list_from_field` configuration.
Now, it's probably true that the `Content-Type` on the S3 object should NOT be `application/json` in the first place, but I have no control over that. I'm essentially dealing with the same problem as https://github.com/elastic/beats/issues/18696, except that in my case Cloudflare is the entity pushing the logs to S3 (for that reporter it was AWS GuardDuty), and I have no control over how Cloudflare sets the `Content-Type`.
I reproduced this on the latest version of Filebeat (compiled locally): `filebeat version 8.0.0 (amd64), libbeat 8.0.0 [3341c1bca5626d1ee90af692617f10f58695ed1c built 2020-06-30 20:15:28 +0000 UTC]`. Here are the full steps, with some info (like AWS account numbers) omitted:
$ cat s3filebeat.log
{"id": "0001", "hey": "there", "how": {"are": "you"}}
{"id": "0002", "hope": "you", "are": {"doing": "well"}}
{"id": "0003", "I": "am", "doing": {"O": "K"}}
$ gzip s3filebeat.log
$ aws --profile PROFILE s3api put-object --body ./s3filebeat.log.gz --bucket lucas-test-filebeat-s3 --content-encoding gzip --content-type application/json --key s3filebeat.log.gz
{
"ETag": "\"955ed9f01b6ee38dbba167daab9ebbbb\""
}
$ cat filebeat.yml
filebeat.inputs:
- type: s3
queue_url: "https://sqs.us-east-1.amazonaws.com/ACCTNUM/cloudflare_logs_dev"
role_arn: "arn:aws:iam::ACCTNUM:role/cloudflare_logs_filebeat_access_s3_sqs_dev"
output.console:
pretty: true
$ ./filebeat -e
2020-06-30T15:39:04.180-0500 INFO instance/beat.go:628 Home path: [/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat] Config path: [/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat] Data path: [/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat/data] Logs path: [/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat/logs]
2020-06-30T15:39:04.180-0500 INFO instance/beat.go:636 Beat ID: 2ef98d6a-ef7c-4885-a258-b4341a63b43b
2020-06-30T15:39:04.181-0500 INFO [seccomp] seccomp/seccomp.go:124 Syscall filter successfully installed
2020-06-30T15:39:04.181-0500 INFO [beat] instance/beat.go:964 Beat info {"system_info": {"beat": {"path": {"config": "/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat", "data": "/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat/data", "home": "/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat", "logs": "/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat/logs"}, "type": "filebeat", "uuid": "2ef98d6a-ef7c-4885-a258-b4341a63b43b"}}}
2020-06-30T15:39:04.181-0500 INFO [beat] instance/beat.go:973 Build info {"system_info": {"build": {"commit": "3341c1bca5626d1ee90af692617f10f58695ed1c", "libbeat": "8.0.0", "time": "2020-06-30T20:15:28.000Z", "version": "8.0.0"}}}
2020-06-30T15:39:04.181-0500 INFO [beat] instance/beat.go:976 Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":12,"version":"go1.13.3"}}}
2020-06-30T15:39:04.182-0500 INFO [beat] instance/beat.go:980 Host info {"system_info": {"host": {"architecture":"x86_64","boot_time":"2020-06-22T10:53:41-05:00","containerized":false,"name":"lgroenendaal-XPS-15-9570","ip":["127.0.0.1/8","::1/128","192.168.86.24/24","fe80::c38c:a654:ca3e:d0dd/64","10.20.211.109/32","fe80::d2ee:dd77:2943:aa45/64","172.17.0.1/16"],"kernel_version":"5.3.0-59-generic","mac":["9c:b6:d0:c6:01:39","02:42:e2:23:9b:84"],"os":{"family":"debian","platform":"ubuntu","name":"Ubuntu","version":"18.04.2 LTS (Bionic Beaver)","major":18,"minor":4,"patch":2,"codename":"bionic"},"timezone":"CDT","timezone_offset_sec":-18000,"id":"5a8843fa712d481595ebd41926cda45f"}}}
2020-06-30T15:39:04.183-0500 INFO [beat] instance/beat.go:1009 Process info {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":null,"effective":null,"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat", "exe": "/home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat/filebeat", "name": "filebeat", "pid": 5578, "ppid": 4156, "seccomp": {"mode":"filter","no_new_privs":true}, "start_time": "2020-06-30T15:39:03.640-0500"}}}
2020-06-30T15:39:04.183-0500 INFO instance/beat.go:298 Setup Beat: filebeat; Version: 8.0.0
2020-06-30T15:39:04.183-0500 INFO [publisher] pipeline/module.go:113 Beat name: lgroenendaal-XPS-15-9570
2020-06-30T15:39:04.184-0500 WARN beater/filebeat.go:151 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-06-30T15:39:04.184-0500 INFO [monitoring] log/log.go:118 Starting metrics logging every 30s
2020-06-30T15:39:04.184-0500 INFO instance/beat.go:449 filebeat start running.
2020-06-30T15:39:04.184-0500 WARN beater/filebeat.go:251 Filebeat is unable to load the Ingest Node pipelines for the configured modules because the Elasticsearch output is not configured/enabled. If you have already loaded the Ingest Node pipelines or are using Logstash pipelines, you can ignore this warning.
2020-06-30T15:39:04.184-0500 INFO registrar/registrar.go:145 Loading registrar data from /home/lgroenendaal/src/github.com/elastic/beats/x-pack/filebeat/data/registry/filebeat/data.json
2020-06-30T15:39:04.184-0500 INFO registrar/registrar.go:152 States Loaded from registrar: 0
2020-06-30T15:39:04.184-0500 INFO [crawler] beater/crawler.go:71 Loading Inputs: 1
2020-06-30T15:39:04.185-0500 WARN [cfgwarn] s3/input.go:131 BETA: s3 input type is used
2020-06-30T15:39:04.185-0500 INFO [crawler] beater/crawler.go:141 Starting input (ID: 18222034013403473169)
2020-06-30T15:39:04.185-0500 INFO [crawler] beater/crawler.go:108 Loading and starting Inputs completed. Enabled inputs: 1
2020-06-30T15:39:04.186-0500 INFO [s3] s3/input.go:173 visibility timeout is set to 300 seconds
2020-06-30T15:39:04.186-0500 INFO [s3] s3/input.go:174 aws api timeout is set to 2m0s
2020-06-30T15:39:04.186-0500 INFO [s3] s3/input.go:196 s3 input worker has started. with queueURL: https://sqs.us-east-1.amazonaws.com/374144443638/cloudflare_logs_dev
2020-06-30T15:39:14.633-0500 ERROR [s3] s3/input.go:458 expand_event_list_from_field parameter is missing in config for application/json content-type file
2020-06-30T15:39:14.633-0500 ERROR [s3] s3/input.go:393 createEventsFromS3Info failed processing file from s3 bucket "lucas-test-filebeat-s3" with name "s3filebeat.log.gz": expand_event_list_from_field parameter is missing in config for application/json content-type file
The important error is the last line: `s3/input.go:393 createEventsFromS3Info failed processing file from s3 bucket "lucas-test-filebeat-s3" with name "s3filebeat.log.gz": expand_event_list_from_field parameter is missing in config for application/json content-type file`.
If I try, just for fun, to include the `expand_event_list_from_field` config, it will, understandably, fail to parse, and we get the WARN log: `s3/input.go:542 decode json failed for 's3filebeat.log.gz' from S3 bucket 'lucas-test-filebeat-s3', skipping this file: json: cannot unmarshal string into Go value of type []interface {}`.
For the time being I'll probably pin an older version of this plugin (unless you don't think this behavior will EVER be supported again, in which case I'll have to do something else). Also, I'd be happy to create a GH issue if it helps!
Phew, that was long. Thanks to those who stuck with me,
Lucas