Filebeat multiline config doesn't work with S3 input

Elastic Cloud hosting
Elasticsearch 7.10.0
Filebeat 7.10.1

We've been unable to get the S3 input to apply the multiline options configured on it. Despite our best efforts, lines that should be consolidated into a single event are being sent to Elasticsearch as individual events.

S3 input section of filebeat.yml

    filebeat.inputs:
      - type: s3
        queue_url: "${QUEUE_URL}"
        multiline.type: pattern
        multiline.pattern: '^\d{4}-\d{2}-\d{2}'
        multiline.negate: true
        multiline.match: after
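
For context, output from this instance goes to the Elastic Cloud deployment. A minimal sketch of that part of the config (values redacted; assuming the standard cloud.id / cloud.auth settings rather than an explicit output.elasticsearch block):

    # Illustrative only; actual values come from the environment
    cloud.id: "${CLOUD_ID}"
    cloud.auth: "${CLOUD_AUTH}"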

When testing Filebeat locally with a basic Log input and File output, the same multiline configuration works as expected.

filebeat.yml

    filebeat.inputs:
    - type: log
      paths:
        - /PATH/local_filebeat_logs/*
      multiline.type: pattern
      multiline.pattern: '^\d{4}-\d{2}-\d{2}'
      multiline.negate: true
      multiline.match: after
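
The File output used for that local test is roughly the following (path is illustrative); it just writes the published events to disk so the multiline grouping can be inspected:

    # Write events to a local file for inspection (illustrative path)
    output.file:
      path: "/PATH/local_filebeat_output"
      filename: "filebeat_output"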

Sample input file

    2020-12-28 11:10:19,800 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): S3DistCp args: --s3Endpoint=s3.us-east-1.amazonaws.com --src=hdfs:///date=20201228/hour=09 --dest=s3://BUCKET/PATH/date=20201228/hour=09/1609171752178 
    2020-12-28 11:10:44,825 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1609171623625_0002 completed successfully
    2020-12-28 11:10:44,919 INFO org.apache.hadoop.mapreduce.Job (main): Counters: 54
        File System Counters
            FILE: Number of bytes read=1935
            FILE: Number of bytes written=1419609
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=3425
            HDFS: Number of bytes written=0
            HDFS: Number of read operations=42
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=14
            S3: Number of bytes read=0
            S3: Number of bytes written=296
            S3: Number of read operations=0
            S3: Number of large read operations=0
            S3: Number of write operations=0
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=7
            Rack-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=192576
            Total time spent by all reduces in occupied slots (ms)=7959168
            Total time spent by all map tasks (ms)=2006
            Total time spent by all reduce tasks (ms)=41454
            Total vcore-milliseconds taken by all map tasks=2006
            Total vcore-milliseconds taken by all reduce tasks=41454
            Total megabyte-milliseconds taken by all map tasks=6162432
            Total megabyte-milliseconds taken by all reduce tasks=254693376
        Map-Reduce Framework
            Map input records=9
            Map output records=9
            Map output bytes=3401
            Map output materialized bytes=1907
            Input split bytes=154
            Combine input records=0
            Combine output records=0
            Reduce input groups=9
            Reduce shuffle bytes=1907
            Reduce input records=9
            Reduce output records=0
            Spilled Records=18
            Shuffled Maps =7
            Failed Shuffles=0
            Merged Map outputs=7
            GC time elapsed (ms)=1440
            CPU time spent (ms)=42960
            Physical memory (bytes) snapshot=3794964480
            Virtual memory (bytes) snapshot=55973384192
            Total committed heap usage (bytes)=5106565120
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=2975
        File Output Format Counters 
            Bytes Written=0
    2020-12-28 11:10:44,920 INFO com.amazon.elasticmapreduce.s3distcp.S3DistCp (main): Try to recursively delete hdfs:/tmp/2c6f4478-befb-49ad-babe-f9f2e8e4f6e0

Ingesting this sample file should produce 4 events in Elasticsearch: one per timestamped line, with the entire indented Counters output folded into the third event (the one ending in "Counters: 54").

The only other difference worth mentioning is that the sample log file is gzipped (.gz) in S3. The S3 input clearly has no trouble decompressing the .gz and reading individual lines, but it still isn't applying the multiline configuration to them.
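
In case it helps with reproducing this, here is a sketch of the debug logging settings that could be enabled on the Filebeat instance to check whether the S3 input is picking up the multiline options at all (standard Filebeat logging config, illustrative path):

    # Turn on debug logging to a local file (illustrative path)
    logging.level: debug
    logging.to_files: true
    logging.files:
      path: /PATH/filebeat_debug_logs
      name: filebeat
      keepfiles: 3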
