Issue with Parquet File Retrieval from S3

Hello everyone,

I'm having trouble retrieving a Parquet file from S3 using Filebeat. My configuration is below:

filebeat.inputs:
- type: aws-s3
  bucket_arn: ${BUCKET_ARN}
  bucket_list_prefix: ${BUCKET_LIST_PREFIX}
  bucket_list_interval: 60s
  region: eu-west-1
  default_region: eu-west-1
  number_of_workers: 5
  access_key_id: ${ACCESS_KEY_ID}
  secret_access_key: ${SECRET_ACCESS_KEY}
  decoding.codec.parquet.enabled: true
  decoding.codec.parquet.process_parallel: true
  decoding.codec.parquet.batch_size: 1000

setup.template.enabled: false

processors:
  - add_fields:
      target: '@metadata'
      fields:
        op_type: "index"

output.elasticsearch:
  hosts: ["${ELASTICSEARCH_HOSTS}"]
  username: ${ELASTICSEARCH_USERNAME}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  index: utenti123
  allow_older_versions: true

I have tried several bucket_list_prefix values, including:

emr-serverless/user-output/
emr-serverless/user--output//
emr-serverless/user-output/*
emr-serverless/user-output/*/
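
As far as I understand, S3 compares a listing prefix against object keys as a literal string, not as a glob pattern, so a trailing * is matched as an ordinary character. I'm assuming Filebeat passes bucket_list_prefix through to the listing request unchanged. A minimal Python sketch of that rule follows; the part-0000.parquet key is made up for illustration:

```python
def matches_prefix(key: str, prefix: str) -> bool:
    """Literal string-prefix comparison; '*' has no wildcard meaning here."""
    return key.startswith(prefix)


# Hypothetical object key, for illustration only.
key = "emr-serverless/user-output/part-0000.parquet"

# The four prefix values tried above.
for prefix in [
    "emr-serverless/user-output/",
    "emr-serverless/user--output//",
    "emr-serverless/user-output/*",
    "emr-serverless/user-output/*/",
]:
    print(f"{prefix!r}: {matches_prefix(key, prefix)}")
```

If that assumption holds, only the first value would be expected to match real keys under the folder.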

However, I consistently get the following error:

failed processing S3 event for object key "emr-serverless/user-output/" in bucket "root-content": failed to create parquet decoder: failed to create parquet reader: parquet: file too small (size=0)
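
The key in the error is the prefix itself ("emr-serverless/user-output/") and the reported size is 0, which suggests the object being decoded may be a zero-byte folder placeholder rather than an actual Parquet file. I have not confirmed this. A minimal Python sketch of how a listing could be split to check for such objects is below; the entries are hypothetical and only mimic the Key and Size fields of an S3 ListObjectsV2 response:

```python
def split_listing(objects):
    """Separate zero-byte objects (e.g. folder placeholders) from the rest."""
    empty = [o["Key"] for o in objects if o["Size"] == 0]
    non_empty = [o["Key"] for o in objects if o["Size"] > 0]
    return empty, non_empty


# Hypothetical listing entries, for illustration only.
listing = [
    {"Key": "emr-serverless/user-output/", "Size": 0},
    {"Key": "emr-serverless/user-output/part-0000.parquet", "Size": 4096},
]

empty, non_empty = split_listing(listing)
print("zero-byte:", empty)
print("non-empty:", non_empty)
```

Any key that lands in the zero-byte list would fail Parquet decoding in the same way, since an empty object cannot contain a Parquet footer.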

Any insights or suggestions on troubleshooting steps would be highly appreciated. Please let me know if additional information is needed.

Thank you
