Hello everyone,
I'm having trouble reading Parquet files from S3 with the Filebeat aws-s3 input. My configuration is below:
filebeat.inputs:
- type: aws-s3
  bucket_arn: ${BUCKET_ARN}
  bucket_list_prefix: ${BUCKET_LIST_PREFIX}
  bucket_list_interval: 60s
  region: eu-west-1
  default_region: eu-west-1
  number_of_workers: 5
  access_key_id: ${ACCESS_KEY_ID}
  secret_access_key: ${SECRET_ACCESS_KEY}
  decoding.codec.parquet.enabled: true
  decoding.codec.parquet.process_parallel: true
  decoding.codec.parquet.batch_size: 1000
setup.template.enabled: false
processors:
  - add_fields:
      target: '@metadata'
      fields:
        op_type: "index"
output.elasticsearch:
  hosts: ["${ELASTICSEARCH_HOSTS}"]
  username: ${ELASTICSEARCH_USERNAME}
  password: ${ELASTICSEARCH_PASSWORD}
  protocol: https
  index: utenti123
  allow_older_versions: true
I have tried several values for bucket_list_prefix, including:
emr-serverless/user-output/
emr-serverless/user--output//
emr-serverless/user-output/*
emr-serverless/user-output/*/
However, I consistently get the following error:
failed processing S3 event for object key "emr-serverless/user-output/" in bucket "root-content": failed to create parquet decoder: failed to create parquet reader: parquet: file too small (size=0)
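One thing I noticed: the object key in the error is the prefix itself and its size is 0, which might mean there is a zero-byte "folder" placeholder object under the prefix rather than a broken Parquet file. Here is a minimal sketch for checking that, assuming boto3 is installed and AWS credentials are available in the environment. The helper names `find_empty_keys` and `list_objects` are my own, and the bucket and prefix are the ones from the error message.

```python
"""List zero-byte objects under an S3 prefix (sketch; helper names are hypothetical)."""


def find_empty_keys(objects):
    """Return the keys of all entries whose Size is 0.

    `objects` is an iterable of dicts shaped like the `Contents` entries
    returned by S3 ListObjectsV2 (each one has "Key" and "Size").
    """
    return [obj["Key"] for obj in objects if obj["Size"] == 0]


def list_objects(bucket, prefix):
    """Yield every object under `prefix`. Assumes boto3 and valid AWS credentials."""
    import boto3  # imported lazily so find_empty_keys can be used without boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page has no objects.
        yield from page.get("Contents", [])


# Example call (needs network access and credentials):
# for key in find_empty_keys(list_objects("root-content", "emr-serverless/user-output/")):
#     print("zero-byte object:", key)
```

If this prints `emr-serverless/user-output/`, then that empty object is probably what the Parquet decoder is failing on, not the actual data files.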
Any insights or suggestions on troubleshooting steps would be highly appreciated. Please let me know if additional information is needed.
Thank you