I have an AWS OpenSearch domain into which I am trying to ingest newline-delimited JSON files from a GCS bucket using Logstash. I can see a log stating that Logstash has connected to the domain, and a few more stating it is trying to fetch blobs from the GCS bucket.
This is my Logstash configuration:
input {
  google_cloud_storage {
    bucket_id     => "test-bucket"
    json_key_file => "/etc/logstash/credentials.json"
    codec         => "json_lines"
  }
}

filter {
}

output {
  opensearch {
    hosts    => "https://<name>.us-east-1.es.amazonaws.com:443"
    user     => "admin"
    password => "admin"
    index    => "logstash-test-1"
    ssl_certificate_verification => true
  }
}
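The objects in the bucket contain newline-delimited JSON, one event per line. An illustrative line (not my real data) looks like this:

{"timestamp": "2024-06-19T08:00:00Z", "level": "INFO", "message": "sample event"}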
The same file is ingested as expected when the input is a file mounted via a ConfigMap, but when read via GCS nothing is pushed.
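For comparison, here is a sketch of the working file-input variant (the mount path is an assumption; in my setup it matches the ConfigMap mount):

input {
  file {
    # hypothetical path where the ConfigMap-mounted file lives
    path           => "/usr/share/logstash/data/sample/*.json"
    codec          => "json_lines"
    start_position => "beginning"
    sincedb_path   => "/dev/null"  # re-read the file on every restart, useful for testing
  }
}

With the GCS input, however, I did see these logs: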
[2024-06-19T08:28:41,025][INFO ][logstash.inputs.googlecloudstorage][main] ProcessedDb created in: /usr/share/logstash/data/plugins/inputs/google_cloud_storage/db
[2024-06-19T08:28:41,027][INFO ][logstash.inputs.googlecloudstorage][main] Turn on debugging to explain why blobs are filtered.
[2024-06-19T08:28:41,028][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[2024-06-19T08:28:41,033][INFO ][logstash.inputs.googlecloudstorage][main][6278fa388e5b5004f390348cab6962e1c49ff5ef2e012a1436a636cecb12a3c8] Fetching blobs from test-bucket
[2024-06-19T08:28:41,045][INFO ][logstash.agent] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
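The "Turn on debugging" line suggests the plugin will explain its filtering at debug level. As far as I know, that can be switched on at runtime through Logstash's logging API (9600 is the default API port), using the logger name shown in the log lines above:

curl -XPUT 'http://localhost:9600/_node/logging' \
  -H 'Content-Type: application/json' \
  -d '{"logger.logstash.inputs.googlecloudstorage": "DEBUG"}'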
I can see the log stating blobs are being fetched, but nothing is uploaded to the index. Any idea why the blobs are being filtered out? I am using the official Logstash Docker image, docker.elastic.co/logstash/logstash:8.8.2, on which I have installed the GCS input plugin.
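For completeness, the image is built roughly like this (a sketch; I am also showing the opensearch output plugin since the config above requires it):

FROM docker.elastic.co/logstash/logstash:8.8.2
# GCS input is not bundled with the official image
RUN bin/logstash-plugin install logstash-input-google_cloud_storage
# needed for the opensearch output used above
RUN bin/logstash-plugin install logstash-output-opensearch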
Posting it here since I am using the Elastic Logstash image. Any help would be appreciated; I have been stuck on this for days now.