The s3 input for creating the index does not work with prefix and csv files

Hi everyone, I have an S3 bucket with two CSV files: one is a Security Hub report and the other a GuardDuty report, both from AWS security services.

I'm using the prefix to get my object. Running the logstash -f file.conf --debug command, I can see that it finds the security_hub_results.csv file, but it skips the file and doesn't create the index.

Here is the command output:


[DEBUG] 2024-01-05 23:15:22.286 [pool-3-thread-1] jvm - collector name {:name=>"ConcurrentMarkSweep"}
[DEBUG] 2024-01-05 23:15:22.564 [[main]<s3] s3 - Found key {:key=>"security_hub_results.csv"}
[DEBUG] 2024-01-05 23:15:22.571 [[main]<s3] s3 - Ignoring {:key=>"security_hub_results.csv"}
[DEBUG] 2024-01-05 23:15:23.641 [[main]<s3] s3 - Closing {:plugin=>"LogStash::Inputs::S3"}
[DEBUG] 2024-01-05 23:15:23.649 [[main]<s3] pluginmetadata - Removing metadata for plugin copernico-securityhub
[DEBUG] 2024-01-05 23:15:23.652 [[main]-pipeline-manager] javapipeline - Input plugins stopped! Will shutdown filter/output workers. {:pipeline_id=>"main", :thread=>"#<Thread:0x6e1d7b6d run>"}
[DEBUG] 2024-01-05 23:15:23.670 [[main]-pipeline-manager] javapipeline - Shutdown waiting for worker thread {:pipeline_id=>"main", :thread=>"#<LogStash::WorkerLoopThread:0x3a59b56f run>"}
[DEBUG] 2024-01-05 23:15:23.749 [[main]-pipeline-manager] javapipeline - Shutdown waiting for worker thread {:pipeline_id=>"main", :thread=>"#<LogStash::WorkerLoopThread:0x3405145f dead>"}
[DEBUG] 2024-01-05 23:15:23.755 [[main]-pipeline-manager] csv - Closing {:plugin=>"LogStash::Filters::CSV"}
[DEBUG] 2024-01-05 23:15:23.756 [[main]-pipeline-manager] pluginmetadata - Removing metadata for plugin eb8f5d7a915e774ee908322cb49b5311ddb4d0226fad4637788d9e1b34fe1466
[DEBUG] 2024-01-05 23:15:23.757 [[main]-pipeline-manager] stdout - Closing {:plugin=>"LogStash::Outputs::Stdout"}
[DEBUG] 2024-01-05 23:15:23.758 [[main]-pipeline-manager] pluginmetadata - Removing metadata for plugin e33fca5295dce67aa1bd189c267cd1c06e1766ee0d2faa12d1bbe10126075298
[DEBUG] 2024-01-05 23:15:23.762 [[main]-pipeline-manager] elasticsearch - Closing {:plugin=>"LogStash::Outputs::Elasticsearch"}
[DEBUG] 2024-01-05 23:15:23.782 [[main]-pipeline-manager] elasticsearch - Stopping sniffer
[DEBUG] 2024-01-05 23:15:23.784 [[main]-pipeline-manager] elasticsearch - Stopping resurrectionist
[DEBUG] 2024-01-05 23:15:24.456 [[main]-pipeline-manager] elasticsearch - Waiting for in use manticore connections
[DEBUG] 2024-01-05 23:15:24.465 [[main]-pipeline-manager] elasticsearch - Closing adapter #<LogStash::Outputs::ElasticSearch::HttpClient::ManticoreAdapter:0x2151f761>
[DEBUG] 2024-01-05 23:15:24.472 [[main]-pipeline-manager] PoolingHttpClientConnectionManager - Connection manager is shutting down
[DEBUG] 2024-01-05 23:15:24.473 [[main]-pipeline-manager] DefaultManagedHttpClientConnection - http-outgoing-0: Close connection
[DEBUG] 2024-01-05 23:15:24.473 [[main]-pipeline-manager] PoolingHttpClientConnectionManager - Connection manager shut down
[DEBUG] 2024-01-05 23:15:24.473 [[main]-pipeline-manager] pluginmetadata - Removing metadata for plugin 44c25b32a23622ecc5a22d3366fb3d2abf9d77c296a652a04e3897b03080cf2b
[DEBUG] 2024-01-05 23:15:24.476 [[main]-pipeline-manager] javapipeline - Pipeline has been shutdown {:pipeline_id=>"main", :thread=>"#<Thread:0x6e1d7b6d run>"}
[INFO ] 2024-01-05 23:15:24.487 [[main]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"main"}
[DEBUG] 2024-01-05 23:15:24.525 [LogStash::Runner] agent - Shutting down all pipelines {:pipelines_count=>0}
[DEBUG] 2024-01-05 23:15:24.553 [LogStash::Runner] agent - Converging pipelines state {:actions_count=>1}
[DEBUG] 2024-01-05 23:15:24.571 [Converge PipelineAction::Delete] agent - Executing action {:action=>LogStash::PipelineAction::Delete/pipeline_id:main}
[INFO ] 2024-01-05 23:15:24.605 [Converge PipelineAction::Delete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:main}
[DEBUG] 2024-01-05 23:15:24.620 [LogStash::Runner] os - Stopping
[DEBUG] 2024-01-05 23:15:24.661 [LogStash::Runner] jvm - Stopping
[DEBUG] 2024-01-05 23:15:24.667 [LogStash::Runner] persistentqueue - Stopping
[DEBUG] 2024-01-05 23:15:24.667 [LogStash::Runner] deadletterqueue - Stopping
[DEBUG] 2024-01-05 23:15:24.756 [Api Webserver] agent - API WebServer has stopped running
[INFO ] 2024-01-05 23:15:24.757 [LogStash::Runner] runner - Logstash shut down.


Here is my s3 input:

input {
  s3 {
    bucket => "s3-bucket"
    region => "sa-east-1"
    id => "my-securityhub"
    prefix => "security_hub_results.csv"
    exclude_pattern => "/finding\.csv$"
    role_arn => "myrole"
    # type => "s3"
    #sincedb_path => "/etc/logstash/sincedb/s3-sincedb"
    codec => "plain"
    #interval => 15
    watch_for_new_files => false
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}

When I set the prefix to "", even using exclude_pattern, it creates the index, but with the information from both files in the root of the bucket. I've already run several tests, and it's strange that the documentation describes how to use the tool, yet it doesn't seem to work properly.
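That test looked roughly like this (same input as above, only the prefix changed):

input {
  s3 {
    bucket => "s3-bucket"
    region => "sa-east-1"
    # empty prefix: everything in the bucket root gets listed, so both reports are read
    prefix => ""
    exclude_pattern => "/finding\.csv$"
    role_arn => "myrole"
    codec => "plain"
    watch_for_new_files => false
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}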

I've also tried removing the exclude_pattern and using only the prefix pointing to the file I want to use, which is security_hub_results.csv.

I hope someone can clarify my doubts about this.

Hi guys, I removed the .csv from the end of the file name and it worked:

prefix => "security_hub_results"

I find it strange, because although .csv is the file extension, in the S3 bucket the key is: security_hub_results.csv

Taking advantage of the opportunity: at the moment I am creating this index from the command line.

I had set up two separate pipelines, one for each file and conf, but the Security Hub one gives me an error regarding the type of the CreatedAt field.

When I run it manually, as I just did, it creates the index without any problems. Could anyone tell me the reason for this?

This is the error it returns:

"status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [CreatedAt] of type [date] in document with id 'arn:aws:securityhub :sa-east-1:accountid:security-control/EC2.19/finding/6230424d-8c59-40cc-9b33-dc9093123f9a'. Preview of field's value: 'CRITICAL'", "caused_by"=>{"type"= >"illegal_argument_exception", "reason"=>"failed to parse date field [CRITICAL] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>" Failed to parse with all enclosed parsers"}}}}}}

I tried several things, such as using a filter to change the data type, but nothing worked.

The prefix has to be a prefix; it cannot be the whole filename. See this test in the code.
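As a rough illustration (the bucket keys below are hypothetical, and this reflects the plugin's filtering behaviour as far as I can tell):

# Hypothetical keys in the bucket root: security_hub_results.csv, guardduty_results.csv
input {
  s3 {
    bucket => "s3-bucket"
    region => "sa-east-1"
    # Partial prefix: the key "security_hub_results.csv" starts with it, so the file is read.
    prefix => "security_hub_results"
    # prefix => "security_hub_results.csv"
    # A prefix equal to the full key is treated as the "directory" itself and the object
    # is skipped, which is the "Ignoring {:key=>...}" line you saw in the debug output.
  }
}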

This is a mapping error; it is returned by Elasticsearch.

It means that you are trying to index a field with a value that is not supported by the current mapping.

In this case the field CreatedAt is mapped as date in Elasticsearch, but in this document it has the value CRITICAL, which of course is not a date and will be rejected. You need to check your parsing to see why this is happening.
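A value like CRITICAL ending up in CreatedAt usually means the columns shifted, for example because the header row was indexed as data or the column list does not match the file. As a rough sketch of what to check, assuming the report has a header row and CreatedAt holds an ISO 8601 timestamp (the options and format below are assumptions, not taken from your pipeline):

filter {
  csv {
    # Take the field names from the CSV header row so values land in the right fields
    # (this generally requires running the pipeline with a single worker)
    autodetect_column_names => true
    skip_header => true
  }
  date {
    # Assumes CreatedAt arrives as an ISO 8601 timestamp; adjust the pattern if not
    match => ["CreatedAt", "ISO8601"]
    target => "CreatedAt"
  }
}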

