The s3 input for creating the index does not work with prefix and csv files

Hi everyone, I have an S3 bucket with two CSV files: one is a Security Hub report and the other a GuardDuty report, both from AWS security services.

I'm using the prefix to get my object. Running the logstash -f file.conf --debug command, I can see that it finds the security_hub_results.csv file, but it skips the file and doesn't create the index.

Here is the command output:


[DEBUG] 2024-01-05 23:15:22.286 [pool-3-thread-1] jvm - collector name {:name=>"ConcurrentMarkSweep"}
[DEBUG] 2024-01-05 23:15:22.564 [[main]<s3] s3 - Found key {:key=>"security_hub_results.csv"}
[DEBUG] 2024-01-05 23:15:22.571 [[main]<s3] s3 - Ignoring {:key=>"security_hub_results.csv"}
[DEBUG] 2024-01-05 23:15:23.641 [[main]<s3] s3 - Closing {:plugin=>"LogStash::Inputs::S3"}
[DEBUG] 2024-01-05 23:15:23.649 [[main]<s3] pluginmetadata - Removing metadata for plugin copernico-securityhub
[DEBUG] 2024-01-05 23:15:23.652 [[main]-pipeline-manager] javapipeline - Input plugins stopped! Will shutdown filter/output workers. {:pipeline_id=>"main", :thread=>"#<Thread:0x6e1d7b6d run>"}
[DEBUG] 2024-01-05 23:15:23.670 [[main]-pipeline-manager] javapipeline - Shutdown waiting for worker thread {:pipeline_id=>"main", :thread=>"#<LogStash::WorkerLoopThread:0x3a59b56f run>"}
[DEBUG] 2024-01-05 23:15:23.749 [[main]-pipeline-manager] javapipeline - Shutdown waiting for worker thread {:pipeline_id=>"main", :thread=>"#<LogStash::WorkerLoopThread:0x3405145f dead>"}
[DEBUG] 2024-01-05 23:15:23.755 [[main]-pipeline-manager] csv - Closing {:plugin=>"LogStash::Filters::CSV"}
[DEBUG] 2024-01-05 23:15:23.756 [[main]-pipeline-manager] pluginmetadata - Removing metadata for plugin eb8f5d7a915e774ee908322cb49b5311ddb4d0226fad4637788d9e1b34fe1466
[DEBUG] 2024-01-05 23:15:23.757 [[main]-pipeline-manager] stdout - Closing {:plugin=>"LogStash::Outputs::Stdout"}
[DEBUG] 2024-01-05 23:15:23.758 [[main]-pipeline-manager] pluginmetadata - Removing metadata for plugin e33fca5295dce67aa1bd189c267cd1c06e1766ee0d2faa12d1bbe10126075298
[DEBUG] 2024-01-05 23:15:23.762 [[main]-pipeline-manager] elasticsearch - Closing {:plugin=>"LogStash::Outputs::Elasticsearch"}
[DEBUG] 2024-01-05 23:15:23.782 [[main]-pipeline-manager] elasticsearch - Stopping sniffer
[DEBUG] 2024-01-05 23:15:23.784 [[main]-pipeline-manager] elasticsearch - Stopping resurrectionist
[DEBUG] 2024-01-05 23:15:24.456 [[main]-pipeline-manager] elasticsearch - Waiting for in use manticore connections
[DEBUG] 2024-01-05 23:15:24.465 [[main]-pipeline-manager] elasticsearch - Closing adapter #<LogStash::Outputs::ElasticSearch::HttpClient::ManticoreAdapter:0x2151f761>
[DEBUG] 2024-01-05 23:15:24.472 [[main]-pipeline-manager] PoolingHttpClientConnectionManager - Connection manager is shutting down
[DEBUG] 2024-01-05 23:15:24.473 [[main]-pipeline-manager] DefaultManagedHttpClientConnection - http-outgoing-0: Close connection
[DEBUG] 2024-01-05 23:15:24.473 [[main]-pipeline-manager] PoolingHttpClientConnectionManager - Connection manager shut down
[DEBUG] 2024-01-05 23:15:24.473 [[main]-pipeline-manager] pluginmetadata - Removing metadata for plugin 44c25b32a23622ecc5a22d3366fb3d2abf9d77c296a652a04e3897b03080cf2b
[DEBUG] 2024-01-05 23:15:24.476 [[main]-pipeline-manager] javapipeline - Pipeline has been shutdown {:pipeline_id=>"main", :thread=>"#<Thread:0x6e1d7b6d run>"}
[INFO ] 2024-01-05 23:15:24.487 [[main]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"main"}
[DEBUG] 2024-01-05 23:15:24.525 [LogStash::Runner] agent - Shutting down all pipelines {:pipelines_count=>0}
[DEBUG] 2024-01-05 23:15:24.553 [LogStash::Runner] agent - Converging pipelines state {:actions_count=>1}
[DEBUG] 2024-01-05 23:15:24.571 [Converge PipelineAction::Delete] agent - Executing action {:action=>LogStash::PipelineAction::Delete/pipeline_id:main}
[INFO ] 2024-01-05 23:15:24.605 [Converge PipelineAction::Delete] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:main}
[DEBUG] 2024-01-05 23:15:24.620 [LogStash::Runner] os - Stopping
[DEBUG] 2024-01-05 23:15:24.661 [LogStash::Runner] jvm - Stopping
[DEBUG] 2024-01-05 23:15:24.667 [LogStash::Runner] persistentqueue - Stopping
[DEBUG] 2024-01-05 23:15:24.667 [LogStash::Runner] deadletterqueue - Stopping
[DEBUG] 2024-01-05 23:15:24.756 [Api Webserver] agent - API WebServer has stopped running
[INFO ] 2024-01-05 23:15:24.757 [LogStash::Runner] runner - Logstash shut down.


Here is my s3 input:

input {
  s3 {
    bucket => "s3-bucket"
    region => "sa-east-1"
    id => "my-securityhub"
    prefix => "security_hub_results.csv"
    exclude_pattern => "/finding\.csv$"
    role_arn => "myrole"
    # type => "s3"
    #sincedb_path => "/etc/logstash/sincedb/s3-sincedb"
    codec => "plain"
    #interval => 15
    watch_for_new_files => false
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}

When I set the prefix to "", even using exclude_pattern, it creates the index, but with the information from both files in the root of the bucket. I've already run several tests, and it's strange that the documentation describes how to use the tool, yet it doesn't seem to work properly.
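That test looked roughly like this (same input as above, only the prefix changed):

input {
  s3 {
    bucket => "s3-bucket"
    region => "sa-east-1"
    # empty prefix: everything in the bucket root gets listed, so both reports are read
    prefix => ""
    exclude_pattern => "/finding\.csv$"
    role_arn => "myrole"
    codec => "plain"
    watch_for_new_files => false
    additional_settings => {
      force_path_style => true
      follow_redirects => false
    }
  }
}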

I've also tried removing the exclude_pattern and using only the prefix pointing to the file I want to use, which is security_hub_results.csv.

I hope someone can clarify my doubts about this.

Hi guys, I removed the .csv from the end of the file name and it worked:

prefix => "security_hub_results"

I find it strange, because although .csv is the file extension, in the S3 bucket the key is: security_hub_results.csv

Taking advantage of the opportunity: at the moment I am creating this index from the command line.

I had set up two separate pipelines, one for each file and conf, but the Security Hub one gives me an error regarding the type of the CreatedAt field.

When I run it manually, as I just did, it creates the index without any problems. Could anyone tell me the reason for this?

This is the error it returns:

"status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [CreatedAt] of type [date] in document with id 'arn:aws:securityhub :sa-east-1:accountid:security-control/EC2.19/finding/6230424d-8c59-40cc-9b33-dc9093123f9a'. Preview of field's value: 'CRITICAL'", "caused_by"=>{"type"= >"illegal_argument_exception", "reason"=>"failed to parse date field [CRITICAL] with format [strict_date_optional_time||epoch_millis]", "caused_by"=>{"type"=>"date_time_parse_exception", "reason"=>" Failed to parse with all enclosed parsers"}}}}}}

I tried several things, such as using a filter to change the data type, but nothing worked.

The prefix has to be a prefix; it cannot be the whole filename. See this test in the code.
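As a rough illustration (the bucket keys below are hypothetical, and this reflects the plugin's filtering behaviour as far as I can tell):

# Hypothetical keys in the bucket root: security_hub_results.csv, guardduty_results.csv
input {
  s3 {
    bucket => "s3-bucket"
    region => "sa-east-1"
    # Partial prefix: the key "security_hub_results.csv" starts with it, so the file is read.
    prefix => "security_hub_results"
    # prefix => "security_hub_results.csv"
    # A prefix equal to the full key is treated as the "directory" itself and the object
    # is skipped, which is the "Ignoring {:key=>...}" line you saw in the debug output.
  }
}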

This is a mapping error; it is returned by Elasticsearch.

It means that you are trying to index a field with a value that is not supported by the current mapping.

In this case the field CreatedAt is mapped as date in Elasticsearch, but in this document it has the value CRITICAL, which of course is not a date and will be rejected. You need to check your parsing to see why this is happening.
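A value like CRITICAL ending up in CreatedAt usually means the columns shifted, for example because the header row was indexed as data or the column list does not match the file. As a rough sketch of what to check, assuming the report has a header row and CreatedAt holds an ISO 8601 timestamp (the options and format below are assumptions, not taken from your pipeline):

filter {
  csv {
    # Take the field names from the CSV header row so values land in the right fields
    # (this generally requires running the pipeline with a single worker)
    autodetect_column_names => true
    skip_header => true
  }
  date {
    # Assumes CreatedAt arrives as an ISO 8601 timestamp; adjust the pattern if not
    match => ["CreatedAt", "ISO8601"]
    target => "CreatedAt"
  }
}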

