Hi Logstash Experts -
This is my first time dealing with Logstash, so I'm not quite sure why the logs are being formatted like this, whether this is expected/normal, or how to handle them.
I'm hitting an issue with logs flowing from Artifactory -> Fluent Bit -> Logstash. The main problem is that the documents Logstash sends are too large for the Azure Sentinel output plugin - I'm getting this error:
[ERROR][logstash.outputs.microsoftsentineloutput][artifactory][...] Received document above the max allowed size - dropping the document [document size: 3920975, max allowed size: 1036576]
Is there some way to split these batched logs into smaller documents? Is this possible within Logstash, or does it need to happen before the logs are sent from Fluent Bit?
My config is very simple:
input {
  # Logs arrive from Fluent Bit over TCP
  tcp {
    port => 8084
  }
}
output {
  # Archive a copy to S3
  s3 {
    region => "ap-northeast-1"
    bucket => "MY_BUCKET"
    prefix => "artifactory-logs/%{+YYYY}/%{+MM}/%{+dd}"
    time_file => 5
    additional_settings => {
      "force_path_style" => true
      "follow_redirects" => false
    }
    codec => json
  }
  # Local file copy for debugging
  file {
    path => "/var/log/artifactory_test.log"
    write_behavior => "overwrite"
  }
  # Forward to Azure Sentinel via the DCR-based output plugin
  microsoft-sentinel-logstash-output-plugin {
    client_app_Id => MY_ID
    client_app_secret => MY_SECRET
    tenant_id => MY_TENANT_ID
    data_collection_endpoint => MY_ENDPOINT
    dcr_immutable_id => MY_DCR
    dcr_stream_name => MY_STREAM
  }
}
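Would adding a filter block along these lines be the right way to split those batches up? This is just a rough sketch based on my reading of the json and split filter docs, and it assumes the whole JSON array from Fluent Bit lands as a string in the message field (samples of the actual data are at the end of this post):
filter {
  # Parse the raw JSON array in the message field into a temporary field
  json {
    source => "message"
    target => "entries"
  }
  # Emit one event per element of the array
  split {
    field => "entries"
  }
}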
Each log file contains a huge number of entries, batched together in an odd way: each batch is wrapped in square brackets with no delimiter between batches, and holds an arbitrary number of log entries as comma-separated objects in curly braces. All quotation marks are escaped.
[{"message":"dummy data","purpose":"testing","log_server":"splunk"},{"message":"dummy data","purpose":"testing","log_server":"splunk"},{"message":"dummy data","purpose":"testing","log_server":"splunk"},{"message":"dummy data","purpose":"testing","log_server":"splunk"}]
[{"message":"dummy data","purpose":"testing","log_server":"splunk"}]