Logstash stops processing AWS WAF logs when fields exceed 1000 (or any number)

Hi there,

I'm ingesting AWS WAF logs and it works fine for a few minutes but then stops with the following error:

response=>{"index"=>{"_index"=>"waf-logs-2023.08.01", "_id"=>"rjCKGYoBsxYs-jwL007l",
"status"=>400, "error"=>{"type"=>"document_parsing_exception", 
"reason"=>"[1:6817] failed to parse: Limit of total fields [1000] has been 
exceeded while adding new fields [1]", 
"reason"=>"Limit of total fields [1000] has been exceeded while adding 
new fields [1]"}}}}}

If I go into the Dev Console and increase the field limit, ingestion resumes, but it eventually fails again at whatever higher limit I set, or it keeps going until it rolls over to the next day's index:

PUT waf-logs-2023.08.01/_settings
{
  "index.mapping.total_fields.limit": 4000
}

Here's my logstash config:

input {
  s3 {
    "access_key_id" => "x"
    "secret_access_key" => "x"
    "region" => "us-east-1"
    "bucket" => "mybucket"
    "type" => "waf-log"
    "interval" => "300"
    "sincedb_path" => "/tmp/.waf-log_since.db"
    "prefix" => "mybucket/2023/08"
  }
}

filter {
  if [type] == "waf-log" {
    json {
      source => "message"
    }
    date {
      match => [ "[timestamp]", "UNIX_MS" ]
    }
    geoip {
      source => [ "[httpRequest][clientIp]" ]
      target => geoip
    }
    ruby {
      code => '
        event.get("[httpRequest][headers]").each { |kv|
          event.set(name = kv["name"], value = kv["value"])}
      '
    }
  }
}

output {
  elasticsearch {
    hosts => [""]
    index => "waf-logs-%{+YYYY.MM.dd}"
  }
}

What is the best approach to handle this?

Thank you!

I would start by looking at the existing mapping on the index and see if there are any obvious issues.
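As a starting point, the current mapping can be pulled in Dev Tools with the get mapping API. The index name below is simply the one from the error message in the question; adjust it to whichever index is failing.

```
GET waf-logs-2023.08.01/_mapping
```

If the response shows a long list of HTTP header names as separate entries under "properties", that would suggest the headers are what is driving the field count.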

I would assume that you can have a wide range of fields in the httpRequest.headers object and this ruby code would create a field for each one of them, which is not recommended.

The best approach would be to store the httpRequest.headers object as a flattened field. That way the entire JSON is still stored, but only one field is mapped.

This is what the documentation says about the flattened data type:

This data type can be useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which can help prevent a mappings explosion from having too many distinct field mappings.

Thank you very much for the reply! Should I just remove the ruby bit in the config?

Thanks again!

You need to check if this is really the cause of the excessive number of fields.

Assuming that the number of headers can be large and unknown, I would say that this is a probable cause, but you need to validate it.

Then you would need to create a mapping for the httpRequest.headers field as flattened and create a new index.
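A minimal sketch of such a mapping, assuming a composable index template is used so that each new daily index picks it up. The template name waf-logs-template is a made-up example, and the pattern waf-logs-* is inferred from the index name in the Logstash output:

```
PUT _index_template/waf-logs-template
{
  "index_patterns": ["waf-logs-*"],
  "template": {
    "mappings": {
      "properties": {
        "httpRequest": {
          "properties": {
            "headers": { "type": "flattened" }
          }
        }
      }
    }
  }
}
```

The type of a field that is already mapped cannot be changed, so this would only take effect on indices created after the template exists, for example the next day's index. It also only covers what is stored under httpRequest.headers; any fields the ruby filter sets elsewhere in the event would still be mapped one by one.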
