Logstash stops processing AWS WAF logs when fields exceed 1000 (or any number)

feo13 · August 23, 2023, 3:21pm

Hi there,

I'm ingesting AWS WAF logs and it works fine for a few minutes but then stops with the following error:

response=>{"index"=>{"_index"=>"waf-logs-2023.08.01", "_id"=>"rjCKGYoBsxYs-jwL007l",
"status"=>400, "error"=>{"type"=>"document_parsing_exception", 
"reason"=>"[1:6817] failed to parse: Limit of total fields [1000] has been 
exceeded while adding new fields [1]", 
"caused_by"=>{"type"=>"illegal_argument_exception", 
"reason"=>"Limit of total fields [1000] has been exceeded while adding 
new fields [1]"}}}}}

If I go into the Dev Console and raise the field limit, ingestion resumes, but it eventually fails again once the new limit is reached, or as soon as it rolls over to the next day's index:

PUT waf-logs-2023.08.01/_settings
{
  "index.mapping.total_fields.limit": 4000
}

Here's my logstash config:

input { 
  s3 { 
    "access_key_id" => "x"
    "secret_access_key" => "x"
    "region" => "us-east-1" 
    "bucket" => "mybucket" 
    "type" => "waf-log" 
    "interval" => "300" 
    "sincedb_path" => "/tmp/.waf-log_since.db" 
    "prefix" => "mybucket/2023/08"
  } 
} 

filter {
  if [type] == "waf-log" {
    json {
      source => "message"
    }
    date {
      match => [ "[timestamp]", "UNIX_MS" ]
    }
    geoip {
      source => [ "[httpRequest][clientIp]" ]
      target => geoip
    }
    ruby {
      code => '
        event.get("[httpRequest][headers]").each { |kv|
          event.set(name = kv["name"], value = kv["value"])
        }
      '
    }
  }
}

output { 
  elasticsearch { 
    hosts => ["http://127.0.0.1:9200/"] 
    index => "waf-logs-%{+YYYY.MM.dd}"
  } 
}

What is the best approach to handle this?

Thank you!

Badger · August 23, 2023, 3:25pm

I would start by looking at the existing mapping on the index and see if there are any obvious issues.
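
One way to do that check is to fetch the mapping with `GET waf-logs-2023.08.01/_mapping` in the Dev Console and count how many mapping entries each top-level field contributes. The Ruby sketch below is only an illustration: the helper names and the sample mapping are made up for this example, and the count is a rough approximation of what the total-fields limit measures (it counts fields, object mappings, and multi-fields).

```ruby
require "json"

# Count the mapping entries contributed by one field definition:
# the field itself, its multi-fields ("fields"), and, for objects,
# everything nested under "properties".
def mapping_entries(definition)
  1 + (definition["fields"] || {}).size +
    (definition["properties"] || {}).sum { |_name, sub| mapping_entries(sub) }
end

# For a mapping response (JSON text) and an index name, return
# [field_name, entry_count] pairs for the top-level fields, largest first.
def entries_by_top_level_field(mapping_json, index_name)
  props = JSON.parse(mapping_json).dig(index_name, "mappings", "properties") || {}
  props.map { |name, definition| [name, mapping_entries(definition)] }
       .sort_by { |name, count| [-count, name] }
end

# Hypothetical, heavily shortened mapping response used only for illustration.
sample_mapping = <<~MAPPING
  {"waf-logs-2023.08.01": {"mappings": {"properties": {
    "httpRequest": {"properties": {
      "clientIp": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
      "country": {"type": "keyword"}}},
    "user-agent": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "x-custom-1": {"type": "keyword"}
  }}}}
MAPPING

entries_by_top_level_field(sample_mapping, "waf-logs-2023.08.01").each do |name, count|
  puts "#{count}\t#{name}"
end
```

If the real mapping shows hundreds of top-level fields named after HTTP headers, that points at the ruby filter rather than at the WAF log format itself.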

leandrojmp · August 23, 2023, 3:28pm

I assume the httpRequest.headers object can contain a wide range of header names, and this ruby code creates a separate field for each one of them, which is not recommended.

The best approach would be to store the httpRequest.headers object as a flattened field. That way the entire JSON is still stored, but only one field is mapped.

This is what the documentation says about the flattened data type:

This data type can be useful for indexing objects with a large or unknown number of unique keys. Only one field mapping is created for the whole JSON object, which can help prevent a mappings explosion from having too many distinct field mappings.

feo13 · August 23, 2023, 4:04pm

Thank you very much for the reply! Should I just remove the ruby bit in the config?

Thanks again!

leandrojmp · August 23, 2023, 4:41pm

You need to check if this is really the cause of the excessive number of fields.

Assuming that the number of headers can be large and unknown, I would say that this is a probable cause, but you need to validate it.

Then you would need to create a mapping with the httpRequest.headers field as flattened and create a new index, since the type of a field that is already mapped cannot be changed in place.
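
On the Logstash side, a possible alternative to dropping the ruby filter altogether is to keep the headers searchable by name while writing them under a single object instead of as top-level fields. The sketch below is a minimal illustration under assumptions, not a tested pipeline: the method name `fold_headers` and the target field `[httpRequest][headersByName]` are hypothetical, and the target field is assumed to be mapped as `flattened` in the new index so that it counts as one mapped field.

```ruby
# Turn the WAF header list, [{"name" => ..., "value" => ...}, ...],
# into a single hash keyed by header name.
# - Names are lowercased so "Host" and "host" end up under one key.
# - A header that appears more than once keeps all its values in an array.
# - A missing or malformed header list yields an empty hash instead of raising.
def fold_headers(headers)
  Array(headers).each_with_object({}) do |kv, folded|
    next unless kv.is_a?(Hash) && kv["name"]

    key = kv["name"].to_s.downcase
    folded[key] =
      if folded.key?(key)
        Array(folded[key]) << kv["value"]
      else
        kv["value"]
      end
  end
end

# In the Logstash ruby filter, the same logic could be inlined in `code`
# and the result stored under one field (hypothetical field name):
#   event.set("[httpRequest][headersByName]",
#             fold_headers(event.get("[httpRequest][headers]")))

example = [
  { "name" => "Host",   "value" => "example.com" },
  { "name" => "Cookie", "value" => "a=1" },
  { "name" => "cookie", "value" => "b=2" }
]
p fold_headers(example)
```

Compared with the original filter, this never creates a new top-level field per header, so the number of mapped fields no longer depends on which headers clients send.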

system · September 20, 2023, 4:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
