Handle json filter "field name cannot be an empty string" error

I have some Proofpoint logs in JSON format like this:

{
"metadata":{
	"tzOffsetMins":-240,
	"tsEpochMs":1650294078380
},
"ts":"2022-05-17T01:02:10.380410-0400",
"guid":"T1sZJIJ-YGTdBDzraMXIw6rea9pB78Qw",
"msgParts":[
	{
	"structureId":"0",
	"metadata":{
		"title":"GRAINGER",
		"viewport":"width=device-width, initial-scale=1.0 ",
		"":"Grainger"
	},
	"detectedName":"text.html",
	"disposition":"inline"
	}
],
"action_dmarc":[
]
}

Since there is a key in [msgParts][metadata] whose name is an empty string, the log cannot be indexed into Elasticsearch due to a "field name cannot be an empty string" error.

I wanted to rename that key and tried following the solution mentioned in the thread, but it looks like I messed up some configs and didn't get the expected results.

My Logstash config is:

filter {
  ruby {
    code  => '
		h = event.get("[msgParts][metadata]")
		newh = {}
		h.each { |k, v|
			if k == ""
				newk = "extensionless"
			else
				newk = k
			end
			newh[newk] = v
		}
		event.set("[msgParts][metadata]", newh)
	'
  }
  json {
    source => "message"
  }
}

I also tried source => "event" in the json filter, but got _rubyexception and _jsonparsefailure tags.

Could anyone please take a look and advise? Logstash version is 8.1.3.

Two problems. Firstly, the json filter has to come before the ruby filter; otherwise the [msgParts] field does not exist when the ruby filter processes the event. Secondly, [msgParts] is an array, so both references should be

"[msgParts][0][metadata]"

Once you fix those, it works:

    "msgParts" => [
    [0] {
        "detectedName" => "text.html",
            "metadata" => {
            "extensionless" => "Grainger",
                 "viewport" => "width=device-width, initial-scale=1.0 ",

BTW, I now think it is less ugly to use

            newk = k == "" ? "extensionless" : k
            newh[newk] = v
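Combining the two fixes (json filter first, indexed array reference) with the ternary, the whole filter would look something like this (a sketch based on the fixes above, untested against the full pipeline):

```
filter {
  json {
    source => "message"
  }
  ruby {
    code => '
      h = event.get("[msgParts][0][metadata]")
      newh = {}
      h.each { |k, v|
        newk = k == "" ? "extensionless" : k
        newh[newk] = v
      }
      event.set("[msgParts][0][metadata]", newh)
    '
  }
}
```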

It works perfectly. Thank you, Badger!!! But I just found out that [metadata] is not only present in the first item of msgParts, so I also need to loop over the entire array, test for the existence of the [metadata] field, and then do your magic. Could you advise? I have zero experience with Ruby, and below is as far as I can get at the moment. Thank you again!!

  ruby {
    code  => '
		m = event.get("[msgParts]")
		m.each_with_index  {
			if [metadata] {
				h = event.get("[metadata]")
				newh = {}
				h.each { |k, v|
					newk = k == "" ? "extensionless" : k
					newh[newk] = v
				}
				event.set("[metadata]", newh)
			}
		}
	'
  }

If your data structure is

{
"guid":"T1sZJIJ-YGTdBDzraMXIw6rea9pB78Qw",
"msgParts":[
    { "structureId":"0", "metadata":{ "title":"Foo", "":"Foo" } },
    { "structureId":"1", "metadata":{ "title":"Bar", "":"Bar" } }
],
"action_dmarc":[ ]
}

and you want to iterate over the array, then I would rewrite the filter as

  ruby {
    code  => '
        a = event.get("[msgParts]")
        a.each_index { |x|
            a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
        }
        event.set("[msgParts]", a)
    '
  }
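The one-liner works because Hash#delete removes the key from the hash and returns the deleted value, so the rename happens in a single statement. A standalone Ruby sketch of the idiom:

```ruby
# Hash#delete removes the key and returns its value, so an
# empty-named key can be renamed in one step.
metadata = { "title" => "GRAINGER", "" => "Grainger" }
metadata["extensionless"] = metadata.delete("")
# metadata no longer contains the "" key
```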

It works like a charm. Thank you Badger!!!

Here is the final config for parsing Proofpoint historical log data, in case anyone else needs it.

filter {
  json {
    source => "message"
  }
  ruby {
    code  => '
        a = event.get("[msgParts]")
        a.each_index { |x|
            a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
        }
        event.set("[msgParts]", a)
    '
  }
  date {
   match => ["ts","YYYY-MM-dd'T'HH:mm:ss.SSSZ","ISO8601"]
   target => "@timestamp"
  }
}

The config works, but I see a lot of errors in the Logstash log. Please advise.

[2022-05-19T22:32:21,992][ERROR][logstash.filters.ruby ][filebeat-pf][8fca63a10259496bbfe146da14e553c2fb50c57a2ccb359db4a8a23d22d17a12] Ruby exception occurred: undefined method `each_index' for nil:NilClass {:class=>"NoMethodError", :backtrace=>["(ruby filter code):4:in `block in filter_method'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:96:in `inline_script'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:89:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178:in `block in multi_filter'", "org/jruby/RubyArray.java:1821:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:299:in `block in start_workers'"]}

Some events do not have a [msgParts] field. Use a conditional in the ruby filter

    a = event.get("[msgParts]")
    if a
        a.each_index { |x|
            a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
        }
        event.set("[msgParts]", a)
    end

It works. Thank you bro.

Have a new issue, bro. The names of some fields contain special characters, like:

 "metadata": {"codecname": "MPEG-4 Audio", "xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages": "1", "xap/1.0/mm/xmpmm:history[9]/stevt:changed": "/", "createdate": "2022-04-27T14:56:41+01:00", "xap/1.0/mm/xmpmm:pantry[52]/tiff:make": "Canon", "xap/1.0/mm/xmpmm:pantry[4]/xmptpg:npages": "1", "codec": "mp4a", "rating": "0", "xap/1.0/mm/xmpmm:history[6]/stevt:changed": "/", "framewidth (pixels)": 1920, "datarate (kbps)": 1495, "thumbnails[1]/xmpgimg:width": "256", "dc/elements/1.1/dc:format": "H.264", "filetype": "mp42", "thumbnails[1]/xmpgimg:format": "JPEG", "minutes": 0, "xap/1.0/mm/xmpmm:pantry[1]/xmptpg:npages": "1", "audiosamplerate (khz)": 48, "audiobitrate (kbps)": 317, "filetypedeveloper": "ISO", "streamtype": "audio", "encrypted": 0, "hours": 0, "modifydate": "2022-04-27T14:57:12+01:00", "metadatadate": "2022-04-27T14:57:12+01:00", "thumbnails[1]/xmpgimg:height": "72", "xap/1.0/mm/xmpmm:pantry[3]/xmptpg:npages": "1", "length (hh:mm:ss)": "00:00:44", "xap/1.0/mm/xmpmm:pantry[28]/dc:title[1]": "5L ", "channels": 2, "frameheight (pixels)": 1080, "framerate (frames/sec)": 30, "seconds": 44, "tiff/1.0/tiff:orientation": "1", "streamid": "2", "xap/1.0/mm/xmpmm:history[1]/stevt:changed": "/", "xap/1.0/mm/xmpmm:pantry[52]/aux:lensid": "198"}

This causes a <RuntimeError: Invalid FieldReference: xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages> error.

How should I handle these field names? I also tried to leave it as a _jsonparsefailure and keep the original message, but then my current config failed to parse the ts field. Could you advise?

filter {
	json {
		source => "message"
	}
	ruby {
		code  => '
			a = event.get("[msgParts]")
			if a
				a.each_index { |x|
					if a[x]["metadata"]
						a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
					end
				}
				event.set("[msgParts]", a)
			end
		'
	}
	date {
		match => ["ts","YYYY-MM-dd'T'HH:mm:ss.SSSZ","ISO8601"]
		target => "@timestamp"
	}
	if "_jsonparsefailure" in [tags] {
		mutate {
			add_field => { "ts1" => "" }
        }
		ruby {
			code => '
				t = event.get("[ts]")
				event.set("[ts1]", t)
			'
		}
		date {
			match => ["ts","YYYY-MM-dd'T'HH:mm:ss.SSSZ","ISO8601"]
			target => "@timestamp"
		}
	}
	if  "_jsonparsefailure" not in [tags] {
		mutate { remove_field => ["[event][original]","[message]"] }
	}
}

Below is a sample log.

{"guid": "6-WyL_R332XqoIateyxm41pivkPL8ohQ", "msgParts": [{"disposition": "attached", "sizeDecodedBytes": 10184246, "isDeleted": false, "labeledCharset": "", "sandboxStatus": "NOT_SUPPORTED", "detectedExt": "MP4", "md5": "21551aaa91f496f9757b774d07c68af9", "detectedSizeBytes": 10184246, "isVirtual": false, "isCorrupted": false, "detectedCharset": "", "labeledExt": "mp4", "sha256": "7a6fd506e9fec42c7c0ca7b0a4067e64b0858e25bea88f13bb889e3a4c65ee50", "detectedMime": "video/mp4", "labeledMime": "video/mp4", "structureId": "0", "detectedName": "eco refill movie_1.mp4", "isProtected": false, "labeledName": "eco refill movie_1.mp4", "isTimedOut": false, "metadata": {"codecname": "MPEG-4 Audio", "xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages": "1", "xap/1.0/mm/xmpmm:history[9]/stevt:changed": "/", "createdate": "2022-04-27T14:56:41+01:00", "xap/1.0/mm/xmpmm:pantry[52]/tiff:make": "Canon", "xap/1.0/mm/xmpmm:pantry[4]/xmptpg:npages": "1", "codec": "mp4a", "rating": "0", "xap/1.0/mm/xmpmm:history[6]/stevt:changed": "/", "framewidth (pixels)": 1920, "datarate (kbps)": 1495, "thumbnails[1]/xmpgimg:width": "256", "dc/elements/1.1/dc:format": "H.264", "filetype": "mp42", "thumbnails[1]/xmpgimg:format": "JPEG", "minutes": 0, "xap/1.0/mm/xmpmm:pantry[1]/xmptpg:npages": "1", "audiosamplerate (khz)": 48, "audiobitrate (kbps)": 317, "filetypedeveloper": "ISO", "streamtype": "audio", "encrypted": 0, "hours": 0, "modifydate": "2022-04-27T14:57:12+01:00", "metadatadate": "2022-04-27T14:57:12+01:00", "thumbnails[1]/xmpgimg:height": "72", "xap/1.0/mm/xmpmm:pantry[3]/xmptpg:npages": "1", "length (hh:mm:ss)": "00:00:44", "xap/1.0/mm/xmpmm:pantry[28]/dc:title[1]": "5L ", "channels": 2, "frameheight (pixels)": 1080, "framerate (frames/sec)": 30, "seconds": 44, "tiff/1.0/tiff:orientation": "1", "streamid": "2", "xap/1.0/mm/xmpmm:history[1]/stevt:changed": "/", "xap/1.0/mm/xmpmm:pantry[52]/aux:lensid": "198"}, "textExtracted": "U0NBTEFSKDB4N2YwNzEwMmZhOTMwKQ==\n", "urls": [], "dataBase64": 
"U0NBTEFSKDB4N2YwN2E2YjM0ZWEwKQ==\n", "isArchive": false}, {"sandboxStatus": "NOT_SUPPORTED", "detectedExt": "JPG", "md5": "32825d23f13ae618fe0802801690274b", "disposition": "attached", "sizeDecodedBytes": 1166579, "isDeleted": false, "labeledCharset": "", "isCorrupted": false, "isVirtual": false, "detectedCharset": "", "detectedSizeBytes": 1166579, "detectedName": "eco refill 3840x1098px-02.jpg", "structureId": "0", "labeledName": "eco refill 3840x1098px-02.jpg", "isProtected": false, "isTimedOut": false, "metadata": {}, "sha256": "cc5c09c4ea143d2a0f03d20ff0edc59f8a3e34f56594105b25e6b124c5b2d7a1", "labeledExt": "jpg", "detectedMime": "image/jpeg", "labeledMime": "image/jpeg", "urls": [], "dataBase64": "U0NBTEFSKDB4N2YwNzEwNTQ1MzAwKQ==\n", "isArchive": false, "textExtracted": "U0NBTEFSKDB4N2YwNzc2NmM4MWY4KQ==\n"}, {"detectedCharset": "", "isVirtual": false, "isCorrupted": false, "detectedSizeBytes": 2363187, "md5": "6d19a5dee9ca5b383f488fbe9404fb9f", "sandboxStatus": "UPLOADED", "detectedExt": "PPTX", "isDeleted": false, "labeledCharset": "", "disposition": "attached", "sizeDecodedBytes": 2363187, "dataBase64": "U0NBTEFSKDB4N2YwNzEwNTNlNTEwKQ==\n", "isArchive": false, "urls": [], "textExtracted": "U0NBTEFSKDB4N2YwNzJjYmE4ZGUwKQ==\n", "isTimedOut": false, "metadata": {"edittime": 44, "author": "Ben Edwards", "shareddoc": 0, "revnumber": "4", "parcount": 0, "presentationtarget": "Widescreen", "codepage": 65001, "scalecrop": 0, "titlesofparts": "Arial;Calibri;Calibri Light;Office Theme;PowerPoint Presentation;PowerPoint Presentation", "lastauthor": "Ben Edwards", "title": "PowerPoint Presentation", "wordcount": 0, "slidecount": 2, "mmclips": 0, "hyperlinkschanged": 0, "appversion": "16.0000", "headingpairs": "Fonts Used;3;Theme;1;Slide Titles;2", "appname": "Microsoft Office PowerPoint", "linksdirty": 0, "hiddencount": 0, "notecount": 0}, "structureId": "0", "detectedName": "social posts eco refill.pptx", "isProtected": false, "labeledName": "social posts eco 
refill.pptx", "sha256": "95ce5a603f830fe8dfbba88865714e5407837d93e844010f7509a8e1cd04ce74", "labeledExt": "pptx", "detectedMime": "application/vnd.openxmlformats-officedocument.presentationml.presentation", "labeledMime": "application/vnd.openxmlformats-officedocument.presentationml.presentation"}], "ts": "2022-05-19T05:52:30.297121-0400"}
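The thread closed without a fix for this last issue, but one possible approach (a sketch, not from the thread) is to rename the offending keys inside the same ruby filter. The "[" and "]" characters are what make a key an invalid Logstash field reference; the snippet below replaces them (and, as a purely cosmetic assumption, "/" as well) with underscores before the hash is set back on the event:

```ruby
# Sketch (not from the thread): replace the characters that break
# Logstash field references ("[" and "]"; "/" is also replaced here
# as a cosmetic choice) with underscores.
def sanitize_keys(h)
  h.each_with_object({}) { |(k, v), out| out[k.gsub(%r{[\[\]/]}, "_")] = v }
end

meta  = { "xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages" => "1", "codec" => "mp4a" }
clean = sanitize_keys(meta)
```

In the filter above, this would be applied to each a[x]["metadata"] hash before calling event.set.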

If you have a new issue you should generally open a new thread. However, since my entire answer would just be a link back to this one, there is no need to do so in this case.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.