Handle json filter "field name cannot be an empty string" error

I have some Proofpoint logs in JSON format like this:

{
"metadata":{
	"tzOffsetMins":-240,
	"tsEpochMs":1650294078380
},
"ts":"2022-05-17T01:02:10.380410-0400",
"guid":"T1sZJIJ-YGTdBDzraMXIw6rea9pB78Qw",
"msgParts":[
	{
	"structureId":"0",
	"metadata":{
		"title":"GRAINGER",
		"viewport":"width=device-width, initial-scale=1.0 ",
		"":"Grainger"
	},
	"detectedName":"text.html",
	"disposition":"inline"
	}
],
"action_dmarc":[
]
}

Since there is a key in [msgParts][metadata] whose name is an empty string, the log cannot be indexed into Elasticsearch due to a "field name cannot be an empty string" error.

I wanted to rename that key and tried following the solution mentioned in the thread, but it looks like I messed up some configs and didn't get the expected results.

My Logstash config is:

filter {
  ruby {
    code  => '
		h = event.get("[msgParts][metadata]")
		newh = {}
		h.each { |k, v|
			if k == ""
				newk = "extensionless"
			else
				newk = k
			end
			newh[newk] = v
		}
		event.set("[msgParts][metadata]", newh)
	'
  }
  json {
    source => "message"
  }
}

I also tried source => "event" in the json filter, but got _rubyexception and _jsonparsefailure tags.

Could anyone please take a look and advise? Logstash version is 8.1.3.

Two problems. Firstly, the json filter has to come before the ruby filter; otherwise the [msgParts] field does not exist when the ruby filter processes the event. Secondly, [msgParts] is an array, so both references should be

"[msgParts][0][metadata]"

Once you fix those, it works:

    "msgParts" => [
    [0] {
        "detectedName" => "text.html",
            "metadata" => {
            "extensionless" => "Grainger",
                 "viewport" => "width=device-width, initial-scale=1.0 ",

BTW, I now think it is less ugly to use

            newk = k == "" ? "extensionless" : k
            newh[newk] = v
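Combining the two fixes (json filter first, indexed array reference) with the ternary, the whole filter would look something like this (a sketch based on the fixes above, untested against the full pipeline):

```
filter {
  json {
    source => "message"
  }
  ruby {
    code => '
      h = event.get("[msgParts][0][metadata]")
      newh = {}
      h.each { |k, v|
        newk = k == "" ? "extensionless" : k
        newh[newk] = v
      }
      event.set("[msgParts][0][metadata]", newh)
    '
  }
}
```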

It works perfectly. Thank you, Badger!!! But I just found out that [metadata] is not only present in the first item of msgParts, so I also need to loop over the entire array, test for the existence of the [metadata] field, and then do your magic. Could you advise? I have zero experience with Ruby, and below is as far as I can get at the moment. Thank you again!!

  ruby {
    code  => '
		m = event.get("[msgParts]")
		m.each_with_index  {
			if [metadata] {
				h = event.get("[metadata]")
				newh = {}
				h.each { |k, v|
					newk = k == "" ? "extensionless" : k
					newh[newk] = v
				}
				event.set("[metadata]", newh)
			}
		}
	'
  }

If your data structure is

{
"guid":"T1sZJIJ-YGTdBDzraMXIw6rea9pB78Qw",
"msgParts":[
    { "structureId":"0", "metadata":{ "title":"Foo", "":"Foo" } },
    { "structureId":"1", "metadata":{ "title":"Bar", "":"Bar" } }
],
"action_dmarc":[ ]
}

and you want to iterate over the array, then I would rewrite the filter as

  ruby {
    code  => '
        a = event.get("[msgParts]")
        a.each_index { |x|
            a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
        }
        event.set("[msgParts]", a)
    '
  }
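The one-liner works because Hash#delete removes the key from the hash and returns the deleted value, so the rename happens in a single statement. A standalone Ruby sketch of the idiom:

```ruby
# Hash#delete removes the key and returns its value, so an
# empty-named key can be renamed in one step.
metadata = { "title" => "GRAINGER", "" => "Grainger" }
metadata["extensionless"] = metadata.delete("")
# metadata no longer contains the "" key
```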

It works like a charm. Thank you Badger!!!

Here is the final config for parsing Proofpoint historical log data, in case anyone else needs it.

filter {
  json {
    source => "message"
  }
  ruby {
    code  => '
        a = event.get("[msgParts]")
        a.each_index { |x|
            a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
        }
        event.set("[msgParts]", a)
    '
  }
  date {
   match => ["ts","YYYY-MM-dd'T'HH:mm:ss.SSSZ","ISO8601"]
   target => "@timestamp"
  }
}

The config works, but I see a lot of errors in the Logstash log. Please advise.

[2022-05-19T22:32:21,992][ERROR][logstash.filters.ruby ][filebeat-pf][8fca63a10259496bbfe146da14e553c2fb50c57a2ccb359db4a8a23d22d17a12] Ruby exception occurred: undefined method `each_index' for nil:NilClass {:class=>"NoMethodError", :backtrace=>["(ruby filter code):4:in `block in filter_method'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:96:in `inline_script'", "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-filter-ruby-3.1.8/lib/logstash/filters/ruby.rb:89:in `filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:159:in `do_filter'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:178:in `block in multi_filter'", "org/jruby/RubyArray.java:1821:in `each'", "/usr/share/logstash/logstash-core/lib/logstash/filters/base.rb:175:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:134:in `multi_filter'", "/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:299:in `block in start_workers'"]}

Some events do not have a [msgParts] field. Use a conditional in the ruby filter

    a = event.get("[msgParts]")
    if a
        a.each_index { |x|
            a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
        }
        event.set("[msgParts]", a)
    end

It works. Thank you bro.

Have a new issue, bro. The names of some fields contain special characters, like:

 "metadata": {"codecname": "MPEG-4 Audio", "xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages": "1", "xap/1.0/mm/xmpmm:history[9]/stevt:changed": "/", "createdate": "2022-04-27T14:56:41+01:00", "xap/1.0/mm/xmpmm:pantry[52]/tiff:make": "Canon", "xap/1.0/mm/xmpmm:pantry[4]/xmptpg:npages": "1", "codec": "mp4a", "rating": "0", "xap/1.0/mm/xmpmm:history[6]/stevt:changed": "/", "framewidth (pixels)": 1920, "datarate (kbps)": 1495, "thumbnails[1]/xmpgimg:width": "256", "dc/elements/1.1/dc:format": "H.264", "filetype": "mp42", "thumbnails[1]/xmpgimg:format": "JPEG", "minutes": 0, "xap/1.0/mm/xmpmm:pantry[1]/xmptpg:npages": "1", "audiosamplerate (khz)": 48, "audiobitrate (kbps)": 317, "filetypedeveloper": "ISO", "streamtype": "audio", "encrypted": 0, "hours": 0, "modifydate": "2022-04-27T14:57:12+01:00", "metadatadate": "2022-04-27T14:57:12+01:00", "thumbnails[1]/xmpgimg:height": "72", "xap/1.0/mm/xmpmm:pantry[3]/xmptpg:npages": "1", "length (hh:mm:ss)": "00:00:44", "xap/1.0/mm/xmpmm:pantry[28]/dc:title[1]": "5L ", "channels": 2, "frameheight (pixels)": 1080, "framerate (frames/sec)": 30, "seconds": 44, "tiff/1.0/tiff:orientation": "1", "streamid": "2", "xap/1.0/mm/xmpmm:history[1]/stevt:changed": "/", "xap/1.0/mm/xmpmm:pantry[52]/aux:lensid": "198"}

This causes a <RuntimeError: Invalid FieldReference: xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages> error.

How should I handle these field names? I also tried to leave it as a _jsonparsefailure and keep the original message, but then my current config failed to parse the ts field. Could you advise?

filter {
	json {
		source => "message"
	}
	ruby {
		code  => '
			a = event.get("[msgParts]")
			if a
				a.each_index { |x|
					if a[x]["metadata"]
						a[x]["metadata"]["extensionless"] = a[x]["metadata"].delete("")
					end
				}
				event.set("[msgParts]", a)
			end
		'
	}
	date {
		match => ["ts","YYYY-MM-dd'T'HH:mm:ss.SSSZ","ISO8601"]
		target => "@timestamp"
	}
	if "_jsonparsefailure" in [tags] {
		mutate {
			add_field => { "ts1" => "" }
        }
		ruby {
			code => '
				t = event.get("[ts]")
				event.set("[ts1]", t)
			'
		}
		date {
			match => ["ts","YYYY-MM-dd'T'HH:mm:ss.SSSZ","ISO8601"]
			target => "@timestamp"
		}
	}
	if  "_jsonparsefailure" not in [tags] {
		mutate { remove_field => ["[event][original]","[message]"] }
	}
}

Below is a sample log.

{"guid": "6-WyL_R332XqoIateyxm41pivkPL8ohQ", "msgParts": [{"disposition": "attached", "sizeDecodedBytes": 10184246, "isDeleted": false, "labeledCharset": "", "sandboxStatus": "NOT_SUPPORTED", "detectedExt": "MP4", "md5": "21551aaa91f496f9757b774d07c68af9", "detectedSizeBytes": 10184246, "isVirtual": false, "isCorrupted": false, "detectedCharset": "", "labeledExt": "mp4", "sha256": "7a6fd506e9fec42c7c0ca7b0a4067e64b0858e25bea88f13bb889e3a4c65ee50", "detectedMime": "video/mp4", "labeledMime": "video/mp4", "structureId": "0", "detectedName": "eco refill movie_1.mp4", "isProtected": false, "labeledName": "eco refill movie_1.mp4", "isTimedOut": false, "metadata": {"codecname": "MPEG-4 Audio", "xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages": "1", "xap/1.0/mm/xmpmm:history[9]/stevt:changed": "/", "createdate": "2022-04-27T14:56:41+01:00", "xap/1.0/mm/xmpmm:pantry[52]/tiff:make": "Canon", "xap/1.0/mm/xmpmm:pantry[4]/xmptpg:npages": "1", "codec": "mp4a", "rating": "0", "xap/1.0/mm/xmpmm:history[6]/stevt:changed": "/", "framewidth (pixels)": 1920, "datarate (kbps)": 1495, "thumbnails[1]/xmpgimg:width": "256", "dc/elements/1.1/dc:format": "H.264", "filetype": "mp42", "thumbnails[1]/xmpgimg:format": "JPEG", "minutes": 0, "xap/1.0/mm/xmpmm:pantry[1]/xmptpg:npages": "1", "audiosamplerate (khz)": 48, "audiobitrate (kbps)": 317, "filetypedeveloper": "ISO", "streamtype": "audio", "encrypted": 0, "hours": 0, "modifydate": "2022-04-27T14:57:12+01:00", "metadatadate": "2022-04-27T14:57:12+01:00", "thumbnails[1]/xmpgimg:height": "72", "xap/1.0/mm/xmpmm:pantry[3]/xmptpg:npages": "1", "length (hh:mm:ss)": "00:00:44", "xap/1.0/mm/xmpmm:pantry[28]/dc:title[1]": "5L ", "channels": 2, "frameheight (pixels)": 1080, "framerate (frames/sec)": 30, "seconds": 44, "tiff/1.0/tiff:orientation": "1", "streamid": "2", "xap/1.0/mm/xmpmm:history[1]/stevt:changed": "/", "xap/1.0/mm/xmpmm:pantry[52]/aux:lensid": "198"}, "textExtracted": "U0NBTEFSKDB4N2YwNzEwMmZhOTMwKQ==\n", "urls": [], "dataBase64": 
"U0NBTEFSKDB4N2YwN2E2YjM0ZWEwKQ==\n", "isArchive": false}, {"sandboxStatus": "NOT_SUPPORTED", "detectedExt": "JPG", "md5": "32825d23f13ae618fe0802801690274b", "disposition": "attached", "sizeDecodedBytes": 1166579, "isDeleted": false, "labeledCharset": "", "isCorrupted": false, "isVirtual": false, "detectedCharset": "", "detectedSizeBytes": 1166579, "detectedName": "eco refill 3840x1098px-02.jpg", "structureId": "0", "labeledName": "eco refill 3840x1098px-02.jpg", "isProtected": false, "isTimedOut": false, "metadata": {}, "sha256": "cc5c09c4ea143d2a0f03d20ff0edc59f8a3e34f56594105b25e6b124c5b2d7a1", "labeledExt": "jpg", "detectedMime": "image/jpeg", "labeledMime": "image/jpeg", "urls": [], "dataBase64": "U0NBTEFSKDB4N2YwNzEwNTQ1MzAwKQ==\n", "isArchive": false, "textExtracted": "U0NBTEFSKDB4N2YwNzc2NmM4MWY4KQ==\n"}, {"detectedCharset": "", "isVirtual": false, "isCorrupted": false, "detectedSizeBytes": 2363187, "md5": "6d19a5dee9ca5b383f488fbe9404fb9f", "sandboxStatus": "UPLOADED", "detectedExt": "PPTX", "isDeleted": false, "labeledCharset": "", "disposition": "attached", "sizeDecodedBytes": 2363187, "dataBase64": "U0NBTEFSKDB4N2YwNzEwNTNlNTEwKQ==\n", "isArchive": false, "urls": [], "textExtracted": "U0NBTEFSKDB4N2YwNzJjYmE4ZGUwKQ==\n", "isTimedOut": false, "metadata": {"edittime": 44, "author": "Ben Edwards", "shareddoc": 0, "revnumber": "4", "parcount": 0, "presentationtarget": "Widescreen", "codepage": 65001, "scalecrop": 0, "titlesofparts": "Arial;Calibri;Calibri Light;Office Theme;PowerPoint Presentation;PowerPoint Presentation", "lastauthor": "Ben Edwards", "title": "PowerPoint Presentation", "wordcount": 0, "slidecount": 2, "mmclips": 0, "hyperlinkschanged": 0, "appversion": "16.0000", "headingpairs": "Fonts Used;3;Theme;1;Slide Titles;2", "appname": "Microsoft Office PowerPoint", "linksdirty": 0, "hiddencount": 0, "notecount": 0}, "structureId": "0", "detectedName": "social posts eco refill.pptx", "isProtected": false, "labeledName": "social posts eco 
refill.pptx", "sha256": "95ce5a603f830fe8dfbba88865714e5407837d93e844010f7509a8e1cd04ce74", "labeledExt": "pptx", "detectedMime": "application/vnd.openxmlformats-officedocument.presentationml.presentation", "labeledMime": "application/vnd.openxmlformats-officedocument.presentationml.presentation"}], "ts": "2022-05-19T05:52:30.297121-0400"}
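The thread closed without a fix for this last issue, but one possible approach (a sketch, not from the thread) is to rename the offending keys inside the same ruby filter. The "[" and "]" characters are what make a key an invalid Logstash field reference; the snippet below replaces them (and, as a purely cosmetic assumption, "/" as well) with underscores before the hash is set back on the event:

```ruby
# Sketch (not from the thread): replace the characters that break
# Logstash field references ("[" and "]"; "/" is also replaced here
# as a cosmetic choice) with underscores.
def sanitize_keys(h)
  h.each_with_object({}) { |(k, v), out| out[k.gsub(%r{[\[\]/]}, "_")] = v }
end

meta  = { "xap/1.0/mm/xmpmm:pantry[2]/xmptpg:npages" => "1", "codec" => "mp4a" }
clean = sanitize_keys(meta)
```

In the filter above, this would be applied to each a[x]["metadata"] hash before calling event.set.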

If you have a new issue you should generally open a new thread. However, since my entire answer would just be a link back to this one, there is no need to do so in this case.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.