Extract Fields from JSON Array


#1

Hello,

I would like to extract the fields from the following JSON:

{
	"startTime": "2018-10-01T20:33:56",
	"results": [{
		"flags": [
			"ScanPe::no_resources",
			"ScanYara::compiling_error"
		],
		"flavors": {
			"mime": [
				"application/x-dosexec"
			],
			"yara": [
				"mz_file"
			]
		},
		"entropyMetadata": {
			"entropy": 6.030109054353968
		},
		"exiftoolMetadata": {
			"exiftool": [{
					"field": "SourceFile",
					"value": "/dev/shm/tmpltn6xut0"
				},
				{
					"field": "ExifToolVersion",
					"value": 10.8
				}
			]
		}
	}]
}

I am especially having issues with [results][exiftoolMetadata][exiftool], as I receive errors in the Logstash log saying this event can't be indexed:

Ex.
"error"=>{"type"=>"illegal_argument_exception" , "reason"=>"mapper [results.exiftoolMetadata.exiftool.value] of different type, current_type [float], merged_type [keyword]"}}}}

I am looking to get the fields/values for [results][exiftoolMetadata][exiftool] like the following:

Ex.

exiftoolMetadata.exiftool.SourceFile
exiftoolMetadata.exiftool.ExifToolVersion

or maybe even

exiftoolMetadata.SourceFile
exiftoolMetadata.ExifToolVersion

(and the corresponding values)

I've tried a few times with Ruby, but I either get a syntax error or don't fully understand what is going on.

Any suggestions would be appreciated!


(Magnus Bäck) #2

The problem is that [exiftoolMetadata][exiftool][value] is sometimes a string and sometimes a number. A particular Elasticsearch field must have a single type.

Your instinct to turn those key/value pairs into discrete fields is correct. Ruby examples of doing that have been posted in the past. Perhaps "Split arrays of keys and values" is helpful?
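
As a rough illustration of both points, here is a plain-Ruby sketch that runs outside Logstash. The two entries are copied from the sample event; the variable names (exiftool, value_types, pivoted) are made up for this example and are not part of any Logstash API.

```ruby
# Plain-Ruby sketch (not a Logstash filter); data copied from the sample event.
exiftool = [
  { 'field' => 'SourceFile',      'value' => '/dev/shm/tmpltn6xut0' },
  { 'field' => 'ExifToolVersion', 'value' => 10.8 }
]

# The shared "value" key holds a String in one entry and a Float in the other,
# which is the kind of mix a single Elasticsearch field cannot accept.
value_types = exiftool.map { |entry| entry['value'].class }

# Pivot: one key per "field" name, so each name can get its own type.
pivoted = exiftool.each_with_object({}) do |entry, acc|
  acc[entry['field']] = entry['value']
end

puts value_types.inspect
puts pivoted.inspect
```

After the pivot, SourceFile and ExifToolVersion are separate keys, so a string under one no longer collides with a number under the other.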


#3

Thanks, Magnus. Unfortunately, I seem to get the same error from Logstash when I do something like:

    json {
        source => "message"
    }

    ruby {
        code => "event.get('[results][exiftoolMetadata][exiftool]').each {|hash| event.set('[results][exiftoolMetadata][exiftool][' + hash['field'] + ']', hash['value']) }"
    }

Any thoughts why this might still be happening?

Thanks


(Magnus Bäck) #4

Are you deleting the original fields? If not I don't see how you can possibly get the exact same error.


#5

No, I am just using the above, plus a mutate to remove a tag and add a tag. That's it.

I'll add that I am using Filebeat to pick up the log (with no special config) and that the example log above is a snippet of the following (not sure whether anything else in it could cause the error):

{
	"startTime": "2018-10-03T19:58:30",
	"finishTime": "2018-10-03T19:58:30",
	"elapsedTime": 0.265012,
	"server": "d388b3066069",
	"worker": "3FD39-51BFA",
	"results": [{
		"flags": ["ScanPe::no_resources", "ScanYara::compiling_error"],
		"flavors": {
			"mime": ["application/x-dosexec"],
			"yara": ["mz_file"]
		},
		"entropyMetadata": {
			"entropy": 6.030109054353968
		},
		"exiftoolMetadata": {
			"exiftool": [{
				"field": "SourceFile",
				"value": "/dev/shm/tmpxj5sx192"
			}, {
				"field": "ExifToolVersion",
				"value": 10.8
			}, {
				"field": "FileName",
				"value": "tmpxj5sx192"
			}, {
				"field": "Directory",
				"value": "/dev/shm"
			}, {
				"field": "FileSize",
				"value": 8192
			}, {
				"field": "FileModifyDate",
				"value": "2018:10:03 19:58:30+00:00"
			}, {
				"field": "FileAccessDate",
				"value": "2018:10:03 19:58:30+00:00"
			}, {
				"field": "FileInodeChangeDate",
				"value": "2018:10:03 19:58:30+00:00"
			}, {
				"field": "FilePermissions",
				"value": 600
			}, {
				"field": "FileType",
				"value": "Win32 EXE"
			}, {
				"field": "FileTypeExtension",
				"value": "EXE"
			}, {
				"field": "MIMEType",
				"value": "application/octet-stream"
			}, {
				"field": "MachineType",
				"value": 332
			}, {
				"field": "TimeStamp",
				"value": "2008:01:06 14:51:31+00:00"
			}, {
				"field": "PEType",
				"value": 267
			}, {
				"field": "LinkerVersion",
				"value": 5.12
			}, {
				"field": "CodeSize",
				"value": 512
			}, {
				"field": "InitializedDataSize",
				"value": 7168
			}, {
				"field": "UninitializedDataSize",
				"value": 0
			}, {
				"field": "EntryPoint",
				"value": 520
			}, {
				"field": "OSVersion",
				"value": 4.0
			}, {
				"field": "ImageVersion",
				"value": 4.0
			}, {
				"field": "SubsystemVersion",
				"value": 4.0
			}, {
				"field": "Subsystem",
				"value": 2
			}]
		},
		"hashMetadata": {
			"md5": "e2c33fa7a3802289d46a7c3e4e1df342",
			"sha1": "d8fd563fbbdea43c78841ccca49e8c5a3fe47cbc",
			"sha256": "35c35bc56ce3064f6236db4432fdcf578d098353076d3fbe1e600fa926bc6227",
			"ssdeep": "192:JJGc1Zl2+VAfNxl1THs6xgzgVGjPlROInQAlKhFo2A:JJGcMJxDTHfRmoc"
		},
		"headerMetadata": {
			"header": "MZ\ufffd\u0000\u0003\u0000\u0000\u0000\u0004\u0000\u0000\u0000\ufffd\ufffd\u0000\u0000\ufffd\u0000\u0000\u0000\u0000\u0000\u0000\u0000@\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
		},
		"peMetadata": {
			"total": {
				"sections": 2
			},
			"timestamp": "2008-01-06T14:51:31",
			"machine": {
				"id": 332,
				"type": "IMAGE_FILE_MACHINE_I386"
			},
			"imageMagic": "32_BIT",
			"subsystem": "IMAGE_SUBSYSTEM_WINDOWS_GUI",
			"stackReserveSize": 1048576,
			"stackCommitSize": 4096,
			"heapReserveSize": 1048576,
			"heapCommitSize": 4096,
			"entryPoint": 520,
			"imageBase": 4194304,
			"imageCharacteristics": ["IMAGE_FILE_RELOCS_STRIPPED", "IMAGE_FILE_EXECUTABLE_IMAGE", "IMAGE_FILE_LINE_NUMS_STRIPPED", "IMAGE_FILE_LOCAL_SYMS_STRIPPED", "IMAGE_FILE_32BIT_MACHINE"],
			"imphash": "f9ade0aa18f660a34a4fa23392e21838",
			"imports": ["kernel32.dll"],
			"importFunctions": [{
				"import": "kernel32.dll",
				"functions": ["ExitProcess"]
			}],
			"sections": [{
				"name": ".text",
				"flags": ["IMAGE_SCN_CNT_CODE", "IMAGE_SCN_MEM_EXECUTE", "IMAGE_SCN_MEM_READ"],
				"structure": "IMAGE_SECTION_HEADER"
			}, {
				"name": ".data",
				"flags": ["IMAGE_SCN_CNT_INITIALIZED_DATA", "IMAGE_SCN_MEM_READ", "IMAGE_SCN_MEM_WRITE"],
				"structure": "IMAGE_SECTION_HEADER"
			}]
		},
		"selfMetadata": {
			"filename": "HTTP-FjotbH3FKOthlP3Tck.exe",
			"depth": 0,
			"uid": "df077284-ea41-4a7f-ae94-fc1281a23384",
			"rootUid": "df077284-ea41-4a7f-ae94-fc1281a23384",
			"hash": "35c35bc56ce3064f6236db4432fdcf578d098353076d3fbe1e600fa926bc6227",
			"rootHash": "35c35bc56ce3064f6236db4432fdcf578d098353076d3fbe1e600fa926bc6227",
			"source": "d388b3066069",
			"scannerList": ["ScanEntropy", "ScanExiftool", "ScanHash", "ScanHeader", "ScanPe", "ScanSelf", "ScanYara"],
			"size": 8192
		}
	}]
}

(Magnus Bäck) #6

> No, I am just using the above, plus a mutate to remove a tag and add a tag. That's it.

Okay, but then you're not addressing the fundamental problem, namely that a given field is sometimes a string and sometimes a number.


#7

I'm sorry, I was under the assumption that by using the above Ruby code, it would split the fields out into distinct fields with their own values, therefore, they would not conflict. Is this a false assumption?

Ex.

exiftoolMetadata.exiftool.SourceFile
exiftoolMetadata.exiftool.ExifToolVersion


(Magnus Bäck) #8

You are creating those new fields but the old ones remain and need to be explicitly removed.
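
A minimal sketch of that idea in plain Ruby, assuming the event is modelled as a nested Hash (this runs outside Logstash and does not exercise the event API). The names meta and discrete are hypothetical, and the two entries are taken from the full sample log above. Since results is an array in the sample, the sketch indexes its first element.

```ruby
# Plain-Ruby model of the event; only two exiftool entries are kept for brevity.
event = {
  'results' => [{
    'exiftoolMetadata' => {
      'exiftool' => [
        { 'field' => 'FileSize', 'value' => 8192 },
        { 'field' => 'FileType', 'value' => 'Win32 EXE' }
      ]
    }
  }]
}

meta = event['results'][0]['exiftoolMetadata']

# Build the discrete fields first ...
discrete = meta['exiftool'].each_with_object({}) do |pair, acc|
  acc[pair['field']] = pair['value']
end

# ... then overwrite the original array, so no "field"/"value" pairs remain
# to be indexed alongside the new fields.
meta['exiftool'] = discrete

# Sanity check: neither of the old generic keys survives.
leftover_keys = meta['exiftool'].keys & %w[field value]

puts event.inspect
```

Overwriting the array is one way to get rid of the old pairs; writing the new fields elsewhere and removing the whole original array would be another.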


#9

Okay, so I added the following:

    mutate {
        remove_field => [ "[results][exiftoolMetadata][exiftool][value]" ]
    }

...but I also just noticed that I am getting the following in the Logstash log:

Ruby exception occurred: undefined method 'each' for nil:NilClass

and

"error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper [results.exiftoolMetadata.exiftool.value] of different type, current_type [float], merged_type [text]"}}}}

I'm assuming that it's not finding anything (or doesn't know how to find it at the point where the Ruby code runs) under [results][exiftoolMetadata][exiftool].

To recap, here is the entire config:

    filter {
        if "mylog" in [tags] {

            json {
                source => "message"
            }

            ruby {
                code => "event.get('[results][exiftoolMetadata][exiftool]').each {|hash| event.set('[results][exiftoolMetadata][exiftool][' + hash['field'] + ']', hash['value']) }"
            }

            mutate {
                remove_field => [ "[results][exiftoolMetadata][exiftool][value]" ]
                add_tag => [ "processed" ]
            }
        }
    }

I appreciate your continued help with this. Any other ideas?

Thanks


(Magnus Bäck) #10

> Ruby exception occurred: undefined method 'each' for nil:NilClass

This indicates that the event didn't have a [results][exiftoolMetadata][exiftool] field. Wait, results is an array so the field reference must be [results][0][exiftoolMetadata][exiftool].
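
To make the indexing concrete, here is a plain-Ruby sketch (outside Logstash) that uses Hash#dig on a nested Hash standing in for the event. The second results entry, which has no exiftool data, is a hypothetical addition to show the nil case; the names first, missing, and names are likewise made up for the example.

```ruby
# Plain-Ruby stand-in for the event. The second "results" entry is hypothetical.
event = {
  'results' => [
    {
      'exiftoolMetadata' => {
        'exiftool' => [
          { 'field' => 'SourceFile',      'value' => '/dev/shm/tmpltn6xut0' },
          { 'field' => 'ExifToolVersion', 'value' => 10.8 }
        ]
      }
    },
    { 'flags' => [] }
  ]
}

# With an explicit array index the lookup resolves to the list of pairs.
first = event.dig('results', 0, 'exiftoolMetadata', 'exiftool')

# An entry without exiftool data yields nil, and calling each on nil is
# what produces "undefined method 'each' for nil:NilClass".
missing = event.dig('results', 1, 'exiftoolMetadata', 'exiftool')

# Walk every entry of the array and fall back to an empty list on nil.
names = []
event['results'].each_with_index do |result, i|
  pairs = result.dig('exiftoolMetadata', 'exiftool') || []
  pairs.each { |pair| names << "[results][#{i}][exiftoolMetadata][#{pair['field']}]" }
end

puts names
```

Looping over the array rather than hard-coding index 0 is a design choice for events that may carry more than one result; for the single-result sample above, index 0 is enough.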


#11

Thanks, Magnus! I was able to get this working by using the index in the array as you mentioned.

