Elasticsearch: Nested JSON object parsing and aggregation

Josh_Thatcher · August 3, 2022, 7:53pm

I am currently working on a project that takes GitLab Security Scanner artifacts, sends the JSON artifact files to Elasticsearch with curl, and then visualizes selected information from those artifacts. In this case, I am trying to create a visualization that shows the number of detections at each severity level per scan.

For example, this Semgrep scan detected 8 vulnerabilities in total: 2 Critical and 6 Medium (the artifact is included at the bottom of the post).

The issue I am running into is that after the JSON artifact has been indexed into Elasticsearch, the number of detections per severity level (Critical, High, Medium, etc.) is wrong. Kibana shows the scan as having 1 Critical and 1 Medium instead of 2 Critical and 6 Medium. If I check the data view, vulnerabilities.severity does list all of the detections (Critical, Critical, Medium, Medium, Medium, Medium, Medium, Medium).
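A likely explanation for the 1/1 result, as I understand how keyword doc values work: within a single document, the values of a keyword field are stored as a sorted, de-duplicated set, and a terms aggregation counts documents per bucket rather than occurrences. Since the whole scan is one document, it can contribute at most 1 to each severity bucket. The Python below is only a simplified model of that behaviour (an assumption, not Elasticsearch code) using the severities from this artifact.

```python
from collections import Counter

# Severities as they appear in the artifact's _source (the whole scan is one document).
source_severities = ["Critical", "Critical", "Medium", "Medium",
                     "Medium", "Medium", "Medium", "Medium"]

# Simplified model (assumption): keyword doc values for one document behave
# like a sorted, de-duplicated set of terms.
doc_values = sorted(set(source_severities))

# A terms aggregation counts matching documents per term, so this single
# scan document adds exactly 1 to each bucket it appears in.
terms_doc_counts = Counter(doc_values)

# Counting occurrences in _source instead gives the numbers the scan reports.
occurrence_counts = Counter(source_severities)

print(doc_values)               # ['Critical', 'Medium']
print(dict(terms_doc_counts))   # {'Critical': 1, 'Medium': 1}
print(dict(occurrence_counts))  # {'Critical': 2, 'Medium': 6}
```

Under this model, no choice of text, keyword or fielddata mapping changes the outcome, because the de-duplication and per-document counting happen regardless of how the string is analyzed.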

I have tried dynamic mapping, explicit mapping, and setting fielddata to true on a text field instead of using a multi-field with text and keyword types, and nothing seems to work.
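One mapping that does keep each finding separate is the `nested` field type: every element of the `vulnerabilities` array is indexed as its own hidden document, and a `nested` aggregation can then count findings instead of scans. The sketch below only builds the request bodies as Python dicts; the field layout is an assumption based on the artifact, and the bodies would be sent with `PUT /<index>` and `POST /<index>/_search`. Against this artifact, the `by_severity` buckets should come back as Critical 2 and Medium 6. Note that, as the reply below points out, Kibana's visualizations have limited support for nested fields, so this mainly helps when querying through the search API or Dev Tools.

```python
import json

# Hypothetical index mapping: each element of "vulnerabilities" becomes a
# nested document, with "severity" as an exact-match keyword field.
mapping = {
    "mappings": {
        "properties": {
            "vulnerabilities": {
                "type": "nested",
                "properties": {
                    "severity": {"type": "keyword"},
                },
            },
        }
    }
}

# Search body: the nested aggregation steps into the nested documents, so the
# terms sub-aggregation counts vulnerabilities rather than scan documents.
search_body = {
    "size": 0,
    "aggs": {
        "vulns": {
            "nested": {"path": "vulnerabilities"},
            "aggs": {
                "by_severity": {"terms": {"field": "vulnerabilities.severity"}}
            },
        }
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(search_body, indent=2))
```

Breaking the counts down per scan would need an outer aggregation on a scan-level field (for example a terms or date histogram aggregation) wrapped around `vulns`.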

Most recently I tried creating a runtime field in Stack Management / Data Views / Index with a script that builds an array from vulnerabilities.severity.keyword, loops over it counting the occurrences of a specific severity, and emits the result as a 'long' for the new runtime field, but I have had no progress with this either. (Script snippet is below; sorry, the formatting isn't being kept.)

int total_crits = 0;
String[] crits_array = /[ ]/.split(doc['vulnerabilities.severity'].value);
for (String i : crits_array) {
    if (i == "critical") {
        total_crits = total_crits + 1;
    }
}
emit(total_crits);

This runtime field only ever outputs a value of 1, and the array it builds also has a length of only 1. I tried creating a runtime field with the same script on a different field, such as vulnerabilities.description.keyword, to see whether the array would populate correctly, and it returns the correct number, 8, so I am really at a loss here. Any help would be much appreciated; I am really hitting a wall with this one.
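A plausible reason the script above always emits 1, assuming vulnerabilities.severity is the text field with fielddata enabled: fielddata holds the analyzed terms (lowercased by the standard analyzer), de-duplicated and sorted, and `doc[...].value` returns only the first of them. The script therefore splits the single word "critical" and counts it once. The Python below replays that logic as a model; it is not Painless, and the stand-in data and helper name are hypothetical. Counting over the original array instead (in a runtime field script that would mean reading from `params['_source']`, which I believe is available there at some performance cost) yields every finding.

```python
# Hypothetical stand-in for the 8 severity values in the artifact's _source.
source_severities = ["Critical"] * 2 + ["Medium"] * 6

# Assumption: with fielddata enabled on the text field, the document exposes the
# analyzed (lowercased), de-duplicated, sorted terms.
fielddata_terms = sorted({s.lower() for s in source_severities})


def replay_script(value: str) -> int:
    """Python mirror of the runtime-field script: split one value on spaces, count 'critical'."""
    total_crits = 0
    for token in value.split(" "):
        if token == "critical":
            total_crits += 1
    return total_crits


# doc[...].value exposes only the first term, so the script splits a single word.
print(fielddata_terms)                    # ['critical', 'medium']
print(replay_script(fielddata_terms[0]))  # 1

# Iterating over the original, non-de-duplicated array counts every finding.
per_value_count = sum(1 for s in source_severities if s.lower() == "critical")
print(per_value_count)                    # 2
```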

Let me know if there's any additional info I can provide.

{
"version": "14.0.4",
"vulnerabilities": [
    {
        "id": "d4d5840a33d2a9ead8ad735282883ea8563e07b958f99e7465e57433f4d2e721",
        "category": "sast",
        "message": "Deserialization of Untrusted Data",
        "description": "Consider possible security implications associated with pickle module.\n",
        "cve": "",
        "severity": "Critical",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/face_recognition_knn.py",
            "start_line": 38
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B403",
                "value": "bandit.B403",
                "url": "https://semgrep.dev/r/gitlab.bandit.B403"
            },
            {
                "type": "cwe",
                "name": "CWE-502",
                "value": "502",
                "url": "https://cwe.mitre.org/data/definitions/502.html"
            },
            {
                "type": "owasp",
                "name": "Insecure Deserialization",
                "value": "A8"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B403",
                "value": "B403"
            }
        ]
    },
    {
        "id": "568d9fc5cea8ac63945e9e445ca23b52a5aa17fceee35a721b16c8cbbd37acde",
        "category": "sast",
        "message": "Deserialization of Untrusted Data",
        "description": "Consider possible security implications associated with pickle module.\n",
        "cve": "",
        "severity": "Critical",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/facerec_ipcamera_knn.py",
            "start_line": 41
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B403",
                "value": "bandit.B403",
                "url": "https://semgrep.dev/r/gitlab.bandit.B403"
            },
            {
                "type": "cwe",
                "name": "CWE-502",
                "value": "502",
                "url": "https://cwe.mitre.org/data/definitions/502.html"
            },
            {
                "type": "owasp",
                "name": "Insecure Deserialization",
                "value": "A8"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B403",
                "value": "B403"
            }
        ]
    },
    {
        "id": "e1505129ce291fe66bff5000acacc2b81da3aa79977d12a74f11e8f9d2b865ef",
        "category": "sast",
        "message": "Deserialization of Untrusted Data",
        "description": "Avoid using `pickle`, which is known to lead to code execution vulnerabilities.\nWhen unpickling, the serialized data could be manipulated to run arbitrary code.\nInstead, consider serializing the relevant data as JSON or a similar text-based\nserialization format.\n",
        "cve": "",
        "severity": "Medium",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/face_recognition_knn.py",
            "start_line": 106
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B301-1",
                "value": "bandit.B301-1",
                "url": "https://semgrep.dev/r/gitlab.bandit.B301-1"
            },
            {
                "type": "cwe",
                "name": "CWE-502",
                "value": "502",
                "url": "https://cwe.mitre.org/data/definitions/502.html"
            },
            {
                "type": "owasp",
                "name": "Insecure Deserialization",
                "value": "A8"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B301",
                "value": "B301"
            }
        ]
    },
    {
        "id": "ab46969c8e6f9134ae3b5cb8786bfe302cf6bd6938f96dd87b11dcfb701f6a0f",
        "category": "sast",
        "message": "Deserialization of Untrusted Data",
        "description": "Avoid using `pickle`, which is known to lead to code execution vulnerabilities.\nWhen unpickling, the serialized data could be manipulated to run arbitrary code.\nInstead, consider serializing the relevant data as JSON or a similar text-based\nserialization format.\n",
        "cve": "",
        "severity": "Medium",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/face_recognition_knn.py",
            "start_line": 132
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B301-1",
                "value": "bandit.B301-1",
                "url": "https://semgrep.dev/r/gitlab.bandit.B301-1"
            },
            {
                "type": "cwe",
                "name": "CWE-502",
                "value": "502",
                "url": "https://cwe.mitre.org/data/definitions/502.html"
            },
            {
                "type": "owasp",
                "name": "Insecure Deserialization",
                "value": "A8"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B301",
                "value": "B301"
            }
        ]
    },
    {
        "id": "6bbdabc3c4b6ba3b3e8a18ed8b6f7dbd231a1c535e3fa5493b337413fc406a36",
        "category": "sast",
        "message": "Deserialization of Untrusted Data",
        "description": "Avoid using `pickle`, which is known to lead to code execution vulnerabilities.\nWhen unpickling, the serialized data could be manipulated to run arbitrary code.\nInstead, consider serializing the relevant data as JSON or a similar text-based\nserialization format.\n",
        "cve": "",
        "severity": "Medium",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/facerec_ipcamera_knn.py",
            "start_line": 111
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B301-1",
                "value": "bandit.B301-1",
                "url": "https://semgrep.dev/r/gitlab.bandit.B301-1"
            },
            {
                "type": "cwe",
                "name": "CWE-502",
                "value": "502",
                "url": "https://cwe.mitre.org/data/definitions/502.html"
            },
            {
                "type": "owasp",
                "name": "Insecure Deserialization",
                "value": "A8"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B301",
                "value": "B301"
            }
        ]
    },
    {
        "id": "84afcb2af18d4b30a6de9e9d2e7a1844a9b8258c17c451f5f318e40c8665b39e",
        "category": "sast",
        "message": "Deserialization of Untrusted Data",
        "description": "Avoid using `pickle`, which is known to lead to code execution vulnerabilities.\nWhen unpickling, the serialized data could be manipulated to run arbitrary code.\nInstead, consider serializing the relevant data as JSON or a similar text-based\nserialization format.\n",
        "cve": "",
        "severity": "Medium",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/facerec_ipcamera_knn.py",
            "start_line": 134
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B301-1",
                "value": "bandit.B301-1",
                "url": "https://semgrep.dev/r/gitlab.bandit.B301-1"
            },
            {
                "type": "cwe",
                "name": "CWE-502",
                "value": "502",
                "url": "https://cwe.mitre.org/data/definitions/502.html"
            },
            {
                "type": "owasp",
                "name": "Insecure Deserialization",
                "value": "A8"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B301",
                "value": "B301"
            }
        ]
    },
    {
        "id": "6b5733765f40c7b998398e45575ac7b02d5788d219a55ae80da6e96d7d94cee3",
        "category": "sast",
        "message": "Active Debug Code",
        "description": "Detected Flask app with debug=True. Do not deploy to production with this flag enabled\nas it will leak sensitive information. Instead, consider using Flask configuration\nvariables or setting 'debug' using system environment variables.\n",
        "cve": "",
        "severity": "Medium",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/web_service_example.py",
            "start_line": 113
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B201",
                "value": "bandit.B201",
                "url": "https://semgrep.dev/r/gitlab.bandit.B201"
            },
            {
                "type": "cwe",
                "name": "CWE-489",
                "value": "489",
                "url": "https://cwe.mitre.org/data/definitions/489.html"
            },
            {
                "type": "owasp",
                "name": "Security Misconfiguration",
                "value": "A6"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B201",
                "value": "B201"
            }
        ]
    },
    {
        "id": "e404c37a5509d091001b989e4b951c2ab9ac1621c12e0636934e1fd128bee55f",
        "category": "sast",
        "message": "Active Debug Code",
        "description": "Detected Flask app with debug=True. Do not deploy to production with this flag enabled\nas it will leak sensitive information. Instead, consider using Flask configuration\nvariables or setting 'debug' using system environment variables.\n",
        "cve": "",
        "severity": "Medium",
        "scanner": {
            "id": "semgrep",
            "name": "Semgrep"
        },
        "location": {
            "file": "examples/web_service_example_Simplified_Chinese.py",
            "start_line": 110
        },
        "identifiers": [
            {
                "type": "semgrep_id",
                "name": "bandit.B201",
                "value": "bandit.B201",
                "url": "https://semgrep.dev/r/gitlab.bandit.B201"
            },
            {
                "type": "cwe",
                "name": "CWE-489",
                "value": "489",
                "url": "https://cwe.mitre.org/data/definitions/489.html"
            },
            {
                "type": "owasp",
                "name": "Security Misconfiguration",
                "value": "A6"
            },
            {
                "type": "bandit_test_id",
                "name": "Bandit Test ID B201",
                "value": "B201"
            }
        ]
    }
],
"scan": {
    "scanner": {
        "id": "semgrep",
        "name": "Semgrep",
        "url": "https://github.com/returntocorp/semgrep",
        "vendor": {
            "name": "GitLab"
        },
        "version": "0.76.2"
    },
    "type": "sast",
    "start_time": "2022-05-26T19:35:19",
    "end_time": "2022-05-26T19:35:27",
    "status": "success"
}
}

Christian_Dahlqvist · August 4, 2022, 6:12am

As far as I know, Kibana still has very limited support for handling nested objects, so I do not think what you want to do is possible with your current document format. If you instead stored each vulnerability as a separate document, you should be able to visualise it.
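One way to follow this suggestion is to split each artifact into one document per vulnerability before sending it, copying the scan-level fields onto every document so findings can still be grouped by scan. The Python sketch below builds a newline-delimited body for the Elasticsearch Bulk API; the function name and the index name "gitlab-sast-findings" are hypothetical, and it assumes the artifact has already been loaded with `json.load`. The resulting body can be posted to the `_bulk` endpoint with `Content-Type: application/x-ndjson`.

```python
import json


def to_bulk_ndjson(artifact: dict, index: str) -> str:
    """Build a Bulk API body with one document per vulnerability.

    The report's scan metadata and version are copied onto every document so
    each vulnerability can still be grouped by scan in Kibana.
    """
    scan = artifact.get("scan", {})
    lines = []
    for vuln in artifact.get("vulnerabilities", []):
        doc = dict(vuln)                       # shallow copy of one finding
        doc["scan"] = scan                     # scan.start_time, scan.scanner.id, ...
        doc["report_version"] = artifact.get("version")
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # source line
    # The Bulk API expects newline-delimited JSON ending with a final newline.
    return "\n".join(lines) + "\n"


# Usage with a trimmed-down, hypothetical artifact.
artifact = {
    "version": "14.0.4",
    "vulnerabilities": [
        {"id": "vuln-1", "severity": "Critical"},
        {"id": "vuln-2", "severity": "Medium"},
    ],
    "scan": {"type": "sast", "start_time": "2022-05-26T19:35:19"},
}
body = to_bulk_ndjson(artifact, "gitlab-sast-findings")
print(body, end="")
```

With one document per finding, a plain terms aggregation on severity (severity.keyword under dynamic mapping), split by a scan field such as scan.start_time, should count each vulnerability once, which is what the visualization needs.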

system · September 1, 2022, 6:12am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
