Extracting data between two phrases with GROK

I'm trying to extract the signature block from PowerShell scripts that are being run in our environment. The powershell.file.script_block_text field sometimes ends with a signature in this format:

# SIG # Begin signature block
# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB
# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR
# AgEAAgEAAgEAAgEAAgEAMCEwCQYFKw4DAhoFAAQUxKaXN7doWq+mq18IrzABoXMr
# 4l6gghXyMIIEoDCCA4igAwIBAgIKYRr16gAAAAAAajANBgkqhkiG9w0BAQUFADB5
# MQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQMA4GA1UEBxMHUmVk
# bW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9uMSMwIQYDVQQDExpN
# aWNyb3NvZnQgQ29kZSBTaWduaW5nIFBDQTAeFw0xMTExMDEyMjM5MTdaFw0xMzAy
# MDEyMjQ5MTdaMIGDMQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQ
# MA4GA1UEBxMHUmVkbW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9u
# SIG # End signature block"

However, I can't quite figure out a Grok pattern that does this. In the ingest pipeline, I plan to extract everything between "Begin signature block" and "# SIG # End signature block", then remove the leading # characters so that only the signature is left in a new field (powershell.file.script_block_signature).

I was thinking of something like this, but I was unable to get it to match:
(?<script_start>.* Begin signature block)(?<powershell.file.script_block_signature>[A-Za-z0-9]*)(?<script_end>.* End signature block)

I got this to match and I think it will work but I'm curious about the performance:
(?<script_start>.* Begin signature block)(?<powershell.file.script_block_signature>(.|\r|\n)*)(?<script_end>.*# SIG # End signature block)

With the last one, I assume I'll just have to remove "\r\n# " with a processor.
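The plan described above (capture everything between the two markers, then drop the "# " prefixes) can be sketched outside the pipeline in Python. This is only an illustration under assumptions: extract_signature is a hypothetical name, Python's re module is not the regex engine that Grok uses, and joining the lines into a single string is one reading of the "\r\n# " clean-up step. It also shows why a character class such as [A-Za-z0-9]* cannot span the block: the block contains line breaks, spaces, "#", "+" and "/".

```python
import re

BEGIN = "# SIG # Begin signature block"
END = "# SIG # End signature block"

# Non-greedy capture between the two markers. re.DOTALL lets "." match
# line breaks, which a class like [A-Za-z0-9]* never does.
SIGNATURE_RE = re.compile(
    re.escape(BEGIN) + r"\r?\n(.*?)\r?\n" + re.escape(END),
    re.DOTALL,
)


def extract_signature(script_text):
    """Return the signature as one string without "# " prefixes, or None."""
    match = SIGNATURE_RE.search(script_text)
    if match is None:
        return None
    # Drop the leading "#" and surrounding whitespace from every line,
    # then join the lines (assumption: line breaks are not wanted).
    lines = match.group(1).splitlines()
    return "".join(line.lstrip("#").strip() for line in lines)
```

A single non-greedy capture avoids the per-character alternation in (.|\r|\n)*, which is the part of the second pattern most likely to be costly on long script blocks.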

How about something like this (I only did a quick test, so it may be buggy):


POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source" : {
        "data" : """
# SIG # Begin signature block
# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB
# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR
# AgEAAgEAAgEAAgEAAgEAMCEwCQYFKw4DAhoFAAQUxKaXN7doWq+mq18IrzABoXMr
# 4l6gghXyMIIEoDCCA4igAwIBAgIKYRr16gAAAAAAajANBgkqhkiG9w0BAQUFADB5
# MQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQMA4GA1UEBxMHUmVk
# bW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9uMSMwIQYDVQQDExpN
# aWNyb3NvZnQgQ29kZSBTaWduaW5nIFBDQTAeFw0xMTExMDEyMjM5MTdaFw0xMzAy
# MDEyMjQ5MTdaMIGDMQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQ
# MA4GA1UEBxMHUmVkbW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9u
# SIG # End signature block"
        """
      }
    }],
    "pipeline": {
      "processors": [
        {
          "grok": {
            "field": "data",
            "patterns": ["# SIG # Begin signature block\n(?m)%{GREEDYDATA:message}\n# SIG # End signature block"]
          }
        }
      ]
    }
}

Using the Grok Debugger in Kibana, I can't get it to match.

If you need to extract data between two strings, I think the dissect processor works best, and the pattern is very similar to the one that @spinscale shared.

It would be something like this:

POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source" : {
        "data" : """
# SIG # Begin signature block
# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB
# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR
# AgEAAgEAAgEAAgEAAgEAMCEwCQYFKw4DAhoFAAQUxKaXN7doWq+mq18IrzABoXMr
# 4l6gghXyMIIEoDCCA4igAwIBAgIKYRr16gAAAAAAajANBgkqhkiG9w0BAQUFADB5
# MQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQMA4GA1UEBxMHUmVk
# bW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9uMSMwIQYDVQQDExpN
# aWNyb3NvZnQgQ29kZSBTaWduaW5nIFBDQTAeFw0xMTExMDEyMjM5MTdaFw0xMzAy
# MDEyMjQ5MTdaMIGDMQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQ
# MA4GA1UEBxMHUmVkbW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9u
# SIG # End signature block"
        """
      }
    }],
    "pipeline": {
      "processors": [
        {
          "dissect": {
            "field": "data",
            "pattern": "# SIG # Begin signature block\n%{message}\n# SIG # End signature block"
          }
        }
      ]
    }
}
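As a rough analogy for why dissect suits this case: it splits on literal delimiters instead of evaluating a regular expression. The Python sketch below mimics that idea with str.partition. It is not how the processor is implemented, and the helper name between is hypothetical.

```python
def between(text, start, end):
    """Return the text found between two literal delimiters, or None."""
    # Split once on the start delimiter; everything after it is "rest".
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None
    # Split the remainder once on the end delimiter.
    value, found_end, _ = rest.partition(end)
    if not found_end:
        return None
    return value


data = (
    "\n# SIG # Begin signature block\n"
    "# MIIbDQYJ\n"
    "# gjcCAQSg\n"
    "# SIG # End signature block\n"
)
message = between(
    data,
    "# SIG # Begin signature block\n",
    "\n# SIG # End signature block",
)
```

As with the dissect pattern above, the captured value still carries the "# " prefix on every line, so a follow-up clean-up step is needed either way.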
