I'm trying to extract the PowerShell signature block from PowerShell scripts that are being run in our environment. Sometimes the end of the powershell.file.script_block_text field contains a signature in this format:
# SIG # Begin signature block
# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB
# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR
# AgEAAgEAAgEAAgEAAgEAMCEwCQYFKw4DAhoFAAQUxKaXN7doWq+mq18IrzABoXMr
# 4l6gghXyMIIEoDCCA4igAwIBAgIKYRr16gAAAAAAajANBgkqhkiG9w0BAQUFADB5
# MQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQMA4GA1UEBxMHUmVk
# bW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9uMSMwIQYDVQQDExpN
# aWNyb3NvZnQgQ29kZSBTaWduaW5nIFBDQTAeFw0xMTExMDEyMjM5MTdaFw0xMzAy
# MDEyMjQ5MTdaMIGDMQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQ
# MA4GA1UEBxMHUmVkbW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9u
# SIG # End signature block
However, I can't quite figure out a Grok pattern that makes this work. In the ingest pipeline, I planned to extract everything between "Begin signature block" and "# SIG # End signature block", then remove all the # characters, leaving the signature in a new field (powershell.file.script_block_signature).
I tried something like this, but I couldn't get it to match: (?<script_start>.* Begin signature block)(?<powershell.file.script_block_signature>[A-Za-z0-9]*)(?<script_end>.* End signature block)
I did get this one to match, and I think it will work, but I'm curious about its performance: (?<script_start>.* Begin signature block)(?<powershell.file.script_block_signature>(.|\r|\n)*)(?<script_end>.*# SIG # End signature block)
With the last one, I assume I'll need another processor to remove the "\r\n# " sequences.
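A minimal Python sketch can show why the first pattern never matches and what a cheaper alternative to (.|\r|\n)* might look like. It assumes Python's re behaves like the grok regex engine for these particular constructs, which is close but not guaranteed; the sample text, the "\r\n" line endings, and the names FIRST_ATTEMPT, LAZY, chars_outside_class and extract_signature are all hypothetical. In the first pattern, "." does not cross line breaks, and [A-Za-z0-9]* cannot consume the #, space, / and line-break characters inside the block. The LAZY pattern uses a single [\s\S] character class with a lazy *?, so it stops at the first end marker instead of running greedily to the end of the text and backtracking, and it avoids evaluating an alternation for every character.

```python
import re
import string
from typing import List, Optional

# Assumption: a shortened stand-in for powershell.file.script_block_text
# with Windows "\r\n" line endings.
SAMPLE = (
    "Write-Output 'hello'\r\n"
    "# SIG # Begin signature block\r\n"
    "# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB\r\n"
    "# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR\r\n"
    "# SIG # End signature block"
)

# The first attempt, translated to Python syntax: named groups are written
# (?P<name>...) and names cannot contain dots, so "signature" stands in for
# powershell.file.script_block_signature.
FIRST_ATTEMPT = re.compile(
    r"(?P<script_start>.* Begin signature block)"
    r"(?P<signature>[A-Za-z0-9]*)"
    r"(?P<script_end>.* End signature block)"
)

# Hypothetical alternative: one character class that also matches line
# breaks, used lazily so the capture stops at the first end marker.
LAZY = re.compile(
    r"# SIG # Begin signature block\r?\n"
    r"(?P<signature>[\s\S]*?)"
    r"\r?\n# SIG # End signature block"
)


def chars_outside_class(body: str) -> List[str]:
    """Return the characters in body that [A-Za-z0-9] cannot match."""
    return sorted(set(body) - set(string.ascii_letters + string.digits))


def extract_signature(text: str) -> Optional[str]:
    """Return the base64 signature with '# ' prefixes and line breaks removed."""
    match = LAZY.search(text)
    if match is None:
        return None
    # Same job as a follow-up step that removes "\r\n# ": drop the leading
    # "# " and every line break together with the "# " that follows it.
    return re.sub(r"(?:\r?\n)?# ?", "", match.group("signature"))


if __name__ == "__main__":
    print(FIRST_ATTEMPT.search(SAMPLE))  # None
    body = LAZY.search(SAMPLE).group("signature")
    print(chars_outside_class(body))  # ['\n', '\r', ' ', '#', '/']
    print(extract_signature(SAMPLE)[:16])  # MIIbDQYJKoZIhvcN
```

The cleanup regex relies on base64 never containing a # character, so every "# " it removes is a comment prefix rather than part of the signature.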
If you need to extract the data between two fixed strings, I think the dissect processor works best, and the pattern is quite similar to the one @spinscale shared.
It would be something like this:
POST _ingest/pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "data": """
# SIG # Begin signature block
# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB
# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR
# AgEAAgEAAgEAAgEAAgEAMCEwCQYFKw4DAhoFAAQUxKaXN7doWq+mq18IrzABoXMr
# 4l6gghXyMIIEoDCCA4igAwIBAgIKYRr16gAAAAAAajANBgkqhkiG9w0BAQUFADB5
# MQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQMA4GA1UEBxMHUmVk
# bW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9uMSMwIQYDVQQDExpN
# aWNyb3NvZnQgQ29kZSBTaWduaW5nIFBDQTAeFw0xMTExMDEyMjM5MTdaFw0xMzAy
# MDEyMjQ5MTdaMIGDMQswCQYDVQQGEwJVUzETMBEGA1UECBMKV2FzaGluZ3RvbjEQ
# MA4GA1UEBxMHUmVkbW9uZDEeMBwGA1UEChMVTWljcm9zb2Z0IENvcnBvcmF0aW9u
# SIG # End signature block
"""
      }
    }
  ],
  "pipeline": {
    "processors": [
      {
        "dissect": {
          "field": "data",
          "pattern": "# SIG # Begin signature block\n%{message}\n# SIG # End signature block"
        }
      }
    ]
  }
}
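For comparison, the idea behind dissect (splitting on fixed delimiters instead of running a regex) can be sketched in Python with plain string searches. This is only an approximation under stated assumptions, not Elasticsearch's dissect implementation: the names dissect_like and strip_comment_prefixes are hypothetical, and normalizing "\r\n" to "\n" is an assumption about Windows line endings in script_block_text, since the pipeline pattern above uses "\n". Note that the captured message still carries the "# " prefixes, so a follow-up step (for example a gsub processor in the pipeline) would still be needed to strip them; strip_comment_prefixes stands in for that step here.

```python
from typing import Optional

BEGIN = "# SIG # Begin signature block\n"
END = "\n# SIG # End signature block"

# Assumption: a shortened stand-in for the "data" field, with some script
# text before the block and "\r\n" line endings.
SAMPLE = (
    "Write-Output 'hello'\r\n"
    "# SIG # Begin signature block\r\n"
    "# MIIbDQYJKoZIhvcNAQcCoIIa/jCCGvoCAQExCzAJBgUrDgMCGgUAMGkGCisGAQQB\r\n"
    "# gjcCAQSgWzBZMDQGCisGAQQBgjcCAR4wJgIDAQAABBAfzDtgWUsITrck0sYpfvNR\r\n"
    "# SIG # End signature block\r\n"
)


def dissect_like(text: str) -> Optional[str]:
    """Return the text between BEGIN and END, or None if either is missing."""
    # Assumption: normalize Windows line endings so the "\n" delimiters
    # from the pipeline pattern also apply to "\r\n" input.
    text = text.replace("\r\n", "\n")
    start = text.find(BEGIN)
    if start == -1:
        return None
    start += len(BEGIN)
    end = text.find(END, start)
    if end == -1:
        return None
    return text[start:end]


def strip_comment_prefixes(message: str) -> str:
    """Remove the leading '# ' from each line and join the base64 lines."""
    return "".join(
        line[2:] if line.startswith("# ") else line
        for line in message.split("\n")
    )


if __name__ == "__main__":
    message = dissect_like(SAMPLE)
    signature = strip_comment_prefixes(message)
    print(signature[:16])  # MIIbDQYJKoZIhvcN
    print(len(signature))  # 128
```

Because str.find scans for a literal string and never backtracks, delimiter matching like this tends to be cheaper than a regex that has to consider every character, which is the usual argument for dissect over grok when the boundaries are fixed strings.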