In the future, if you provide a better, more representative sample, we can help more quickly instead of iterating. Just something to think about.
In fact, at this point I am unclear what your data looks like: your samples have short IDs, yet you are explaining that they are long. The closer to real data you can provide, the better. So I am still guessing.
Also, if you are going to do a lot of this, I would read and learn a bit more about grok and dissect. tl;dr: grok is more flexible and dissect is more efficient / faster. You could probably use either; I showed both.
So here is test data with long IDs, etc.:
#REQ:123-44aa-4fe1-b88a-123aa#REQ:1#VARNAME:VarValue#PROC:1234-aaaa-1234-aba1234
REQ 12222-aa 1 2022-05-30 18:38:49.609 CompanyA
RES 12222-aa-b1-12-alsdkjfh-salkdfjhas-kalaskdjfh <?xml version="1.0" encoding="UTF-8" standalone="yes"?><data><name>Belgian Waffles</name><price>$5.95</price></data>
REQ 12223-aa 1 2022-05-30 18:38:49.609 CompanyB
RES 12223-aa-b1-12-sakdjfhsaldkjf-lsakdjfhsaldfkjh <?xml version="1.0" encoding="UTF-8" standalone="yes"?><data><name>French Toast</name><price>$4.50</price></data>
REQ 12224-aa 1 2022-05-30 18:38:49.609 CompanyC
RES 12224-aa-b1-12-lasdkjfhsadlfkjhsadflkj <?xml version="1.0" encoding="UTF-8" standalone="yes"?><data><name>Homestyle Breakfast</name><price>$6.95</price></data>
Pipeline: I gave you both grok and dissect, so you can figure it out from here. Take a look at the docs, they will help.
input {
file {
path => "/Users/sbrown/workspace/sample-data/discuss/mixed-text-xml/mixed-txt-xml.txt"
start_position => "beginning"
sincedb_path => "/dev/null"
type => "xml"
}
}
filter {
# For Grok use \t for tabs
# grok {
# match => { "message" => "%{WORD:request_type}\t%{DATA:request_id}\t%{GREEDYDATA:msg_details}"}
# }
# For dissect you need to paste in actual tabs
dissect {
mapping => { "message" => "%{request_type} %{request_id} %{msg_details}"}
}
# You could put some if logic around this if you want to only parse the XML when the grok or dissect is successful
xml {
source => "msg_details"
store_xml => true
target => "xml_data"
force_array => false
}
# For Grok
# if "_grokparsefailure" in [tags] or "_xmlparsefailure" in [tags] {
# drop {}
# }
# For Dissect
if "_dissectfailure" in [tags] or "_xmlparsefailure" in [tags] {
drop {}
}
}
output {
stdout {codec => "rubydebug"}
}
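As a rough sketch of the "if logic" mentioned in the filter comment above, the xml filter could be wrapped in a conditional so it only runs when the split succeeded. This uses the grok variant so no literal tabs need to be pasted. The extra check on `[request_type] == "RES"` is my assumption, based on only the RES lines carrying XML in the test data; adjust it to your real data.

```
filter {
  grok {
    # \t matches the tab separators between the three fields
    match => { "message" => "%{WORD:request_type}\t%{DATA:request_id}\t%{GREEDYDATA:msg_details}" }
  }
  # Only attempt XML parsing when grok succeeded and the line is a response
  # (assumption: REQ lines never contain XML)
  if "_grokparsefailure" not in [tags] and [request_type] == "RES" {
    xml {
      source => "msg_details"
      store_xml => true
      target => "xml_data"
      force_array => false
    }
  }
}
```

Note that with this version the REQ lines are no longer tagged with _xmlparsefailure, so they would pass through instead of being dropped. If you still want to drop them, add a drop {} for them explicitly.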
Results with long IDs:
{
"@version" => "1",
"xml_data" => {
"price" => "$6.95",
"name" => "Homestyle Breakfast"
},
"@timestamp" => 2022-05-30T17:08:41.438868Z,
"request_type" => "RES",
"message" => "RES\t12224-aa-b1-12-lasdkjfhsadlfkjhsadflkj\t<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>Homestyle Breakfast</name><price>$6.95</price></data>",
"request_id" => "12224-aa-b1-12-lasdkjfhsadlfkjhsadflkj",
"type" => "xml",
"host" => {
"name" => "hyperion.local"
},
"event" => {
"original" => "RES\t12224-aa-b1-12-lasdkjfhsadlfkjhsadflkj\t<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>Homestyle Breakfast</name><price>$6.95</price></data>"
},
"log" => {
"file" => {
"path" => "/Users/sbrown/workspace/sample-data/discuss/mixed-text-xml/mixed-txt-xml.txt"
}
},
"msg_details" => "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>Homestyle Breakfast</name><price>$6.95</price></data>"
}
{
"@version" => "1",
"xml_data" => {
"price" => "$5.95",
"name" => "Belgian Waffles"
},
"@timestamp" => 2022-05-30T17:08:41.438348Z,
"request_type" => "RES",
"message" => "RES\t12222-aa-b1-12-alsdkjfh-salkdfjhas-kalaskdjfh\t<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>Belgian Waffles</name><price>$5.95</price></data>",
"request_id" => "12222-aa-b1-12-alsdkjfh-salkdfjhas-kalaskdjfh",
"type" => "xml",
"host" => {
"name" => "hyperion.local"
},
"event" => {
"original" => "RES\t12222-aa-b1-12-alsdkjfh-salkdfjhas-kalaskdjfh\t<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>Belgian Waffles</name><price>$5.95</price></data>"
},
"log" => {
"file" => {
"path" => "/Users/sbrown/workspace/sample-data/discuss/mixed-text-xml/mixed-txt-xml.txt"
}
},
"msg_details" => "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>Belgian Waffles</name><price>$5.95</price></data>"
}
{
"@version" => "1",
"xml_data" => {
"price" => "$4.50",
"name" => "French Toast"
},
"@timestamp" => 2022-05-30T17:08:41.438604Z,
"request_type" => "RES",
"message" => "RES\t12223-aa-b1-12-sakdjfhsaldkjf-lsakdjfhsaldfkjh\t<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>French Toast</name><price>$4.50</price></data>",
"request_id" => "12223-aa-b1-12-sakdjfhsaldkjf-lsakdjfhsaldfkjh",
"type" => "xml",
"host" => {
"name" => "hyperion.local"
},
"event" => {
"original" => "RES\t12223-aa-b1-12-sakdjfhsaldkjf-lsakdjfhsaldfkjh\t<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>French Toast</name><price>$4.50</price></data>"
},
"log" => {
"file" => {
"path" => "/Users/sbrown/workspace/sample-data/discuss/mixed-text-xml/mixed-txt-xml.txt"
}
},
"msg_details" => "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><data><name>French Toast</name><price>$4.50</price></data>"
}
I am sure you can figure it out from here.