I am trying to read data from an unstructured data source.
What is the best filter to start with? Grok?
Example:
The log lines below should all fall into one document.
Log file:
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
<UserDataRequest>
<id>1</id>
---
--
</UserDataRequest>
Processing logs Details - .....
<UserDataResponse>
<id>1</id>
---
---
--
</UserDataResponse>
Badger
June 18, 2019, 5:32pm
Use a multiline codec on the input to combine lines. Perhaps
codec => multiline {
  pattern => "^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2} "
  negate => true
  what => "previous"
  auto_flush_interval => 1
}
Then you can parse it using something like
dissect { mapping => { "message" => "%{[@metadata][ts]} ProcessId[%{processId}] TransactionId[%{tranId}]%{}" } }
date { match => [ "[@metadata][ts]", "YYYY-MM-dd'T'HH:mm:ss" ] }
grok {
  break_on_match => false
  # (?m) makes . match newlines, since the XML spans multiple lines
  match => {
    "message" => [
      "(?m)(?<[@metadata][request]>\<UserDataRequest>.*\</UserDataRequest>)",
      "(?m)(?<[@metadata][response]>\<UserDataResponse>.*\</UserDataResponse>)"
    ]
  }
}
xml { source => "[@metadata][request]" target => "request" force_array => false }
xml { source => "[@metadata][response]" target => "response" force_array => false }
Thank you for providing the right way to process my data.
My log data has one more constraint.
As you can see below, a request and its corresponding response are not sequential, so I need to keep reading the log until I find the matching response for a given request, then send them out together as one event.
Will the aggregate filter help here?
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
<UserDataRequest>
<id>5</id>
---
--
</UserDataRequest>
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
<UserDataRequest>
<id>6</id>
---
--
</UserDataRequest>
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
2019-06-17T00:00:01 ProcessId[1234] TransactionId[34566] - Processing
<UserDataResponse>
<id>5</id>
---
---
--
</UserDataResponse>
Badger
June 19, 2019, 12:51pm
Yes, you could use an aggregate filter and save the request for a TransactionId in the map until you see the corresponding response.
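A minimal sketch of that idea (not from the original reply; it reuses the tranId and [@metadata][request]/[@metadata][response] fields from the earlier config, and the aggregate filter requires Logstash to run with a single pipeline worker):
filter {
  # When a request arrives, stash its XML in the map, keyed by transaction id
  if [@metadata][request] {
    aggregate {
      task_id => "%{tranId}"
      code => "map['request'] = event.get('[@metadata][request]')"
      map_action => "create"
    }
  }
  # When the matching response arrives, attach the saved request to this event
  if [@metadata][response] {
    aggregate {
      task_id => "%{tranId}"
      code => "event.set('[@metadata][request]', map['request'])"
      map_action => "update"
      end_of_task => true
      timeout => 120   # drop unmatched requests after two minutes
    }
  }
}
Note that in the sample above the TransactionId is identical on every line, so in practice you may need to key on the <id> parsed out of the XML instead.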
Thank you for the suggestion.
I am unable to read XML content from a file. The file contains only XML data.
input {
  file {
    path => "C:/Ashok/logstash-7.1.1/files/xmlFile.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      pattern => "^<UserDataRequest>"
      negate => "true"
      what => "previous"
    }
  }
}
filter {
  xml { source => "message" target => "request" force_array => false }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "hackxml"
  }
  stdout {
    codec => rubydebug
  }
}
Badger
June 20, 2019, 7:03pm
That will combine every line that does not start with <UserDataRequest> with the preceding line that does start with <UserDataRequest>. When it sees the next line that starts with <UserDataRequest>, it pushes whatever it has accumulated into the pipeline as an event. In other words, the last <UserDataRequest> in the file never gets pushed. You can fix that using the auto_flush_interval option on the codec.
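For example, the codec from your input with a flush timeout added (the 5-second value is arbitrary):
codec => multiline {
  pattern => "^<UserDataRequest>"
  negate => "true"
  what => "previous"
  auto_flush_interval => 5   # push the last accumulated event after 5 seconds of inactivity
}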
@Badger I am still confused about the multiline codec. How do pattern, negate, and what work?
I read the documentation but am still confused about how lines get attached to previous lines.
Badger
June 20, 2019, 9:23pm
I doubt I can explain it better than the documentation. I suggest you create some dummy data and experiment with different configurations.
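For instance, a throwaway pipeline like this (my own illustration; the ^START pattern is arbitrary) lets you type lines on the console and watch how the codec groups them:
input {
  stdin {
    codec => multiline {
      pattern => "^START"        # a line matching this starts a new event...
      negate => true             # ...and every non-matching line is appended...
      what => "previous"         # ...to the previous event
      auto_flush_interval => 2   # flush the final event after 2 seconds idle
    }
  }
}
output { stdout { codec => rubydebug } }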
My conf file is:
input {
  file {
    path => "C:/Ashok/logstash-7.1.1/files/xmlFile.txt"
    start_position => "beginning"
    sincedb_path => "NUL"
    codec => multiline {
      pattern => "<UserData>"
      negate => true
      what => "previous"
      auto_flush_interval => 10
    }
  }
}
filter {
  xml { source => "message" target => "request" force_array => false }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "hackxml"
  }
  stdout {
    codec => rubydebug
  }
}
XML file:
<UserData>
  <name>Adam</name>
  <age>30</age>
  <address>
    <apt>11</apt>
    <streetname>Apricot Ave</streetname>
    <city>Boston</city>
  </address>
</UserData>
Console output:
[2019-06-20T14:17:11,000][WARN ][logstash.filters.xml ] Error parsing xml with XmlSimple {:source=>"message", :value=>" <UserData>\r\n\t<name>Adam</name>\r\n\t<age>30</age>\r\n\t<address>\r\n\t\t<apt>11</apt>\r\n\t\t<streetname>Apricot Ave</streetname>\r\n\t\t<city>Boston</city>\r\n\t</address>\r", :exception=>#<REXML::ParseException: No close tag for /UserData
Line: 8
Position: 153
Last 80 unconsumed characters:
>, :backtrace=>["uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/parsers/treeparser.rb:28:in `parse'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:288:in `build'", "uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rexml/document.rb:45:in `initialize'", "C:/Ashok/logstash-7.1.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:971:in `parse'", "C:/Ashok/logstash-7.1.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:164:in `xml_in'", "C:/Ashok/logstash-7.1.1/vendor/bundle/jruby/2.5.0/gems/xml-simple-1.1.5/lib/xmlsimple.rb:203:in `xml_in'", "C:/Ashok/logstash-7.1.1/vendor/bundle/jruby/2.5.0/gems/logstash-filter-xml-4.0.7/lib/logstash/filters/xml.rb:185:in `filter'", "C:/Ashok/logstash-7.1.1/logstash-core/lib/logstash/filters/base.rb:143:in `do_filter'", "C:/Ashok/logstash-7.1.1/logstash-core/lib/logstash/filters/base.rb:162:in `block in multi_filter'", "org/jruby/RubyArray.java:1792:in `each'", "C:/Ashok/logstash-7.1.1/logstash-core/lib/logstash/filters/base.rb:159:in `multi_filter'", "org/logstash/config/ir/compiler/AbstractFilterDelegatorExt.java:115:in `multi_filter'", "C:/Ashok/logstash-7.1.1/logstash-core/lib/logstash/java_pipeline.rb:235:in `block in start_workers'"]}
{
"@version" => "1",
"@timestamp" => 2019-06-20T21:17:10.483Z,
"tags" => [
[0] "multiline",
[1] "_xmlparsefailure"
],
"host" => "L-SJL-11016089",
"path" => "C:/Ashok/logstash-7.1.1/files/xmlFile.txt",
"message" => " <UserData>\r\n\t<name>Adam</name>\r\n\t<age>30</age>\r\n\t<address>\r\n\t\t<apt>11</apt>\r\n\t\t<streetname>Apricot Ave</streetname>\r\n\t\t<city>Boston</city>\r\n\t</address>\r"
}
Badger
June 20, 2019, 9:34pm
Are you sure there is a line terminator on the </UserData> line? Try adding a blank line at the end of the file to be certain.
Perfect! After adding the blank line, it started working.
However, I am a bit confused about the message field and the request field in the response below.
Is it possible to save the XML as raw XML content in a field, rather than in JSON format?
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "hackxml",
"_type" : "_doc",
"_id" : "vWTTdmsBqOSNlZHlA_vL",
"_score" : 1.0,
"_source" : {
"@timestamp" : "2019-06-20T21:37:48.559Z",
"path" : "C:/Ashok/logstash-7.1.1/files/xmlFile.txt",
"@version" : "1",
"host" : "L-SJL-11016089",
"message" : "\r"
}
},
{
"_index" : "hackxml",
"_type" : "_doc",
"_id" : "vmTTdmsBqOSNlZHlLfum",
"_score" : 1.0,
"_source" : {
"host" : "L-SJL-11016089",
"tags" : [
"multiline"
],
"@timestamp" : "2019-06-20T21:37:59.078Z",
"path" : "C:/Ashok/logstash-7.1.1/files/xmlFile.txt",
"@version" : "1",
"message" : """
<UserData>
<name>Adam</name>
<age>30</age>
<address>
<apt>11</apt>
<streetname>Apricot Ave</streetname>
<city>Boston</city>
</address>
</UserData>
""",
"request" : {
"name" : "Adam",
"age" : "30",
"address" : {
"apt" : "11",
"city" : "Boston",
"streetname" : "Apricot Ave"
}
}
}
}
]
}
}
Badger
June 20, 2019, 9:47pm
You are looking at the data returned by Elasticsearch, which is always JSON.
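The raw XML is already preserved as a plain string in the message field of the second hit. If you want it under a dedicated field name, something like this should work (a sketch; the request_xml field name is made up):
filter {
  # keep the raw XML string alongside the parsed "request" object
  mutate { copy => { "message" => "request_xml" } }
}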