Blason
(R)
July 27, 2018, 7:09am
1
Hi Team,
Can someone please help me create parsers for the logs below? My main confusion, and where I need a hint, is how to parse these multiple lines.
########################
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] connect
[2018-07-27 12:52:33] [3307] [http_80_tcp 3366] [192.168.44.1:60664] connect
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: GET /test/test.exe HTTP/1.1
[2018-07-27 12:52:33] [3307] [http_80_tcp 3367] [192.168.44.1:60665] connect
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Host: 192.168.44.44
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Connection: keep-alive
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Upgrade-Insecure-Requests: 1
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Accept-Encoding: gzip, deflate
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Accept-Language: en-US,en;q=0.9,mr;q=0.8
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Request URL: http://192.168.44.44/test/test.exe
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Sending fake file configured for extension 'exe'.
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: HTTP/1.1 200 OK
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Content-Length: 24576
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Content-Type: x-msdos-program
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Date: Fri, 27 Jul 2018 07:22:33 GMT
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Connection: Close
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Server: Microsoft-IIS/6.0
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Sending file: /var/lib/inetsim/http/fakefiles/sample_gui.exe
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] stat: 1 method=GET url=http://192.168.44.44/test/test.exe sent=/var/lib/inetsim/http/fakefiles/sample_gui.exe postdata=
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] disconnect
########################
How do I start parsing these logs? For example, how do I match only the lines that contain the words recv, send, or info?
TIA
Blason R
NerdSec
(Nachiket)
July 27, 2018, 8:57am
2
Hi Blason,
Why don't you break this down? For example, you can write a grok similar to this:
\[%{DATA:datetime}\] \[%{DATA:id}\] \[%{DATA:policy}\] \[%{HOSTPORT:ipport}\] %{WORD:action}: %{GREEDYDATA:extra}
Then you can parse the value in the extra field using conditionals, kv, or a combination of both. Let me know if this works!
PS: If the initial field format is fixed, using dissect would give you better performance. I have written it in grok as I was not sure.
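To illustrate the PS, here is a minimal dissect sketch that mirrors the grok pattern above. It assumes every line of interest has the same fixed layout of four bracketed fields followed by "action: rest"; the field names are simply carried over from the grok example. Lines such as connect and disconnect, which have no ": " separator, should fail to dissect and be tagged _dissectfailure, so treat this as a starting point rather than a finished filter.

```
filter {
  dissect {
    mapping => {
      # Literal text between the %{...} fields acts as the delimiter.
      "message" => "[%{datetime}] [%{id}] [%{policy}] [%{ipport}] %{action}: %{extra}"
    }
  }
}
```

Unlike grok, dissect does no pattern validation, so ipport here is just whatever text sits between the fourth pair of brackets.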
Blason
(R)
July 27, 2018, 11:51am
3
Hi @nerdsec, thanks for the reply. I am only interested in certain lines, not all of them, specifically:
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: GET /test/test.exe HTTP/1.1
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Request URL: http://192.168.44.44/test/test.exe
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] stat: 1 method=GET url=http://192.168.44.44/test/test.exe sent=/var/lib/inetsim/http/fakefiles/sample_gui.exe postdata=
So in Logstash, how do I write the parsers so that they match only lines containing recv and info, and then parse the parameters of those lines? That is the confusing part for me.
Had it been a single line format, I would have written a single statement, but since the lines differ I am not sure how to start with the ones that contain recv and info.
NerdSec
(Nachiket)
July 27, 2018, 12:32pm
4
You can use a conditional to check if a value exists in the message field:
if [message] =~ /.*\s(recv:|info:|stat:)\s.*/ {
  # parse the data here, for example with grok or kv
} else {
  drop {}
}
More details about this are here:
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html
Also, you could use the parser I mentioned earlier and then apply a conditional. I am not sure which is more efficient; maybe someone else can comment on that.
if [action] in ["stat", "info", "recv"] {
  # do further parsing here
} else {
  drop {}
}
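Put together, the second approach could look roughly like the sketch below. It reuses the grok pattern from my earlier reply unchanged and assumes the raw log line is in the message field. Lines such as connect and disconnect do not match that pattern, so they never get an action field and end up in the else branch as well.

```
filter {
  grok {
    match => {
      "message" => "\[%{DATA:datetime}\] \[%{DATA:id}\] \[%{DATA:policy}\] \[%{HOSTPORT:ipport}\] %{WORD:action}: %{GREEDYDATA:extra}"
    }
  }

  if [action] in ["stat", "info", "recv"] {
    # further parsing of [extra] goes here
  } else {
    # send lines and anything that did not match the grok pattern
    drop {}
  }
}
```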
NerdSec
(Nachiket)
July 27, 2018, 12:37pm
5
I am sorry, I am not quite sure what you mean by this. Maybe an example would help.
Blason
(R)
July 27, 2018, 12:54pm
6
Hi,
I believe the example above, which checks what the message contains, is the right one. What I meant by a single line is this: if the logs had a single uniform format I could have parsed them, but since the lines all differ I was not sure how to parse them.
Let me try parsing the data that way and see.
NerdSec
(Nachiket)
July 28, 2018, 5:19pm
7
Sure! Let me know if you need any help.
Blason
(R)
July 31, 2018, 5:59am
8
Well, I configured the parser as shown below. However, I am only interested in certain lines, not all of them, and I don't have a clue how to proceed.
filter {
  grok {
    match => { "message" => "(\[%{TIMESTAMP_ISO8601:logdate}\] \[%{DATA:pid}\] \[%{DATA:port} %{DATA:data}\] \[%{IPV4:clientipaddr}\:%{WORD:sport}\] %{WORD:action}\: %{GREEDYDATA:rest_message})"
    }
  }
}
output {
  elasticsearch {
    index => "logstash-deception-%{+YYYY.MM.dd}"
    hosts => ["http://localhost:9200"]
  }
}
So this matches the logs up to the action field, and after that it is just GREEDYDATA. I want to filter certain lines from those and parse them.
NerdSec
(Nachiket)
July 31, 2018, 6:15pm
9
Considering the grok expression you have written, a basic data structure would be as follows:
{
  "data": "3365",
  "port": "http_80_tcp",
  "clientipaddr": "192.168.44.1",
  "logdate": "2018-07-27 12:52:33",
  "action": "recv",
  "pid": "3307",
  "rest_message": "GET /test/test.exe HTTP/1.1",
  "sport": "60663"
}
Considering the sample logs that you have provided, the information in the rest_message field would not be consistent across lines, as the string has different values depending on the context. But looking closely, there seems to be a key-value pairing between the values (based on ":"). Would using a kv filter solve the issue? Here is an example:
filter {
  kv {
    value_split => ":"
  }
}
https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html
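As a related sketch: the stat lines in the sample logs already carry key=value pairs, which is the default format for kv (fields split on spaces, values split on "="). The snippet below assumes the grok filter from the earlier post has already populated action and rest_message. Restricting the output with include_keys is an assumption made for illustration, not something the logs require.

```
filter {
  if [action] == "stat" {
    kv {
      # Read from the grok-extracted remainder instead of the whole line.
      source => "rest_message"
      # Keep only the keys seen in the sample stat line.
      include_keys => ["method", "url", "sent"]
    }
  }
}
```

Pointing source at rest_message matters here, because the full message also contains ":" and other separators in the timestamp and the IP:port part.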
NerdSec
(Nachiket)
July 31, 2018, 6:17pm
10
The logic mentioned above would fail only for the strings that contain response codes and requested URIs. Are there any other such scenarios in your logs? You could use conditionals to detect such cases and take appropriate action.
Blason
(R)
August 1, 2018, 2:00pm
11
Hi there,
Thanks for the pointer. Here are the complete logs from connect to disconnect. I am only interested in certain lines, perhaps only those that contain info:, and in parsing those.
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] connect
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: GET /test1/test2/test3.pdf HTTP/1.1
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Host: 000cas.info
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Connection: keep-alive
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Upgrade-Insecure-Requests: 1
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Accept-Encoding: gzip, deflate, br
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Accept-Language: en-US,en;q=0.9
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] info: Request URL: https://000cas.info/test1/test2/test3.pdf
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] info: Sending fake file configured for extension 'pdf'.
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: HTTP/1.1 200 OK
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Content-Type: application/pdf
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Server: Microsoft-IIS/8.0
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Date: Tue, 31 Jul 2018 13:07:35 GMT
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Content-Length: 37053
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Connection: Close
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] info: Sending file: /var/lib/inetsim/http/fakefiles/sample.pdf
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] stat: 1 method=GET url=https://000cas.info/test1/test2/test3.pdf sent=/var/lib/inetsim/http/fakefiles/sample.pdf postdata=
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] disconnect
NerdSec
(Nachiket)
August 1, 2018, 5:29pm
12
Well in that case, I would say you have already parsed your logs, right?
The data after the info field is quite generic and can be kept as is.
Blason
(R)
August 1, 2018, 5:37pm
13
Yep, right! But I would like to know how to keep only the logs with the info: field; everything else should be dropped.
NerdSec
(Nachiket)
August 2, 2018, 4:10am
14
NerdSec:
Also, you could use the parser I mentioned earlier and then apply a conditional. I am not sure which is more efficient; maybe someone else can comment on that.
if [action] in ["stat", "info", "recv"] {
  # do further parsing here
} else {
  drop {}
}
Refer to the post quoted above. Remove the stat and recv parts, and keep only info.
The drop filter will remove all the logs that you do not need.
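A minimal sketch of that info-only variant, assuming the grok filter you posted earlier has already populated action and rest_message. It uses a plain equality test, since only one value is kept. The inner grok that pulls the URL out of "Request URL:" lines into a hypothetical request_url field is an optional illustration, not a required step.

```
filter {
  if [action] == "info" {
    grok {
      # Only "Request URL:" lines match; other info lines pass through untouched.
      match => { "rest_message" => "Request URL: %{URI:request_url}" }
      # Avoid tagging the info lines that carry no URL.
      tag_on_failure => []
    }
  } else {
    # Everything that is not an info line, including unparsed lines.
    drop {}
  }
}
```

Events that never matched the first grok (connect, disconnect) have no action field, so they fall into the else branch and are dropped too.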