How do I write parsers for multiple lines like this?

Hi Team,

Can someone please help me create parsers for the logs below? My main confusion, and where I need a hint, is how to parse these multiple lines.

########################
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] connect
[2018-07-27 12:52:33] [3307] [http_80_tcp 3366] [192.168.44.1:60664] connect
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: GET /test/test.exe HTTP/1.1
[2018-07-27 12:52:33] [3307] [http_80_tcp 3367] [192.168.44.1:60665] connect
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Host: 192.168.44.44
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Connection: keep-alive
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Upgrade-Insecure-Requests: 1
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Accept-Encoding: gzip, deflate
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: Accept-Language: en-US,en;q=0.9,mr;q=0.8
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Request URL: http://192.168.44.44/test/test.exe
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Sending fake file configured for extension 'exe'.
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: HTTP/1.1 200 OK
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Content-Length: 24576
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Content-Type: x-msdos-program
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Date: Fri, 27 Jul 2018 07:22:33 GMT
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Connection: Close
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] send: Server: Microsoft-IIS/6.0
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Sending file: /var/lib/inetsim/http/fakefiles/sample_gui.exe
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] stat: 1 method=GET url=http://192.168.44.44/test/test.exe sent=/var/lib/inetsim/http/fakefiles/sample_gui.exe postdata=
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] disconnect

########################

How do I start parsing these logs? For example, how do I match only the lines that contain the words recv, send, or info?

TIA
Blason R

Hi Blason,

Why don't you break this down? For example, you can write a grok similar to this:

\[%{DATA:datetime}\] \[%{DATA:id}\] \[%{DATA:policy}\] \[%{HOSTPORT:ipport}\] %{WORD:action}: %{GREEDYDATA:extra}

Then you can parse the value in the extra field using conditionals, the kv filter, or a combination of both. Let me know if this works!

PS: If the initial field format is fixed, using dissect would give you better performance. I wrote it in grok because I was not sure whether it is.
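
For reference, a rough dissect equivalent of the grok pattern above might look like the sketch below. It assumes every line really does start with the same four bracketed fields; the field names (logdate, pid, service, conn_id, client_ip, client_port) are only illustrative and are not taken from the original logs.

```
filter {
  dissect {
    # Splits on the literal delimiters between the %{...} fields.
    # Field names here are illustrative assumptions.
    mapping => {
      "message" => "[%{logdate}] [%{pid}] [%{service} %{conn_id}] [%{client_ip}:%{client_port}] %{action}: %{extra}"
    }
  }
}
```

Lines with no ": " after the action, such as connect and disconnect, do not fit this mapping, so they should end up tagged with _dissectfailure (the default failure tag) and would need to be handled or dropped separately.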

Hi @nerdsec, thanks for the reply. I am only interested in certain lines, not all of them, specifically:

[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] recv: GET /test/test.exe HTTP/1.1
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] info: Request URL: http://192.168.44.44/test/test.exe
[2018-07-27 12:52:33] [3307] [http_80_tcp 3365] [192.168.44.1:60663] stat: 1 method=GET url=http://192.168.44.44/test/test.exe sent=/var/lib/inetsim/http/fakefiles/sample_gui.exe postdata=

So in Logstash, how do I write the parsers so that they match only the lines containing recv and info, and then parse the parameters on those lines? That is the confusing part for me.

Had it been a single line format, I would have written a single statement, but since the lines differ I am not sure how to start matching the ones that contain the words recv and info.

You can use a conditional to check if a value exists in the message field:

if [message] =~ /\s(recv|info|stat):\s/ {
  # parse the data here, e.g. with grok or kv
} else {
  drop {}
}

More details about this are here:
https://www.elastic.co/guide/en/logstash/current/event-dependent-configuration.html

Also, you could use the parser I mentioned earlier and then apply a conditional on the parsed field. I am not sure which approach is more efficient; maybe someone else can comment on that.

if [action] in ["stat", "info", "recv"] {
  # further processing for the lines you want to keep
} else {
  drop {}
}

I am sorry, I am not quite sure what you mean by "had it been a single line". Maybe an example would help.

Hi,

The example above that checks what the message contains is the correct one, I believe. What I meant by a single line is this: if these were uniform logs with a single format, I could have parsed them, but since the lines are all different I was not sure how to parse them.

Let me try parsing the data that way and see.

Sure! Let me know if you need any help. :)

I have configured the parsers, and here they are. However, I am only interested in certain lines, not all of them, and I don't have a clue how to proceed.

filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate}\] \[%{DATA:pid}\] \[%{DATA:port} %{DATA:data}\] \[%{IPV4:clientipaddr}:%{WORD:sport}\] %{WORD:action}: %{GREEDYDATA:rest_message}" }
  }
}

output {
  elasticsearch {
    index => "logstash-deception-%{+YYYY.MM.dd}"
    hosts => ["http://localhost:9200"]
  }
}

So this matches the logs up to the action field, and everything after that is just GREEDYDATA. I want to filter certain lines out of those and parse them.

Considering the grok expression you have written, a basic data structure would be as follows:

{
  "data": "3365",
  "port": "http_80_tcp",
  "clientipaddr": "192.168.44.1",
  "logdate": "2018-07-27 12:52:33",
  "action": "recv",
  "pid": "3307",
  "rest_message": "GET /test/test.exe HTTP/1.1",
  "sport": "60663"
}

Considering the sample logs that you have provided, the information in the rest_message would not be consistent across multiple lines as the string has different values based on the context. But looking closely, there seems to be key-value pairing between the values (based on ":"). Would using a kv filter solve the issue? Here is an example:

filter {
  kv {
    # read from the field produced by grok rather than the whole message
    source => "rest_message"
    value_split => ":"
  }
}

https://www.elastic.co/guide/en/logstash/current/plugins-filters-kv.html

The logic above would fail only for lines that are not in "key: value" form, such as the request line (GET ... HTTP/1.1), the response status line (HTTP/1.1 200 OK), and the requested-URL messages. Are there any other such scenarios in your logs? You could use conditionals to detect those cases and take appropriate action.
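
As a rough sketch of what such conditionals could look like, the filter below treats stat and info lines differently. It assumes the grok filter from earlier in the thread has already set action and rest_message; the stat_ prefix and the request_url field name are made up for illustration.

```
filter {
  if [action] == "stat" {
    # e.g. "1 method=GET url=http://... sent=/var/... postdata="
    kv {
      source      => "rest_message"
      field_split => " "
      value_split => "="
      prefix      => "stat_"   # illustrative prefix, not from the original logs
    }
  } else if [action] == "info" and [rest_message] =~ /^Request URL: / {
    grok {
      # request_url is an illustrative field name
      match => { "rest_message" => "^Request URL: %{NOTSPACE:request_url}" }
    }
  }
}
```

The leading "1" on the stat line has no "=" in it, so kv should simply skip it. Other info lines (for example "Sending file: ...") pass through with rest_message untouched.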

Hi there,

Thanks for the pointer. Here are the complete logs from connect to disconnect. I am only interested in certain lines, maybe only the ones that contain info:, and I want to parse those.
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] connect
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: GET /test1/test2/test3.pdf HTTP/1.1
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Host: 000cas.info
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Connection: keep-alive
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Upgrade-Insecure-Requests: 1
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Accept-Encoding: gzip, deflate, br
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] recv: Accept-Language: en-US,en;q=0.9
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] info: Request URL: https://000cas.info/test1/test2/test3.pdf
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] info: Sending fake file configured for extension 'pdf'.
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: HTTP/1.1 200 OK
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Content-Type: application/pdf
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Server: Microsoft-IIS/8.0
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Date: Tue, 31 Jul 2018 13:07:35 GMT
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Content-Length: 37053
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] send: Connection: Close
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] info: Sending file: /var/lib/inetsim/http/fakefiles/sample.pdf
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] stat: 1 method=GET url=https://000cas.info/test1/test2/test3.pdf sent=/var/lib/inetsim/http/fakefiles/sample.pdf postdata=
[2018-07-31 18:37:35] [31771] [https_443_tcp 704] [192.168.1.33:27025] disconnect

Well in that case, I would say you have already parsed your logs, right?

The data after info field is quite generic and can be kept as it is.

Yep, right! But I am keen to know how to keep only the logs with the info: field; everything else should be dropped.

Refer to the conditional example earlier in this thread. Remove the stat and recv parts and keep only info.

The drop filter will remove all logs that you do not need.
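
Put together with the grok filter posted earlier in the thread, a minimal sketch could look like this. The extra check on the _grokparsefailure tag is my own addition, on the assumption that you also want to discard lines the pattern cannot match.

```
filter {
  grok {
    match => { "message" => "\[%{TIMESTAMP_ISO8601:logdate}\] \[%{DATA:pid}\] \[%{DATA:port} %{DATA:data}\] \[%{IPV4:clientipaddr}:%{WORD:sport}\] %{WORD:action}: %{GREEDYDATA:rest_message}" }
  }

  # connect/disconnect lines have no "action: text" part, so grok fails on
  # them and [action] is never set; drop those along with recv, send and stat.
  if "_grokparsefailure" in [tags] or [action] != "info" {
    drop {}
  }
}
```

Only events whose action is info should reach the elasticsearch output after this.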

Great, thanks.
