Multiline conf file to parse log file to elasticsearch


#1

I have this log file: http://dpaste.com/3FE2VNY

I only want to extract certain pieces of information, such as the date and time and the number of events posted. My attempt at putting this into Elasticsearch results in Logstash hanging. I'm not sure what I did wrong, as I am new to this.

What I attempted to do was simply grab all the content in the log file and pass it into Elasticsearch. I understand that grok must be used to grab specific parts, but I am not at that level quite yet.

My goal is to extract:

start: Mon Apr 27 13:35:25 2015
finish: Mon Apr 27 13:35:36 2015
number of events posted: 10

Log file:

test_web_events.py: START: Mon Apr 27 13:35:25 2015
# TESTCASE TestWebPost ==================================================
# START TEST METHOD #################################: test_10_post_valid_json
[2015-04-27T13:35:25.657887] HTTP DELETE http://pppdc9prd3net:8080/rastplatz/v1/sink/db?k0=bradford4
{}
HTTP response: 200
0
POSTING event_id b29b6c7c-48cd-4cd9-b3c4-aa0a7edc1f35 to businessevent
Content-Type: text/plain
POSTING event_id 13678af1-3e3a-4a6e-a61c-404eb94b9768 to businessevent
Content-Type: text/plain
POSTING event_id 47b70306-2e7c-4cb2-9e75-5755d8d101d4 to businessevent
Content-Type: text/plain
POSTING event_id 6599cdb2-0630-470d-879d-1130cf70c605 to businessevent
Content-Type: text/plain
POSTING event_id d088ce29-fa0d-4f45-b628-045dba1fd045 to businessevent
Content-Type: text/plain
POSTING event_id 07d14813-b561-442c-9b86-dc40d1fcc721 to businessevent
Content-Type: text/plain
POSTING event_id b6aea24a-5424-4a0f-aac6-8cbaecc410db to businessevent
Content-Type: text/plain
POSTING event_id 016386bd-eac5-4f1c-8afc-a66326d37ddb to businessevent
Content-Type: text/plain
POSTING event_id 6610485d-71af-4dfa-9268-54be5408a793 to businessevent
Content-Type: text/plain
POSTING event_id 92786434-02f7-4248-a77b-bdd9d33b57be to businessevent
Content-Type: text/plain
Posted 10 events
# END TEST METHOD ###################################: test_10_post_valid_json
test_web_events.py: FINISH: Mon Apr 27 13:35:36 2015

conf file:

input {
  file {
    path => "/home/bli1/logstash-1.5.0/tmp/bradfordli2_post.log"
    codec => multiline {
      pattern => "^."
      negate => true
      what => "previous"
    }
  }
}
output {
  elasticsearch { protocol => http host => "127.0.0.1:9200"}
  stdout { codec => rubydebug }
}

(Magnus Bäck) #2

With

multiline {
  pattern => "^."
  negate => true
  what => "previous"
}

it looks like you're trying to join a line with its predecessor unless the line is blank, correct? Are there actually empty lines between each entry? Your example only contains a single entry with no terminating blank line.

Secondly, if this is the last entry in a file it's hard for Logstash to know if it has seen all of the message and should ship what it has collected. See LOGSTASH-512.

Finally, make sure the problem described in the thread "New to logstash: file input and stdout output not working" doesn't apply in your case.


#3

Hmm, my intent was to have the whole log file processed as one line, which is why I used ^. with negate => true and what => "previous". Then I would use a regex in grok to extract the pieces I want. The logs that I am creating do not have blank lines. The log file I posted is the ENTIRE log file. I am creating log files like this for debugging purposes. Please let me know if there is anything else I can provide. I appreciate the help!


(Magnus Bäck) #4

The file input isn't meant to read entire files, so you're sort of fighting the system here. I suggest you configure multiline to join the current line with the previous one unless it matches the finish line. That should get Logstash to emit a logical line containing the whole file once it reaches the "test_web_events.py: FINISH: ..." line. I guess you won't get that line included in the message, but that's probably not a problem.
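
A minimal sketch of that suggestion, assuming the file path from the original post and the FINISH line format shown in the log. The grok filter and the field names start and events_posted are illustrative assumptions, not something confirmed in this thread, and the config has not been run against Logstash 1.5.0.

```
input {
  file {
    path => "/home/bli1/logstash-1.5.0/tmp/bradfordli2_post.log"
    codec => multiline {
      # Every line that does NOT match the FINISH line is appended
      # to the previous line. The FINISH line therefore starts a new
      # event, which causes the collected lines to be emitted.
      pattern => "^test_web_events\.py: FINISH:"
      negate => true
      what => "previous"
    }
  }
}

filter {
  grok {
    # Try both patterns instead of stopping at the first match.
    break_on_match => false
    match => {
      "message" => [
        # Field names "start" and "events_posted" are hypothetical.
        "START: (?<start>%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR})",
        "Posted %{INT:events_posted:int} events"
      ]
    }
  }
}

output {
  stdout { codec => rubydebug }
}
```

With what => "previous" the FINISH line is held back as the start of the next event, so the finish time is not part of the emitted message. If it is needed in the same event, keeping the same pattern and negate but setting what => "next" may keep the FINISH line attached, since non-matching lines are then joined to the line that follows. Treat that variant as untested here.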


(Michael Li Zhou) #5

@magnusbaeck you mentioned empty lines between entries. What type of pattern would match those? When I look at the message for an empty line, it's just "", basically nothing. They are sprinkled all over my log files. Thanks.

Mike

EDIT: I actually figured it out. I'm not sure how efficient it is, but you can set up another filter that removes the empty lines!
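
The post does not show the filter that was used. One way this might look, as a sketch only, is a conditional that drops any event whose message is empty or contains only whitespace:

```
filter {
  # Drop events whose message is empty or whitespace-only.
  # This is an assumed approach, not the filter from the post above.
  if [message] =~ /^\s*$/ {
    drop { }
  }
}
```

This acts on whole events after the codec has run, so it would not remove blank lines that have already been joined into a larger multiline message.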

