Merging a section of events into a single event and splitting it with a key-value filter

I am trying to use Logstash to iterate over a build status file and map it into Elasticsearch, so that I can find out how much time each build and test consumes and on which platform the builds are happening.

A sample status log is below.

	Flags:              SYNC@submit@start@push INTEGRATE PRECIOUS PUSHING
	Submitted at:       Tuesday January 26, 2016 08:17:45 UTC
	Time in queue:      01H 22m 36s
	Run time:           01H 16m 23s
	Build (Release:jdk Boot:<jdk>):

	  linux_ubunt64_2.6-fastdebug                        success(13m 36s)
			USED:     hostname=xxxx platform=linux_x64_2.6 osname=linux osarch=x64 cpus=6 
			ATTRS:    distribution=OEL,ubuntu
			TIMING:   clean=10s init=13s work=12m46s fini=27s
			NEEDS:    ant171,antcontrib10b2,ubuntxcompiler,asmtools,ccache,

	  linux_ubuntvfphflt_2.6-productEmb                  success(42m 21s)
			USED:     hostname=abcdc platform=linux_x64_2.6 osname=linux osarch=x64 cpus=4 
			ATTRS:    distribution=OEL,ubuntu
			TIMING:   clean=1s init=28s work=41m18s fini=34s		


	Tests:
	  linux_ubunt64_2.6-fastdebug-default-hotspot_basicvmtest     success(01m 29s)
			USED:     hostname=xxxxx platform=linux_ubunt64_3.13 osname=linux osarch=ubunt64 cpus=8 
			TIMING:   clean=1s init=46s work=28s fini=9s
			NEEDS:    gnumake381_native,jtreg

	  windows_x64_6.3-product-default-hotspot_basicvmtest     success(05m 38s)
			USED:     hostname=abcddf platform=windows_x64_6.2 osname=windows osarch=x64 cpus=4 
			TIMING:   clean=1s init=4m39s work=47s fini=2s
			NEEDS:    cygwin,gnumake381_native,jtreg,


	Platform Statistics

	  linux_ubunt64_2.6        (builds: 2)      (tests: 2)         (total)
			  started:  Tue-08:27:48-UTC  Tue-08:43:59-UTC  Tue-08:27:48-UTC
			 ended:  Tue-08:53:26-UTC  Tue-08:55:17-UTC  Tue-08:55:17-UTC
			  elapsed:         25m 38s         11m 17s         27m 29s

My intention is to get build execution statistics. I am somewhat stuck on which direction I should go. This is what I am trying to do:

  1. Read the status file using the file input plugin.
  2. In the filter section, if a line starts with "Build (Release:jdk Boot:", read it as a multiline event until "Tests:" is encountered, then split it with the kv filter (roughly into the kind of event shown just after this list).
  3. In the filter section, if a line starts with "Tests:", do the same as step 2 until "Platform Statistics" is reached.
  4. Repeat the same for "Platform Statistics".
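
For each build or test entry I would roughly like to end up with an event along these lines (the field names are just illustrative, taken from the first build entry in the sample above):

    {
      "section":  "build",
      "target":   "linux_ubunt64_2.6-fastdebug",
      "status":   "success",
      "duration": "13m 36s",
      "hostname": "xxxx",
      "platform": "linux_x64_2.6",
      "osname":   "linux",
      "osarch":   "x64",
      "cpus":     "6",
      "clean":    "10s",
      "init":     "13s",
      "work":     "12m46s",
      "fini":     "27s"
    }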

I have been struggling to make this work through Logstash and am even thinking about writing a small script to just convert the file to JSON instead.
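
If I go down the script route, something along these lines is roughly what I have in mind (a very rough sketch in Python; the section detection and field names are just my guesses at the format):

    import json
    import re
    import sys

    # Entry header lines, e.g.
    #   linux_ubunt64_2.6-fastdebug                        success(13m 36s)
    ENTRY = re.compile(r'^\s+(\S+)\s+(\w+)\((.+?)\)\s*$')
    # Detail lines, e.g.
    #   USED:     hostname=xxxx platform=linux_x64_2.6 osname=linux ...
    DETAIL = re.compile(r'^\s+(USED|ATTRS|TIMING|NEEDS):\s+(.*)$')

    def parse(path):
        section = None          # 'build', 'test' or 'platform'
        entries = []
        current = None
        with open(path) as fh:
            for line in fh:
                if line.startswith('Build'):
                    section = 'build'
                elif line.startswith('Tests:'):
                    section = 'test'
                elif line.startswith('Platform Statistics'):
                    section = 'platform'
                elif section in ('build', 'test'):
                    m = ENTRY.match(line)
                    if m:
                        current = {'section': section,
                                   'target': m.group(1),
                                   'status': m.group(2),
                                   'duration': m.group(3)}
                        entries.append(current)
                        continue
                    m = DETAIL.match(line)
                    if m and current is not None:
                        # Split "key=value key=value ..." detail lines into fields
                        for pair in m.group(2).split():
                            if '=' in pair:
                                key, value = pair.split('=', 1)
                                current[key] = value
        return entries

    if __name__ == '__main__':
        # One JSON document per line
        for entry in parse(sys.argv[1]):
            print(json.dumps(entry))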

The Logstash sample I have been trying is below.

    input {
      file {
        path => "D:\softwares\logstash\logstash-2.1.1\bin\JobStatus_mine.txt"
        start_position => "beginning"
      }
    }

    filter {
      grok {
        match => { "message" => ["^Build%{SPACE}\(Release:%{DATA}"] }
        add_tag => "buildstatus"
      }

      ## This is the logic I was trying to use to merge all the lines until "Tests:" is found.
      ## There is something wrong here.
      if "buildstatus" in [tags] {
        multiline {
          add_tag => "buildstatuspattern"
          pattern => "^Tests:"
          what    => "next"
          negate  => true
        }
      }

      if "buildstatuspattern" in [tags] {
        kv {
          value_split => ":"
          add_tag     => "buildsplitted"
          recursive   => true
        }
      }
    }

Am I on the right track? Could someone share their suggestions? I would really appreciate any help.

I wouldn't try to parse this with Logstash. I'm sure it's totally possible, but I suspect it's easier with a separate script (that you can invoke via an exec input).
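
Something along these lines, assuming your script prints one JSON document per line (the command, path, and interval are just placeholders):

    input {
      exec {
        # Run the external parser periodically; path and interval are placeholders
        command  => "python D:/scripts/parse_jobstatus.py D:/path/to/JobStatus_mine.txt"
        interval => 300
        codec    => "json_lines"
      }
    }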

Hi Magnus,
Thanks a lot for your comments. I am now using an external script to parse the files and convert them to JSON. But I am facing one challenge: every time I run the script, it scans every file all over again. If I were using the Logstash file plugin, it would not scan a file again unless it had been modified. Because of this, the script adds overhead and processing time on every run. Basically, I would like to skip files that have already been scanned, which I could have achieved with the file plugin. Any suggestions?
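
The best I have come up with so far is keeping a small state file with each input file's last modification time and skipping files that have not changed since the last run, something like this (a rough sketch, assuming the external script is in Python; the state file name is just a placeholder):

    import json
    import os

    STATE_FILE = 'jobstatus_state.json'   # hypothetical marker file kept next to the script

    def load_state():
        # Return the previously saved modification times, or an empty dict on the first run
        try:
            with open(STATE_FILE) as fh:
                return json.load(fh)
        except (IOError, ValueError):
            return {}

    def already_processed(path, state):
        # Skip the file if its modification time has not changed since the last run
        return state.get(path) == os.path.getmtime(path)

    def mark_processed(path, state):
        state[path] = os.path.getmtime(path)
        with open(STATE_FILE, 'w') as fh:
            json.dump(state, fh)

Is this a reasonable way to mimic the file plugin's behaviour, or is there a better option?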