"hash=event['field'].to_hash" instead "hash = event.to_hash"


(Saket Kumar) #1

Is it possible...

I want to iterate over the values of one specific field of an event, something like:
hash = event['field1'].to_hash
hash.each { |event['field1'], v| puts event['field1'] if v == hash.values.max }

Any help?


(Magnus Bäck) #2

The named parameters passed to the iteration block must be identifiers and can't be expressions. What's wrong with

field1_hash = event['field1'].to_hash
field1_hash.each { |k, v| puts event['field1'] if v == field1_hash.values.max }

or even

field1_hash = event['field1'].to_hash
field1_hash.each_value { |v| puts event['field1'] if v == field1_hash.values.max }

since you don't appear to care about the hash key?
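
A minimal standalone sketch of the same idea, runnable outside Logstash. The helper name `keys_with_max` and the sample hash are made up for illustration; they stand in for a hash-valued event field. The block parameters are plain identifiers, and the maximum is computed once rather than on every iteration:

```ruby
# Return the keys whose value equals the hash's maximum value.
# keys_with_max is a hypothetical helper, not part of Logstash.
def keys_with_max(hash)
  return [] if hash.empty?
  max_value = hash.values.max   # computed once, not per iteration
  # Block parameters must be identifiers; _key marks the unused one.
  hash.select { |_key, value| value == max_value }.keys
end

# Made-up sample data standing in for a hash-valued field.
sample = { 'a' => 12, 'b' => 31, 'c' => 7 }
puts keys_with_max(sample).inspect
```

If several keys share the maximum, all of them are returned.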


(Saket Kumar) #3

I am getting the exception below when using it:
Exception in filterworker {"exception"=>#<NoMethodError: undefined method `to_hash' for 31:Fixnum>, "backtrace"=>["(ruby filter code):2:in `register'", "org/jruby/RubyProc.java:271:in `call'", "/opt/Log/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-filter-ruby-0.1.5/lib/logstash/filters/ruby.rb:37:in `filter'", "/opt/Log/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.0-java/lib/logstash/filters/base.rb:162:in `multi_filter'", "org/jruby/RubyArray.java:1613:in `each'", "/opt/Log/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.0-java/lib/logstash/filters/base.rb:159:in `multi_filter'", "(eval):302:in `filter_func'", "/opt/Log/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.0-java/lib/logstash/pipeline.rb:219:in `filterworker'", "/opt/Log/logstash-1.5.0/vendor/bundle/jruby/1.9/gems/logstash-core-1.5.0-java/lib/logstash/pipeline.rb:156:in `start_filters'"], :level=>:error}


(Magnus Bäck) #4

The field1 field obviously isn't a hash or something that can be converted to a hash, it's a numerical value.

With more information about what your messages look like and what you want in the end it'll be easier to help.
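
The exception is what Ruby raises when `to_hash` is called on a number. A small sketch of guarding against that in plain Ruby; `field_as_hash` is a hypothetical helper name, and the sample values are assumptions standing in for whatever the field actually holds:

```ruby
# Convert a field value to a hash only when that is possible.
# field_as_hash is a hypothetical helper, not part of Logstash.
def field_as_hash(value)
  # Integers, strings and nil do not respond to to_hash; hashes do.
  value.respond_to?(:to_hash) ? value.to_hash : nil
end

puts field_as_hash(31).inspect          # a number cannot be converted
puts field_as_hash({ 'x' => 1 }).inspect
```

Code that receives `nil` from such a helper can then skip the event instead of failing inside the filter worker.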


(Saket Kumar) #5

Okay let me explain what I want step by step:

  1. XML file contains following fields:
    Array Fields:
    xpath => ["/response/data/run/firstView/videoFrames/frame[*]/time/text()","Time_FV"]
    xpath => ["/response/data/run/firstView/videoFrames/frame[*]/image/text()","Image_FV"]
    xpath => ["/response/data/run/firstView/videoFrames/frame[*]/VisuallyComplete/text()","Progress_FV"]
    and Non Array fields
    xpath => ["/response/data/median/firstView/visualComplete/text()","FV_visualComplete"]
    xpath => ["/response/data/median/firstView/lastVisualChange/text()","FV_lastVisualChange"]
    xpath => ["/response/data/median/firstView/loadTime/text()","FV_loadTime"]
    xpath => ["/response/data/median/firstView/fullyLoaded/text()","FV_fullyLoaded"]
    xpath => ["/response/data/median/firstView/SpeedIndex/text()","FV_SpeedIndex"]
  2. I have multiple XML files with the above fields.
  3. I parsed them and sent the output to ELK.
  4. When I query, I find each file processed as a new message, and for the non-array fields that is fine.
  5. If you remember, in my last post I asked about splitting the array field; it was for the same kind of processing. BTW, for a single file I was able to achieve what I wanted from these array fields.
    But when you have multiple files the same config will not work. For your reference, here is the config for a single XML file:
    input {
      file {
        path => "/opt/Log/WebPageTestfinal1/RUN*/*_XML_WebpageSummary.xml"
        start_position => "beginning"
      }
    }

    filter {
      if [message] =~ "^<?xml .*" {
        drop {}
      }
      multiline {
        pattern => "^</response>"
        negate => true
        what => "next"
      }

      xml {
        source => "message"
        target => "videoFrames"
        store_xml => false
        xpath => [
          "/response/data/run/firstView/videoFrames/frame[*]/time/text()","Time_FV",
          "/response/data/run/firstView/videoFrames/frame[*]/image/text()","Image_FV",
          "/response/data/run/firstView/videoFrames/frame[*]/VisuallyComplete/text()","Progress_FV",

          "/response/data/run/repeatView/videoFrames//frame[*]/time/text()","Time_RV",
          "/response/data/run/repeatView/videoFrames/frame[*]/image/text()","Image_RV",
          "/response/data/run/repeatView/videoFrames/frame[*]/VisuallyComplete/text()","Progress_RV",
        ]
      }

      ruby {
        code => "
          ## Finding max for splitting event for that many number of times
          x = [event['Time_FV'].length, event['Time_RV'].length]
          max = x.max
          if event['Time_FV'].length == max
            event['flag'] = 'FV'
          end
          if event['Time_RV'].length == max
            event['flag'] = 'RV'
          end
        "
      }

      if [flag] == "FV" {
        split { field => "Time_FV" }
      }
      if [flag] == "RV" {
        split { field => "Time_RV" }
      }

      ruby {
        code => "
          my_variable = ENV['mycount3']
          if my_variable.nil?
            ENV['mycount3'] = 0.to_s
            counter = 0
          else
            counter = ENV['mycount3'].to_i
            counter = counter + 1
            ENV['mycount3'] = counter.to_s
          end

          if event['flag'] != 'FV'
            tfv = event['Time_FV']
            event['TimeFV'] = tfv[counter]
          end
          if event['flag'] != 'RV'
            trv = event['Time_RV']
            event['TimeRV'] = trv[counter]
          end
          pfv = event['Progress_FV']
          prv = event['Progress_RV']

          event['ProgressFV'] = pfv[counter]
          event['ProgressRV'] = prv[counter]

          ## Extracting test id, run id and url from file name
          filename = File.basename(event['path'], '.*')
          value = filename.split('_')
          event['Test_Id'] = value[0]
          event['Run_Id'] = value[1] + '_' + value[2] + '_' + value[3]
          event['URL1'] = value[4]
        "
      }

      if [flag] == "FV" {
        mutate { rename => { "Time_FV" => "TimeFV" } }
      }
      if [flag] == "RV" {
        mutate { rename => { "Time_RV" => "TimeRV" } }
      }
      mutate { convert => ["ProgressFV", "integer"] }
      mutate { convert => ["TimeFV", "float"] }
      mutate { convert => ["ProgressRV", "integer"] }
      mutate { convert => ["TimeRV", "float"] }

      #mutate { remove_field => ["Time_FV"] }
      #mutate { remove_field => ["Time_RV"] }
      #mutate { remove_field => ["Progress_FV"] }
      #mutate { remove_field => ["Progress_RV"] }
    }

    output {
      elasticsearch {
        action => "index"
        host => "172.27.155.109"
        index => "logstash-xml1%{+YYYY.MM.dd}"
        workers => 1
      }
      stdout { codec => json }
    }


(Magnus Bäck) #6

If you remember, in my last post I asked about splitting the array field; it was for the same kind of processing. BTW, for a single file I was able to achieve what I wanted from these array fields.
But when you have multiple files the same config will not work.

So this is what you're really asking about? If so, please explain what "will not work" means. If not, please explain what your question is.


(Saket Kumar) #7

Please excuse me for being not so clear:

To explain, consider that I have two XML files with these fields.

File1:

    <?Median>
    <?visualComplete....
    <?Average>
    <?visualComplete....
    <?Run>
    <?ID
    <?FV..
    <?time>0<?/time>
    <?time>200<?/time>
    <?RV
    <?time>300<?/time>

File2 is a replica of File1:

    <?Median>
    <?visualComplete....
    <?Average>
    <?visualComplete....
    <?Run>
    <?ID
    <?FV..
    <?time>0<?/time>
    <?time>200<?/time>
    <?RV
    <?time>300<?/time>

- When parsed, I get two new messages.
- For "visualComplete" it is fine, and I am able to store the values the way I want them presented on a Kibana graph.
- The "Time" field creates an array and stores all values in a single field ("Time_FV" and "Time_RV") from both parsed files. When I present it on a graph I do not see any distinction, because it is an array field.
- To see the distinction I need to split these array values into different messages (each split message containing the "Time_FV" and "Time_RV" fields along with single items from the array).

This is what I was trying to achieve: basically doing data modelling, storing the data in a way that lets me draw the desired graph. With the above config, using Ruby code, I got success, but the complexity increased when there were multiple files to process. I hope I am clear now.
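
The reshaping described in this post (one record per array element, each carrying a single time and its matching progress value) can be pictured with a plain-Ruby sketch that runs outside Logstash. The helper name `frames_to_records`, the output keys, and the sample progress values are assumptions made for illustration; the sketch does not use any Logstash API and is not the config from this thread:

```ruby
# Pair up two parallel arrays so that each frame becomes its own record.
# frames_to_records is a hypothetical helper for illustration only.
def frames_to_records(times, progress)
  times = Array(times)        # tolerate nil or a single value
  progress = Array(progress)
  length = [times.length, progress.length].max
  # Index both arrays by position; a missing entry becomes nil.
  (0...length).map do |i|
    { 'Time' => times[i], 'Progress' => progress[i] }
  end
end

# Made-up sample values; XPath text() results arrive as strings.
records = frames_to_records(%w[0 200 300], %w[0 45 100])
records.each { |record| puts record.inspect }
```

Because the index comes from the arrays of a single event, nothing has to be remembered between events or between files.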

(Saket Kumar) #8

Can Logstash meet my need?


(Magnus Bäck) #9

I don't see why the number of input files would in any way matter here. What isn't working? Why is it more complex to support more files?


(Saket Kumar) #10

When file path changes it overwrites the values...


(Magnus Bäck) #11

Sorry, I don't understand what you mean. What values are overwritten?


(Saket Kumar) #12

Is it possible to process each file sequentially?

I mean: file1 as input, parse, filter processing, output to Elasticsearch; then file2 as input, parse, filter processing, output to Elasticsearch,

rather than:
directory path as file input, parse both files, filter processing of both files, and then output to Elasticsearch.

Can we control this? Currently my config parses all files present in the directory:

    input {
      file {
        path => "/opt/Log/WebPageTestfinal7/RUN*/*_XML_WebpageSummary.xml"
        start_position => "beginning"
      }
    }

It then runs the processing on all messages in one go and outputs to Elasticsearch.

Is it possible to parse, process, and output the files one by one? If yes, how? If I parse the files one by one, my existing config will work.


(Magnus Bäck) #13

You could run separate Logstash instances to completely separate the processing.

But again, what you're describing doesn't sound normal. With more information about what values are being overwritten we can help you.


(Saket Kumar) #14

Am I making the problem that hard to understand? My bad.

Use case:
I have WebPageTest results for performance tests of multiple URLs and for multiple runs.

I get them in a folder structure:

WebPageTest/RUN1432621877157
WebPageTest/RUN1432621608713

XML Data:

Note: Please change file extension from png to xml.

I want to push them into Elasticsearch using the same index, in such a way that I get the charts below in Kibana.

  1. Line chart: Visual Progress (FV & RV)
  2. Bar chart: Timings (FV & RV) (Median)
    - Visually Complete
    - Last Visual Change
    - Load Time (onload)
    - Load Time (Fully Loaded)
    - Speed Index
    - Time to First Byte
    - Time to Title
  3. Score board, bar chart (FV & RV) (Median)
    <?score_cache>71</?score_cache>
    <?score_cdn>28</?score_cdn>
    <?score_gzip>100</?score_gzip>
    <?score_cookies>-1</?score_cookies>
    <?score_keep-alive>100</?score_keep-alive>
    <?score_minify>-1</?score_minify>
    <?score_combine>100</?score_combine>
    <?score_compress>-1</?score_compress>
    <?score_etags>-1</?score_etags>
  4. Bar chart on CPU consumption (FV & RV) (Median)
    <?docCPUms>3057.62</?docCPUms>
    <?fullyLoadedCPUms>4976.432</?fullyLoadedCPUms>
    <?docCPUpct>85</?docCPUpct>
    <?fullyLoadedCPUpct>70</?fullyLoade

"For chart 2 to 4 Logstash worked perfectly to parse my data and push to ELK"

Using the config below.

Note: Please change file extension from png to conf.

In the attached XML, the Time data for FirstView and RepeatView comes from array elements, so when parsing, all values of the respective fields are stored as a list in ELK.
Stored as a list, they do not give the desired chart, since I expect to see the variation of Progress against Time for both FirstView and RepeatView.

For data modelling I need to parse the RUN files one by one, which I see as a workaround, rather than parsing all run files, processing them, and pushing them to ELK at once.

Are there any suggestions or best practices for handling and processing multiple XML files?

Thanks Saket.


(Saket Kumar) #15

Is there any way to share files with you for reference and a better understanding of the problem?


(Magnus Bäck) #16

You're still not explaining what you mean when you say that values are being overwritten. Sorry, I'm out of patience here. Maybe someone else can help.

