Unable to process the file input which has Key Value pairs


#1

Hello,

My input file is a single-line file containing multiple key/value pairs. Of these, the "content" key is the only one I need; I can drop the rest. The "content" field holds many events, and those events are what I need to send to ES. I have tried the split, json, kv, and grok filters without any success. With debug logging enabled, I see the "globbed files" entry for my input file, and there are no permission issues, but I have not gotten any output so far. Any assistance is much appreciated.

Raw data >

{"queryId":-1,"last":false,"first":true,"totalElements":17804445,"numberOfElements":10,"totalPages":1780445,"size":10,"content":[{"id":"657261236","type":"BEV","system":["SST"],"actualStart":6017544370277667464,"actualEnd":1517289061000,"updated":1517289061601,"relevantStart":6017544370277667464,"relevantEnd":1517289061000,"apr":null,"status":"stop","origin":null,"title":"runTokCvv45 stop","peak":[],"risk":null,"severity":null,"owner":null,"summary":null,"irrLink":null,"investigators":[]},{"id":"55ab96b31","type":"BEV","system":["NON"],"actualStart":55237046401000,"actualEnd":1517601601000,"updated":1517601601051,"relevantStart":55237046401000,"relevantEnd":1517601601000,"apr":null,"status":"malware  start","origin":null,"title":"malware  start","peak":[],"risk":null,"severity":null,"owner":null,"summary":null,"imrPmrLink":null,"investigators":[]}],"number":0,"_links":{"self":{"href":"http://abc/abc?page=0&queryId=-1&size=10"},"next":{"href":"http://abc/abc?page=1&queryId=-1&size=10"},"first":{"href":"http://abc/abc?page=0&queryId=-1&size=10"},"last":{"href":"http://abc/abc?page=1780444&queryId=-1&size=10"}}}

Logstash config >

   input{
   	file {
   		path => "/home/admin//testbare.txt"
   		start_position => "beginning"
   		sincedb_path => "/home/admin/lsin/db"
   	}
   }
   filter {
   	json { source => "message" }
   	kv {
   		value_split => ":"
   		field_split => ","
   		include_keys => [ "content" ]
   		recursive => "true"
   	}
   }

   output {
   	stdout { codec => rubydebug }
   }

Regards


(Magnus Bäck) #2

Have you tried shutting down Logstash, deleting /home/admin/lsin/db, and starting it up again?

Don't use the kv filter. Use a json filter (or configure the file input to use the json codec), then use a split filter on the content field. What do you get?
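
To make the intent of the json-then-split suggestion concrete, here is a rough Python sketch of the effect. It is not Logstash code: it parses the line as JSON and then emits one event per element of "content", with each event carrying the other top-level fields along. The helper name `split_on_field` and the trimmed-down sample line are made up for illustration.

```python
import copy
import json


def split_on_field(event, field):
    """Return one copy of `event` per element of event[field].

    Hypothetical helper that mimics the effect of a split filter on an
    array field: each copy keeps the other top-level keys, and `field`
    holds a single element instead of the whole array.
    """
    values = event.get(field)
    if not isinstance(values, list):
        # Nothing to split on; pass the event through unchanged.
        return [event]
    events = []
    for value in values:
        clone = copy.deepcopy(event)
        clone[field] = value
        events.append(clone)
    return events


# Trimmed-down stand-in for the single-line input file.
raw_line = (
    '{"queryId":-1,"size":10,"content":['
    '{"id":"657261236","type":"BEV","status":"stop"},'
    '{"id":"55ab96b31","type":"BEV","status":"malware  start"}],'
    '"number":0}'
)

parsed = json.loads(raw_line)
for single_event in split_on_field(parsed, "content"):
    # Each resulting event now describes exactly one entry of "content".
    print(single_event["content"]["id"], single_event["content"]["status"])
```

If the output of the real pipeline looks like this, with one event per array element, the filter side is doing its job and any remaining problem is on the input side.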


#3

Hi Magnus,

Thanks for the response. Yes, I was always deleting the sincedb file before trying the various methods. I had already tried the approach you propose, but here it is again. I am attaching the debug log and the config.
Just to let you know, I was using this combination (json and split) earlier, when I was making an HTTP call to get this data. It worked fine with http_poller as the input. However, I had to move to a file-based option for various reasons and was surprised to see it get stuck.

input{
	file {
		path => "/home/admin/elastico/dataInput/ev/gev/tstelas.txt"
		start_position => "beginning"
		sincedb_path => "/home/admin/elastico/dataInput/lsin/db"
		codec => json
	}
}
filter {
	split {
		field => "content"
	}
}

output {
	elasticsearch {
		hosts => ["localhost:9200"]
		index => "eventview"
	}
	stdout { codec => rubydebug }
}

Log file extract of globbed files:

[2018-02-05T13:11:51,460][DEBUG][logstash.inputs.file     ] each: file grew: /home/admin/elastico/dataInput/ev/gev/tstelas.txt: old size 0, new size 1126
[2018-02-05T13:11:52,461][DEBUG][logstash.inputs.file     ] each: file grew: /home/admin/elastico/dataInput/ev/gev/tstelas.txt: old size 0, new size 1126
[2018-02-05T13:11:52,462][DEBUG][logstash.inputs.file     ] _globbed_files: /home/admin/elastico/dataInput/ev/gev/tstelas.txt: glob is: ["/home/admin/elastico/dataInput/ev/gev/tstelas.txt"]
[2018-02-05T13:11:53,288][DEBUG][logstash.pipeline        ] Pushing flush onto pipeline
[2018-02-05T13:11:53,463][DEBUG][logstash.inputs.file     ] each: file grew: /home/admin/elastico/dataInput/ev/gev/tstelas.txt: old size 0, new size 1126
[2018-02-05T13:11:54,465][DEBUG][logstash.inputs.file     ] each: file grew: /home/admin/elastico/dataInput/ev/gev/tstelas.txt: old size 0, new size 1126
[2018-02-05T13:11:55,466][DEBUG][logstash.inputs.file     ] each: file grew: /home/admin/elastico/dataInput/ev/gev/tstelas.txt: old size 0, new size 1126

Please let me know if you need any other part of the log file.

In case it helps, the bare skeleton of the message is the following:

{"queryId":-1,"last":false,"first":true,"totalElements":17804445,"numberOfElements":10,"totalPages":1780445,"size":10,"content":[],"number":0,"_links":{}}

"content" holds an array of objects with various key/value pairs, and "_links" has the standard four entries.

Best regards


#4

To get a bit more visibility, I switched to the stdin and stdout plugins. Surprisingly, that works fine.

input{
	stdin { codec => json }
}

filter {
	split {
		field => "content"
	}
}

output {
	stdout { codec => rubydebug }
}

So, I can confirm it works fine with the input plugins stdin and http_poller, but it is clearly failing with the file plugin. Unfortunately that is the one that I need in my setup.


(Magnus Bäck) #5

So what's different with the different inputs? Comment out the split filter. What do you get from the stdout output? Replace the stdin input with a file input. What do you get from the stdout output?


#6

stdin was fine and http_poller was also fine.
I changed the input to the file type and nothing appears on stdout. I enabled debug logging and the log is similar to what I pasted above.

To see whether there is an issue with my file input plugin, I tried another custom log file, and that was processed successfully.

Is there any chance someone could test the raw data above with a file input?

I will install Logstash on a Windows server tomorrow to see if that makes any difference.

Best regards


#7

Today, after starting Logstash, I opened the file it was expected to process and saved it without making any changes, and the file was then processed. I then generated a new file matching the path pattern, and Logstash did not process it. The moment I open that file and save it, again without making any changes, Logstash picks it up and processes it. I tried commenting out start_position and sincedb_path, but the behavior was the same.

best regards


#8

Finally found the issue. The script that creates the file does not add an end-of-line character. So the file input plugin noticed the file growing but did nothing with it, because the line was never terminated by a newline.
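
For anyone hitting the same symptom, here is a minimal Python sketch of a check-and-repair step that could run after the generating script. It looks at the last byte of the file and appends a newline only if one is missing. The function names are hypothetical, and the sketch assumes a plain "\n" line ending is what the reader expects.

```python
import os


def ends_with_newline(path):
    """Return True if the file's last byte is a newline (False if empty)."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as handle:
        # Jump to the final byte and read just that one.
        handle.seek(-1, os.SEEK_END)
        return handle.read(1) == b"\n"


def ensure_trailing_newline(path):
    """Append a newline if the file is non-empty and lacks one.

    Returns True when a newline was appended, False otherwise.
    """
    if os.path.getsize(path) == 0 or ends_with_newline(path):
        return False
    with open(path, "ab") as handle:
        handle.write(b"\n")
    return True
```

Making the generating script write the newline itself is the cleaner fix; a step like this is only a stopgap for files that already exist.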

This can be closed.

