I am getting data in JSON format in my Elasticsearch, but one of the fields, called "messages", has over 30K lines. I need to split those lines into separate fields based on their format. There is a special character, ">", and the data between two occurrences of it should go into its own field.
Example data:
{
"_index": "jenkins-2019.01.08",
"_type": "doc",
"_id": "50ytLmgBIs_AS77pvVhP",
"_version": 1,
"_score": null,
"_source": {
"@version": 1,
"source_host": "http://jenkins.xx.com",
"@timestamp": "2019-01-08T18:14:11.289Z",
"message": [
"Branch event",
"Started by GitLab push by XXXXXXX",
"retrieving head-revision for us31096-at-Gate3-24hr",
"Obtained Jenkinsfile from bbe820800b0199e6e2a63e3e3ae2a8100b84e111",
"Running in level: PERFORMANCE_OPTIMIZED",
"Loading library dsl@1234",
"Attempting to resolve v3.4.5 from mote references...",
" > git --version # timeout=10",
"using GIT_ASKPASS to set credenti Read Repository",
" > git ls-remote -h https://gitlab.xxxxxx.xxxx.xxxx. # timeout=10",
"Could not find v2.3.4 in remote references. Pulling heads to local for deep search...",
" > git rev-parse --is-inside-work-tree # timeout=10",
"Setting origin to https://gitlab.xxx.xxx.xxxx.xx.git",
" > git config remote.origin.url https://gitlab-xxxx-xxxx-xxxx-xx..git # timeout=10",
"Fetching origin...",
"Fetching upstream changes from origin",
" > git --version # timeout=10",
" > git config --get remote.origin.url # timeout=10",
This is just sample data. As you can see, some lines contain ">". Can the data between the ">" markers be split into multiple fields?
Not sure what you are asking. In your sample data "message" appears to be an array of strings, but you say the field is called "messages". If you do not specify the problem correctly, any offered solution will not work. Are you trying to extract a subset of those strings into another field?
If you want to extract the lines that start with " >", then this would work:
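ruby {
    code => '
        # Keep only the lines that start with " >" (the git commands).
        # select returns a new array rather than modifying the one read from the event.
        a = event.get("messages")
        event.set("commands", a.select { |x| x.start_with?(" >") }) if a
    '
}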
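To try the selection outside Logstash, the same idea can be sketched in plain Ruby. The `lines` array below is a small sample copied from the data above and stands in for the event field (an assumption: the field is an array of strings). `select` returns a new array, so the input is left unmodified.

```ruby
# Sample lines copied from the data above; stands in for the event field.
lines = [
  "Fetching upstream changes from origin",
  " > git --version # timeout=10",
  " > git config --get remote.origin.url # timeout=10"
]

# Keep only the lines that start with " >"; lines itself is not changed.
commands = lines.select { |x| x.start_with?(" >") }

puts commands
```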
Thanks for your input, and sorry I was not clear before. Here is the situation.
I have a huge field called "messages" that I would like to break into smaller fields: the data between two ">" markers should go into its own field.
For example, in the sample data above, "message" contains a lot of git commands. Each command starts with ">" and its output is on the following lines, up to the next line that starts with ">".
My requirement is that each git command becomes a field and the output of that command becomes its value.
OK, so start off with something like this. If it doesn't do quite what you want then you will have to update it.
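ruby {
    code => '
        lines = event.get("messages")
        if lines
            # Local variables are used so that no state carries over between events.
            key = nil     # current command line (starts with " >"); nil before the first one
            value = ""    # output collected for the current command
            # Store one command/output pair. Output seen before the first command
            # goes to "preamble"; a repeated command becomes an array of outputs.
            store = lambda { |k, v|
                if k.nil?
                    event.set("preamble", v)
                elsif event.get(k)
                    event.set(k, ([ event.get(k) ] << v).flatten)
                else
                    event.set(k, v)
                end
            }
            lines.each { |x|
                if x.start_with?(" >")
                    store.call(key, value)
                    key = x
                    value = ""
                else
                    value = value.empty? ? x : value + "\n" + x
                end
            }
            # Flush the last command (or the preamble if no command was seen).
            store.call(key, value)
        end
    '
}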
As always, error handling is left as an exercise for the reader. If my Ruby coding style makes your eyeballs bleed then I apologize to your eyeballs.
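The grouping logic can also be exercised outside Logstash before wiring it into a filter. The following is a minimal plain-Ruby sketch, not the filter itself: `group_by_command` is a hypothetical helper name, the sample lines are taken from the data above, and for simplicity every key maps to an array with one output string per occurrence of the command, which differs slightly from what the filter stores.

```ruby
# Hypothetical helper (not part of Logstash): groups lines into a Hash of
# command => [output, ...]. Lines before the first command go to "preamble".
def group_by_command(lines)
  result = Hash.new { |hash, k| hash[k] = [] }
  key = "preamble"
  buffer = []
  lines.each do |line|
    if line.start_with?(" >")
      # Flush what was collected so far; skip an empty preamble.
      result[key] << buffer.join("\n") unless key == "preamble" && buffer.empty?
      key = line
      buffer = []
    else
      buffer << line
    end
  end
  # Flush the output of the last command.
  result[key] << buffer.join("\n") unless key == "preamble" && buffer.empty?
  result
end

sample = [
  "Fetching origin...",
  "Fetching upstream changes from origin",
  " > git --version # timeout=10",
  " > git rev-parse --is-inside-work-tree # timeout=10",
  "Setting origin to https://gitlab.xxx.xxx.xxxx.xx.git",
  " > git --version # timeout=10"
]

result = group_by_command(sample)
result.each { |k, v| puts "#{k.inspect} => #{v.inspect}" }
```

Commands without output, such as the two `git --version` lines in this sample, end up with empty strings as values. Whether such fields should be dropped or kept is a design choice left open here.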