Parse Log File To Logstash (Nested Optional)


(יוגב בוקובזה) #1

I need your help.

I'm working at a company, and we now need to analyse data using ELK. In general, we have a few questions, and for each question we save data such as the client's cookie ID, the answer chosen, how long it took to answer, and more.

I'm using the grok debugger and having a hard time parsing the log string.

Here is the data for one user answering the questions (it should be a single line in our log file):

2018-03-10 21:58:23.0766:  key = cookieID =
5f8f2a36-6444-4371-9075-3c9bb5292c3e value = UserData:
nextQuestionIndex = 8,  Map<Integer, UserAnswer> userAnswersMap: 
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 1, answerId[] = 800, timeToAnswer = 5.0 
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 2, answerId[] = 3, 4, 7, timeToAnswer = 14.0 
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 3, answerId[] = 9, timeToAnswer = 9.0 
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 4, answerId[] = 16, timeToAnswer = 12.0 
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 5, answerId[] = 18, 21, 23, 24, timeToAnswer = 14.0                       
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 6, answerId[] = 30, timeToAnswer = 14.0  
id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 7, answerId[] = 33, 35, timeToAnswer = 16.0

Using this grok pattern:

%{TIMESTAMP_ISO8601:timestamp}: key = cookieID = (?<cookie_ID>\b\w+\b-\b\w+\b-\b\w+\b-\b\w+\b-\b\w+\b) 
value = UserData: nextQuestionIndex = %{INT:nextQuestionIndex},
Map<Integer,UserAnswer> userAnswersMap: 
%{GREEDYDATA:Questions}

I got this JSON:

{
"timestamp": [
[
  "2018-03-10 21:58:23.0766"
]
],
"YEAR": [
[
  "2018"
]
],
"MONTHNUM": [
[
  "03"
]
],
"MONTHDAY": [
[
  "10"
]
],
"HOUR": [
[
  "21",
  null
]
],
"MINUTE": [
[
  "58",
  null
]
],
"SECOND": [
[
  "23.0766"
]
],
"ISO8601_TIMEZONE": [
[
  null
]
],
"cookie_ID": [
[
  "5f8f2a36-6444-4371-9075-3c9bb5292c3e"
]
],
"nextQuestionIndex": [
[
  "8"
]
],
"Questions": [
[
  "id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 1, answerId[] = 800, timeToAnswer = 5.0 
   id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 2, answerId[] = 3, 4, 7, timeToAnswer = 14.0 
   id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 3, answerId[] = 9, timeToAnswer = 9.0 
   id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 4, answerId[] = 16, timeToAnswer = 12.0 
   id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 5, answerId[] = 18, 21, 23, 24, timeToAnswer = 14.0
   id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 6, answerId[] = 30, timeToAnswer = 14.0
   id = 0, questionnaireAnsId = 5611, userId = 5579, questionId = 7, answerId[] = 33, 35, timeToAnswer = 16.0"
]
]
}

Now, the remaining data under "Questions" is the hard part. As I understand the data, it should be nested. I was thinking it should look like this:

{   
"timestamp": [
[
   "2018-03-10 21:58:23.0766"
]   
],
"YEAR": [
[
   "2018"
]
],
"MONTHNUM": [
[
   "03"
]
],
"MONTHDAY": [
[
   "10"
]
],
"HOUR": [
[
   "21",
   null
]
],
"MINUTE": [
[
   "58",
   null
]
],
"SECOND": [
[
   "23.0766"
]
],
"ISO8601_TIMEZONE": [
[
   null
]
],
"cookie_ID": [
[
   "5f8f2a36-6444-4371-9075-3c9bb5292c3e"
]
],
"nextQuestionIndex": [
[
   "8"
]
],
"Questions": [
{
   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 1, "answerId[]": [800], "timeToAnswer": 5.0   
},
{   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 2,   "answerId[]": [3, 4, 7], "timeToAnswer": 14.0   
},
{   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 3,   "answerId[]": [9], "timeToAnswer": 9.0   
},
{   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 4,   "answerId[]": [16], "timeToAnswer": 12.0   
},
{   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 5,   "answerId[]": [18, 21, 23, 24], "timeToAnswer": 14.0
},
{   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 6,   "answerId[]": [30], "timeToAnswer": 14.0   
},
{   "id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 7,   "answerId[]": [33,35], "timeToAnswer": 16.0
}]
}

Can I use nested elements in Kibana? How do I parse the last part? Can you think of a better way that does not use nesting? (I was thinking of another way, where each question's data is on its own line and the cookie ID is copied into every question so I can tie each question to its user.)

I really need your help. Thanks.


#2

I would use

    mutate { split => { "Questions" => "
" } }
    mutate { gsub => [ "Questions", "\[\]", "" ] }

to split Questions into an array and strip out the square brackets. Then dive into ruby to do regex matching.

    ruby {
        code => '
            # Collect one hash per entry of the Questions array
            a = []
            event.get("Questions").each { |q|
                # scan returns an array of capture-group arrays; only the first match is used
                matches = q.scan(/id = (\d+), questionnaireAnsId = (\d+), userId = (\d+), questionId = (\d+), answerId = ([\d ,]+), timeToAnswer = ([\.\d]+)/)
                a << { "id": matches[0][0], "questionnaireAnsId": matches[0][1], "userId": matches[0][2], "questionId": matches[0][3], "answerId": matches[0][4].split(/[, ]+/), "timeToAnswer": matches[0][5].to_f }
            }
            event.set("doesThisWork", a)
        '
    }

Which will get you

     "doesThisWork" => [
    [0] {
                  "answerId" => [
            [0] "800"
        ],
                    "userId" => "5579",
              "timeToAnswer" => 5.0,
                "questionId" => "1",
                        "id" => "0",
        "questionnaireAnsId" => "5611"
    },
    [1] {
                  "answerId" => [
            [0] "3",
            [1] "4",
            [2] "7"
        ],
                    "userId" => "5579",
              "timeToAnswer" => 14.0,
                "questionId" => "2",
                        "id" => "0",
        "questionnaireAnsId" => "5611"
    },

Another approach would be to mutate+gsub each entry in the Questions array to the point where a kv filter would work, but then you still have to dive into ruby to invert the arrays that kv gives you.

Oh, and there is no error handling in the above code. If anything fails in the ruby filter, Logstash may very well give up and stop. However, it shows a working approach to the problem.
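
As a rough illustration of what such error handling could look like, here is a plain-Ruby sketch of the same regex matching that sets aside lines that do not match instead of failing on them. The names QUESTION_RE and parse_questions are made up for this example, and it assumes the square brackets have already been stripped from each line; inside a ruby filter the input would come from event.get and the result would go to event.set, as in the code above.

```ruby
# Illustrative sketch only; QUESTION_RE and parse_questions are hypothetical names.
# Assumes each line already had "[]" removed, e.g. "answerId = 3, 4, 7".
QUESTION_RE = /id = (\d+), questionnaireAnsId = (\d+), userId = (\d+), questionId = (\d+), answerId = ([\d ,]+), timeToAnswer = ([.\d]+)/

# Returns [parsed, rejected]: parsed holds one hash per matching line,
# rejected holds the raw lines that did not match, so nothing is lost silently.
def parse_questions(lines)
  parsed = []
  rejected = []
  Array(lines).each do |line|
    m = QUESTION_RE.match(line.to_s)
    if m.nil?
      rejected << line
      next
    end
    parsed << {
      "id" => m[1],
      "questionnaireAnsId" => m[2],
      "userId" => m[3],
      "questionId" => m[4],
      "answerId" => m[5].split(/[, ]+/),
      "timeToAnswer" => m[6].to_f
    }
  end
  [parsed, rejected]
end
```

Keeping the rejected lines around (for example in a separate field) makes it easier to spot log entries whose format has drifted from the pattern.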


(יוגב בוקובזה) #3

Thanks for your answer!
Do you think this approach is good, given that the log file may change because multiple users with different cookie IDs can answer the questions at the same time?
Maybe I need a message that contains only one answered question, and then send the cookie ID with every one of them?

Like this:

{
"timestamp": [
[
"2018-03-10 21:58:23.0766"
]
],
"YEAR": [
[
"2018"
]
],
"MONTHNUM": [
[
"03"
]
],
"MONTHDAY": [
[
"10"
]
],
"HOUR": [
[
"21",
null
]
],
"MINUTE": [
[
"58",
null
]
],
"SECOND": [
[
"23.0766"
]
],
"ISO8601_TIMEZONE": [
[
null
]
],
"cookie_ID": [
[
"5f8f2a36-6444-4371-9075-3c9bb5292c3e"
]
],
"nextQuestionIndex": [
[
"8"
]
],
"Question": [
{
"id": 0, "questionnaireAnsId": 5611, "userId": 5579, "questionId": 1, "answerId[]": [800], "timeToAnswer": 5.0
}]
}

So in my log, the cookie ID would appear before every question?

Waiting for your answer.


#4

You could use a split filter to create an event for each entry in Questions.
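
To make the effect concrete, here is a plain-Ruby sketch of the fan-out that a split on an array field produces: one copy of the event per array element, with every other field (such as cookie_ID) carried along. This is only an illustration of the idea, not Logstash code, and fan_out is a hypothetical helper name; in Logstash the split filter's field option would name the array to split on.

```ruby
# Illustrative sketch only; fan_out is a hypothetical helper, and events are
# modelled here as plain hashes rather than Logstash event objects.
def fan_out(event, field)
  values = event[field]
  # Nothing to split: pass the event through unchanged.
  return [event] unless values.is_a?(Array)

  values.map do |value|
    copy = event.dup      # shallow copy keeps cookie_ID, timestamp, etc.
    copy[field] = value   # each copy holds exactly one question
    copy
  end
end

event = {
  "cookie_ID" => "5f8f2a36-6444-4371-9075-3c9bb5292c3e",
  "Questions" => [
    { "questionId" => "1", "answerId" => ["800"] },
    { "questionId" => "2", "answerId" => ["3", "4", "7"] }
  ]
}
per_question = fan_out(event, "Questions")
```

Each resulting record can be tied back to its user through the copied cookie ID, which is the one-question-per-message layout asked about above.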

