Extract data from message to display each field as a column in Kibana


#1

I want to extract the fields I need from message and select them as their own fields, and also index everything dynamically using the "clientCode". I have been working on this for the past couple of days and I'm stuck. Help is greatly appreciated.
This is my config file:

input {
  file {
    path => ["C:\logstash-6.2.2\conversation_stats\conversation_stats.json"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}
filter {
  grok {
    match => { "message" =>
"%{DATA:id}
%{DATA:clientCode}
%{DATA:conversationID}
%{INT:employeeID}
%{DATA:entities}
%{DATA:input}
%{DATA:intents}
%{DATA:locale} " }
  }
  mutate {
    gsub => ["message", "[:<>.,]", ""]
  }
  if [message] != "(null)" {
    json {
      source => "message"
      target => "jmessage"
    }
  }
  mutate { remove_field => ["message"] }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "test-%{clientCode}"
  }
}

Sample error I'm getting in cmd:

[2018-03-07T11:09:37,402][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"test-%{clientCode}", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x737c4bbc>], :response=>{"index"=>{"_index"=>"test-%{clientCode}", "_type"=>"doc", "_id"=>nil, "status"=>400, "error"=>{"type"=>"invalid_index_name_exception", "reason"=>"Invalid index name [test-%{clientCode}], must be lowercase", "index_uuid"=>"na", "index"=>"test-%{clientCode}"}}}}
{
"tags" => [
[0] "_grokparsefailure",
[1] "_jsonparsefailure"
],
"@timestamp" => 2018-03-07T16:09:36.569Z,
"path" => "C:\logstash-6.2.2\conversation_stats\conversation_stats.json",
"@version" => "1",
"host" => "MRK-06576"
}

Here is sample data from my json file:

{
"_id" : ObjectId("5a21e54533015"),
"clientCode" : "demo",
"conversationId" : "d6416ec0--930f-da9f3215",
"employeeId" : "45",
"entities" : [
{
"entity" : "status",
"location" : [
NumberInt("0"),
NumberInt("2")
],
"value" : "ok",
"confidence" : NumberInt("1")
}
],
"input" : {
"feedback" : {
"feedbackSubject" : "my feedbac",
"feedbackText" : "feedback\nthis is good\nI love this",
"feedbackCategory" : "",
"conversationId" : "d6416ec0--930f-da9f3215",
"conversationText" : "(HI) [Greetings, human.]",
"conversationNodeName" : "root"
}
},
"intents" : [
{
"intent" : "feedbackresponse",
"confidence" : NumberInt("1")
}
],
"locale" : "en-ca"
}


#2

If the JSON is all on one line then the following is all you need to parse it. If it is not all on one line then there are lots of threads that discuss how to use multiline codecs.

  mutate {
    gsub => [ "message", 'NumberInt\("([0-9]+)"\)', "\1" ]
    gsub => [ "message", 'ObjectId\("([a-z0-9]+)"\)', '"\1"' ]
  }
  json { source => "message" }
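
For context, here is a rough sketch of how that snippet might sit in a complete filter and output section. It is only an illustration under a few assumptions: each event already holds one whole JSON object in message, the grok and the punctuation-stripping gsub from the original config are dropped (removing colons and commas leaves text that is no longer JSON), and the two gsub pairs are merged into one array. The conditional around the output is my addition, not something from the posts above.

```
filter {
  mutate {
    # Rewrite mongo shell literals into plain JSON values:
    #   NumberInt("1")      -> 1
    #   ObjectId("5a21...") -> "5a21..."
    gsub => [
      "message", 'NumberInt\("([0-9]+)"\)', "\1",
      "message", 'ObjectId\("([a-z0-9]+)"\)', '"\1"'
    ]
  }
  # Parse the cleaned-up text; keys become top-level fields on the event.
  json { source => "message" }
}
output {
  if [clientCode] {
    # Only index when the field exists, so the name is always expanded.
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "test-%{clientCode}"
    }
  } else {
    # Events that failed to parse are printed instead of indexed.
    stdout { codec => rubydebug }
  }
}
```

The warning in the first post shows the literal index name test-%{clientCode}. When an event has no clientCode field, Logstash leaves the reference unexpanded, and the capital C in it is what Elasticsearch rejects as not lowercase.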

#3

I tried what you suggested, but now it gives me a grok parse error and a JSON parse error.


#4
  1. Why are you using grok?
  2. Can you show the rubydebug output?

#5

I'm kinda new to this and grok seemed like the best fit for me. I'm using it to grab the data from my JSON file.
Here is the rubydebug output:

{
"tags" => [
[0] "_grokparsefailure"
],
"host" => "MRK-06576",
"@timestamp" => 2018-03-07T20:35:27.624Z,
"@version" => "1",
"message" => "\t"input" : {",
"path" => "C:\logstash-6.2.2\conversation_stats\conversation_stats.json"
}
{
"tags" => [
[0] "_grokparsefailure"
],
"host" => "MRK-06576",
"@timestamp" => 2018-03-07T20:35:27.624Z,
"@version" => "1",
"message" => "\t\t",
"path" => "C:\logstash-6.2.2\conversation_stats\conversation_stats.json"
}


#6

If your input is valid JSON, or even close to it, then a json filter is most likely better than grok. Now, for the sample data you showed in the first post, you need to configure the input so that rubydebug shows the entire JSON object in a single event. Like this...

"message" => "{ \"_id\" : ObjectId(\"5a21e54533015\"), \"clientCode\" : \"demo\", \"conversationId\" : \"d6416ec0--930f-da7aa79f3215\", \"employeeId\" : \"45\", \"entities\" : [ { \"entity\" : \"status\", \"location\" : [ NumberInt(\"0\"), NumberInt(\"2\") ], \"value\" : \"ok\", \"confidence\" : NumberInt(\"1\") } ], \"input\" : { \"feedback\" : { \"feedbackSubject\" : \"my feedbac\", \"feedbackText\" : \"feedback\\nthis is good\\nI love this\", \"feedbackCategory\" : \"\", \"conversationId\" : \"d6416ec0-2f9a-42fb-930f-da7aa79f3215\", \"conversationText\" : \"(HI) [Greetings, human.]\", \"conversationNodeName\" : \"root\" } }, \"intents\" : [ { \"intent\" : \"feedbackresponse\", \"confidence\" : NumberInt(\"1\") } ], \"locale\" : \"en-ca\" }",

as opposed to what you have now, which is

"message" => "\t"input" : {",

If you want to consume the entire file as a single event then you will need to use a multiline codec. You can use the trick described here, of appending a line that is known not to occur in the input. Some people will recommend using auto_flush_interval, but personally I think that is an ugly hack.
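
As a rough sketch of what a multiline codec on the file input could look like, here is a variant that emits one event per top-level JSON object rather than one event for the whole file. It assumes every object starts with "{" in the first column and that nested lines are indented, as the "\t" in the rubydebug output suggests. The pattern and the interval value are illustrative guesses, not taken from the linked post.

```
input {
  file {
    path => ["C:\logstash-6.2.2\conversation_stats\conversation_stats.json"]
    start_position => "beginning"
    codec => multiline {
      # A line starting with "{" in column 0 begins a new event ...
      pattern => "^\{"
      # ... and every line that does NOT match is glued onto the previous one.
      negate => true
      what => "previous"
      # Flush the last object after 2 seconds of no new lines. Leave this out
      # if you would rather append a sentinel line to the file instead.
      auto_flush_interval => 2
    }
  }
}
```

Without auto_flush_interval the final object stays buffered until another line matching the pattern arrives, which is the situation the appended-line trick is meant to handle.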


#7

How would I get the message to display like that using the pattern field? I can't figure out the regex for it. Still kinda new to all this.
The sample data I posted above is straight out of the JSON file I'm working with; the rest of the data in the file is similar.


#8

As I said, if you want to consume multiple lines of JSON from the file as a single event then you will need to use a multiline codec.


#9

What I want to be able to see in Kibana when I look at all possible fields is things like:
message.input
message.clientCode
message.id
Would this be possible with the multiline codec? I gave it a shot and it didn't work.


#10

Yes, and I linked to a post with an example of doing that.
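
For field names along the lines of those listed in #9, one option is the json filter's target setting, which nests the parsed keys under a single parent field. This is a hedged sketch only: the parent name conversation is a placeholder I picked, and the rename of _id to id is my guess at what message.id was meant to be.

```
filter {
  # Assumes message already holds one whole JSON object and that the
  # gsub cleanup from #2 has already run on it.
  json {
    source => "message"
    # Parsed keys land under [conversation] instead of the event root.
    target => "conversation"
    # Dropped only when parsing succeeds, so failures keep the raw text.
    remove_field => ["message"]
  }
  mutate {
    # Hypothetical: expose the document id as conversation.id.
    rename => { "[conversation][_id]" => "[conversation][id]" }
  }
}
```

Kibana should then list fields such as conversation.clientCode and conversation.input.feedback.feedbackText. With the fields nested this way, the index setting would need the nested reference, for example index => "test-%{[conversation][clientCode]}".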

