Grok parse error, and input-to-output formatting using grok and the json_encode filter

The error in my command prompt:
{
"@timestamp" => 2018-03-09T14:18:26.524Z,
"tags" => [
[0] "_grokparsefailure"
],
"json_message" => "input : {"
}
{
"@timestamp" => 2018-03-09T14:18:26.524Z,
"tags" => [
[0] "_grokparsefailure"
],
"json_message" => "text : what is Laura Abimbola contact"
}
{
"@timestamp" => 2018-03-09T14:18:26.524Z,
"tags" => [
[0] "_grokparsefailure"
],
"json_message" => "}"
}

What I want to do:

  1. get rid of the grok parse error

  2. the way my json_message is displayed above isn't how I want it. Instead, I want it to display as:
    json_message => input : { "text" : "what is Laura Abimbola contact" }
    for each field within my json_message

  3. when I go to Kibana, I want it to show me all the fields in json_message as:
    json_message.input
    json_message.id
    etc.

Help would be greatly appreciated, as I'm still a newbie when it comes to Logstash, Elasticsearch, and Kibana. Thanks in advance.

The following is my configuration file:

input {
  file {
    path => ["C:\logstash-6.2.2\conversation_stats\conversation_stats.json"]
    start_position => "beginning"
    sincedb_path => "/dev/null"
    ignore_older => 0
  }
}
filter {
  grok {
    match => { "message" => "%{DATA:_id}, \s+%{DATA:clientCode}, \s+%{DATA:conversationID}, \s+%{DATA:employeeID}, \s+%{DATA:entities}, \s+%{DATA:input}, \s+%{DATA:intents}, \s+%{DATA:locale}" }
  }

  if [message] != "(null)" {
    json_encode {
      source => "message"
      target => "json_message"
    }
  }

  mutate { remove_field => ["message", "path", "host", "@version"] }

  mutate {
    gsub => ["json_message", "\\t", ""]
    gsub => ["json_message", "\n", ""]
    gsub => ["json_message", "[\\]", ""]
    gsub => ["json_message", "[\",]", ""]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "test"
  }
}

Here is some sample JSON from my JSON file:

{
"_id" : ObjectId("5a2b18500623f9"),
"clientCode" : "tk",
"conversationId" : "c01b73b6-7055817a661b",
"employeeId" : "3898",
"entities" : [
	{
		"entity" : "benefits",
		"location" : [
			NumberInt("290"),
			NumberInt("209")
		],
		"value" : "insurance",
		"confidence" : NumberInt("10")
	}
],
"input" : {
	"text" : "where can i find my insurance claims"
},
"intents" : [
	{
		"intent" : "claims_inquiry",
		"confidence" : 0.8324913501739502
	}
],
"locale" : "en-ca"
}

The good news is that we don't need grok to parse JSON -- grok is good at extracting complex patterns from strings, but it can get hairy pretty fast.

The bad news is that it looks like the "json" you're attempting to process isn't valid JSON -- it includes values like ObjectId("5a2b18500621b10042e4b3f9") and NumberInt("20"), which aren't legal.
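If you can't fix whatever produces these files, one workaround -- a rough sketch, untested against your data -- is to strip the Mongo-style wrappers with mutate's gsub and then parse the result with the json filter. This assumes each document arrives as a single event (one object per line) and that ObjectId(...) / NumberInt(...) never occur inside string values:

filter {
  mutate {
    # rewrite ObjectId("abc") -> "abc", and NumberInt("20") -> 20
    gsub => [
      "message", 'ObjectId\("([^"]+)"\)', '"\1"',
      "message", 'NumberInt\("([^"]+)"\)', '\1'
    ]
  }
  json {
    source => "message"
  }
}

The real fix, though, is to get the producer to emit legal JSON in the first place.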


By default, the logstash-input-file plugin is line-oriented, so it assumes the input is one log event per line. Assuming you can get legal JSON, we can tell the file input to use the json codec instead, which will emit one event per JSON object. That event will have the same structure as our JSON blob -- the same keys and values it represents -- which means you could cut out the other filters:

input {
  file {
    codec => json
    # ...
  }
}
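One caveat: the file input still reads line-by-line, so a pretty-printed object that spans multiple lines (like your sample above) won't decode this way -- each line would fail to parse on its own and pick up a _jsonparsefailure tag.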

That said, Logstash isn't optimised for processing single events, but rather for large streams of events. If you pointed the file input at a directory, it could pick up a bunch of files, or you could put many events in a single file using newline-delimited JSON with the json_lines codec.
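As a concrete sketch -- the path here is hypothetical, and on Windows "NUL" stands in for /dev/null as the sincedb_path:

input {
  file {
    path => "C:/conversation_stats/*.jsonl"   # hypothetical glob; picks up any number of files
    codec => json_lines
    start_position => "beginning"
    sincedb_path => "NUL"                     # Windows equivalent of /dev/null
  }
}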

I fixed the formatting of the JSON file, and I tried using the json codec as well as json_lines. Neither of them worked; when I used those, I got a _jsonparsefailure as well as a _grokparsefailure. When I went back to json_encode, the _jsonparsefailure was gone. Then I tried to debug my config step by step: whenever I excluded the grok filter it didn't give me any errors, but I need grok so I can get fields like "json_message.clientCode" etc. in Kibana.

Here is what I changed my JSON file to:

{
"_id" : "5a2b18500623f9",
"clientCode" : "tk",
"conversationId" : "c01b73b6-7055817a661b",
"employeeId" : "3498",
"entities" : [
{
	"entity" : "benefits",
	"location" : [
	],
	"value" : "insurance",
	"confidence" : NumberInt("10")
}
],
"input" : {
"text" : "where can i find my insurance claims"
},
"intents" : [
{
	"intent" : "claims_inquiry",
	"confidence" : 0.8324913501739502
}
],
"locale" : "en-ca"
}

How did they not work? Does your input pass jsonlint?

I still don't think you need Grok 🙂

After reading the JSON, we'll have a fully-contextualised event object with individual properties based on the JSON representation we fed in; when we use the elasticsearch output, the clientCode field will get persisted on its own.
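And if you really do want everything nested under json_message (your point 3), you still don't need grok. One option -- again just a sketch, assuming one JSON object per line and a hypothetical path -- is to leave the codec alone so the raw line lands in message, then let the json filter unpack it under a target:

input {
  file {
    path => "C:/conversation_stats/cleaned-input.jsonl"   # hypothetical path
    start_position => "beginning"
    sincedb_path => "NUL"
  }
}
filter {
  json {
    source => "message"          # the raw line read by the file input
    target => "json_message"     # nest every parsed key under json_message
    remove_field => ["message"]  # drop the raw line once parsing succeeds
  }
}

With that, Kibana will show fields like json_message.clientCode and json_message.input.text.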

Given:

{"_id":"5a2b18500623f9","clientCode":"tk","conversationId":"c01b73b6-7055817a661b","employeeId":"3898","entities":[{"entity":"benefits","location":[290,209],"value":"insurance","confidence":10}],"input":{"text":"where can i find my insurance claims"},"intents":[{"intent":"claims_inquiry","confidence":0.8324913501739502}],"locale":"en-ca"}

-- cleaned-input.jsonl (the file ends with a newline character, indicating the end of the object)

We can use the stdout output with the rubydebug codec to output a representation of the parsed event; note how it has individual fields and metadata:

╭─{ yaauie@castrovel:~/src/elastic/discuss-scratch/123285-not-grok-but-json }
╰─○ ~/src/elastic/releases/logstash-6.2.2/bin/logstash -e 'input { file { path => "'"$(pwd)/cleaned-input.jsonl"'" codec => json_lines } } filter { } output { stdout { codec => rubydebug }}'
Sending Logstash's logs to /Users/yaauie/src/elastic/releases/logstash-6.2.2/logs which is now configured via log4j2.properties
[2018-03-13T21:44:29,058][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"fb_apache", :directory=>"/Users/yaauie/src/elastic/releases/logstash-6.2.2/modules/fb_apache/configuration"}
[2018-03-13T21:44:29,075][INFO ][logstash.modules.scaffold] Initializing module {:module_name=>"netflow", :directory=>"/Users/yaauie/src/elastic/releases/logstash-6.2.2/modules/netflow/configuration"}
[2018-03-13T21:44:29,273][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-03-13T21:44:29,845][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.2.2"}
[2018-03-13T21:44:30,260][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
[2018-03-13T21:44:33,071][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-03-13T21:44:33,478][INFO ][logstash.pipeline        ] Pipeline started succesfully {:pipeline_id=>"main", :thread=>"#<Thread:0xd5a613f run>"}
[2018-03-13T21:44:33,583][INFO ][logstash.agent           ] Pipelines running {:count=>1, :pipelines=>["main"]}
{
               "_id" => "5a2b18500623f9",
    "conversationId" => "c01b73b6-7055817a661b",
            "locale" => "en-ca",
              "host" => "castrovel.local",
             "input" => {
        "text" => "where can i find my insurance claims"
    },
        "clientCode" => "tk",
          "entities" => [
        [0] {
                 "value" => "insurance",
                "entity" => "benefits",
            "confidence" => 10,
              "location" => [
                [0] 290,
                [1] 209
            ]
        }
    ],
        "employeeId" => "3898",
           "intents" => [
        [0] {
            "confidence" => 0.8324913501739502,
                "intent" => "claims_inquiry"
        }
    ],
        "@timestamp" => 2018-03-13T21:44:33.942Z,
              "path" => "/Users/yaauie/src/elastic/discuss-scratch/123285-not-grok-but-json/cleaned-input.jsonl",
          "@version" => "1"
}
^C[2018-03-13T21:44:41,282][WARN ][logstash.runner          ] SIGINT received. Shutting down.
[2018-03-13T21:44:42,237][INFO ][logstash.pipeline        ] Pipeline has terminated {:pipeline_id=>"main", :thread=>"#<Thread:0xd5a613f run>"}
~/src/elastic/releases/logstash-6.2.2/bin/logstash -e   93.44s user 2.65s system 316% cpu 30.322 total
[success (30.000s)]
