Kv filter truncating value string at whitespace within quoted string

robynkoz · August 25, 2015, 5:28pm

This is a follow-up to this question: Kvpairs where one of the values is a JSON structure

I am using the following filter to interpret my logs:

filter {
  grok {
    match => {"message" => "msg=%{WORD:action} %{GREEDYDATA:kvpairs}" }
  }
  kv {
    source => "kvpairs"
    remove_field => ["kvpairs"]
  }
  json {
    source => "jsondata"
    remove_field => ["jsondata"]
  }
}

In my input message I have json data that looks like this:

jsondata={"parentId":"80b9b4e0-5552-45ca-a202-245f89e98635","id":"4d1dac94-dabb-4cf1-b8f0-b6c9cd348783","childIds":[],"priority":"LOWEST","inputData":{"dataId":"8083afdf-69c3-40f4-b528-daf39a7c7310","encodingName":"UTF-8","language":{"code":"por","name":"Portuguese"},"script":{"code":"latn","name":"Latin"}},"targetLanguage":{"language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"targetEngine":{"name":"Cybertrans","versionNumber":{"majorNumber":13,"minorNumber":11,"patchNumber":6}},"outputData":{"dataId":"b3e0cfe6-64e6-48fa-a84a-a8d56d098e69","encodingName":"UTF-8","language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"status":"COMPLETE","messages":["Error
 code from MT engine: NFW\u003d0.41735537190082644","System Selected 
Language Pair-NONE\u003dPortuguese\u003eEnglish","System Selected MT 
System\u003dMotrans","System Selected Dictionary\u003dgeneral","System 
Selected User Dictionary\u003dn/a","System Selected Source 
Encoding\u003dutf8","System Selected Text 
Corrector\u003dundetermined","The language identified was Spanish utf8, 
but it will be processed as Portuguese utf8.","Translation retrieved 
from cache."]}

When I look at the processed logs in Kibana, I see that jsondata is truncated after the word Error:

{"parentId":"80b9b4e0-5552-45ca-a202-245f89e98635","id":"4d1dac94-dabb-4cf1-b8f0-b6c9cd348783","childIds":[],"priority":"LOWEST","inputData":{"dataId":"8083afdf-69c3-40f4-b528-daf39a7c7310","encodingName":"UTF-8","language":{"code":"por","name":"Portuguese"},"script":{"code":"latn","name":"Latin"}},"targetLanguage":{"language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"targetEngine":{"name":"Cybertrans","versionNumber":{"majorNumber":13,"minorNumber":11,"patchNumber":6}},"outputData":{"dataId":"b3e0cfe6-64e6-48fa-a84a-a8d56d098e69","encodingName":"UTF-8","language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"status":"COMPLETE","messages":["Error

This happens even if I change the length of the string before the word "Error", so the length of the string is not what triggers the truncation. My first guess was that the word Error itself was somehow responsible. However, as noted below, the first whitespace in the string occurs right after the word Error, so the whitespace is more likely to be the cause than the specific word.

Any idea why this is happening, and how I might be able to prevent it? Thanks in advance!

magnusbaeck · August 25, 2015, 5:34pm

When you list the contents of the jsondata field above there are a number of linebreaks, including one immediately after "Error". Does that accurately represent what's in the jsondata string, i.e. do you have newline characters there?

robynkoz · August 25, 2015, 5:37pm

No, that's just copy/paste artifacts -- the newline appears after the word Error because that's the first whitespace in the string -- so maybe that is what the issue is. In fact it now seems obvious that that must be the problem, so I've edited the subject header.

robynkoz · August 25, 2015, 6:29pm

I've figured out that the problem is that my field_split character is " " and that it's triggering even on the quoted space characters within the json string.

Is there a way to have it trigger only on unquoted space characters? Or do I need to mutate my data in some way before doing the kv split?

robynkoz · August 25, 2015, 8:07pm

Sounds like this is a known bug:

Can anyone help with a clever work-around?
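
One possible direction, as a rough sketch rather than a confirmed fix: capture the JSON in grok before kv ever sees it, so the space-based field_split only runs over the plain key=value pairs. This assumes that jsondata is always the last key in the message and that at least one other key=value pair comes before it; neither assumption is guaranteed by the sample above, so the pattern would need adjusting otherwise.

```
filter {
  grok {
    # Assumption: "jsondata=" is the final key, so everything after it is JSON.
    # DATA is non-greedy and stops at the first " jsondata=" it finds.
    match => {"message" => "msg=%{WORD:action} %{DATA:kvpairs} jsondata=%{GREEDYDATA:jsondata}" }
  }
  kv {
    # kvpairs no longer contains the JSON, so splitting on spaces is safe here
    source => "kvpairs"
    remove_field => ["kvpairs"]
  }
  json {
    # jsondata was captured whole by grok, quoted spaces included
    source => "jsondata"
    remove_field => ["jsondata"]
  }
}
```

With this layout the kv filter never has to decide whether a space inside the JSON is quoted, because the JSON is already in its own field by the time kv runs.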
