Kv filter truncating value string at whitespace within quoted string


#1

This is a follow-up to this question: Kvpairs where one of the values is a JSON structure

I am using the following filter to interpret my logs:

filter {
  grok {
    match => {"message" => "msg=%{WORD:action} %{GREEDYDATA:kvpairs}" }
  }
  kv {
    source => "kvpairs"
    remove_field => ["kvpairs"]
  }
  json {
    source => "jsondata"
    remove_field => ["jsondata"]
  }
}

In my input message I have json data that looks like this:

jsondata={"parentId":"80b9b4e0-5552-45ca-a202-245f89e98635","id":"4d1dac94-dabb-4cf1-b8f0-b6c9cd348783","childIds":[],"priority":"LOWEST","inputData":{"dataId":"8083afdf-69c3-40f4-b528-daf39a7c7310","encodingName":"UTF-8","language":{"code":"por","name":"Portuguese"},"script":{"code":"latn","name":"Latin"}},"targetLanguage":{"language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"targetEngine":{"name":"Cybertrans","versionNumber":{"majorNumber":13,"minorNumber":11,"patchNumber":6}},"outputData":{"dataId":"b3e0cfe6-64e6-48fa-a84a-a8d56d098e69","encodingName":"UTF-8","language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"status":"COMPLETE","messages":["Error
 code from MT engine: NFW\u003d0.41735537190082644","System Selected 
Language Pair-NONE\u003dPortuguese\u003eEnglish","System Selected MT 
System\u003dMotrans","System Selected Dictionary\u003dgeneral","System 
Selected User Dictionary\u003dn/a","System Selected Source 
Encoding\u003dutf8","System Selected Text 
Corrector\u003dundetermined","The language identified was Spanish utf8, 
but it will be processed as Portuguese utf8.","Translation retrieved 
from cache."]}

When I look at the processed logs in Kibana, I see that jsondata is truncated after the word Error:

{"parentId":"80b9b4e0-5552-45ca-a202-245f89e98635","id":"4d1dac94-dabb-4cf1-b8f0-b6c9cd348783","childIds":[],"priority":"LOWEST","inputData":{"dataId":"8083afdf-69c3-40f4-b528-daf39a7c7310","encodingName":"UTF-8","language":{"code":"por","name":"Portuguese"},"script":{"code":"latn","name":"Latin"}},"targetLanguage":{"language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"targetEngine":{"name":"Cybertrans","versionNumber":{"majorNumber":13,"minorNumber":11,"patchNumber":6}},"outputData":{"dataId":"b3e0cfe6-64e6-48fa-a84a-a8d56d098e69","encodingName":"UTF-8","language":{"code":"eng","name":"English"},"script":{"code":"latn","name":"Latin"}},"status":"COMPLETE","messages":["Error

This happens even if the length of the string before the word "Error" is changed, so I believe it is indeed the word Error that is somehow triggering the truncation, and not the length of the string. As noted below, after the word Error is where the first whitespace occurs in the string, so that is more likely to be the cause of the problem than the specific word.

Any idea why this is happening, and how I might be able to prevent it? Thanks in advance!


(Magnus Bäck) #2

When you list the contents of the jsondata field above there are a number of linebreaks, including one immediately after "Error". Does that accurately represent what's in the jsondata string, i.e. do you have newline characters there?


#3

No, that's just copy/paste artifacts -- the newline appears after the word Error because that's the first whitespace in the string -- so maybe that is what the issue is. In fact it now seems obvious that that must be the problem, so I've edited the subject header.


#4

I've figured out that the problem is that my field_split character is " " and that it's triggering even on the quoted space characters within the json string.

Is there a way to have it trigger only on unquoted space characters? Or do I need to mutate my data in some way before doing the kv split?
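
To make the behaviour described in this post concrete, here is a minimal Python sketch. It is not how the kv filter is implemented; it only imitates a split on every space and compares it with a split that ignores spaces inside double quotes. The sample line, the function names, and the regular expression are all assumptions made for illustration, and the quote-aware version assumes the values contain no escaped quotes.

```python
import json
import re

# Hypothetical sample line, shortened from the log message above.
line = ('status=done jsondata={"messages":["Error code from MT engine",'
        '"Translation retrieved from cache."]}')


def naive_split(text):
    """Split on every space, then on the first '=' of each piece."""
    pairs = {}
    for piece in text.split(" "):
        if "=" in piece:
            key, value = piece.split("=", 1)
            pairs[key] = value
    return pairs


# A space counts as a field separator only when an even number of double
# quotes follows it, i.e. when it is not inside a quoted string.
# Assumption: the values contain no escaped quotes (\").
UNQUOTED_SPACE = re.compile(r' (?=(?:[^"]*"[^"]*")*[^"]*$)')


def quote_aware_split(text):
    """Split only on unquoted spaces, then on the first '=' of each piece."""
    pairs = {}
    for piece in UNQUOTED_SPACE.split(text):
        if "=" in piece:
            key, value = piece.split("=", 1)
            pairs[key] = value
    return pairs


naive = naive_split(line)
aware = quote_aware_split(line)

# The naive split cuts the value off at the first space, after "Error".
print(naive["jsondata"])
# The quote-aware split keeps the value whole, so it parses as JSON.
print(json.loads(aware["jsondata"])["messages"])
```

The naive version reproduces the truncation after the word Error, while the quote-aware version leaves a value that a JSON parser accepts.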


#5

Sounds like this is a known bug:

Can anyone help with a clever work-around?

