Logstash breaks my log line into multiple documents

I have nested JSON log files which I'm trying to parse and ship to my Elasticsearch v7.5.
My Logstash version is also 7.5.
The problem is that out of, for instance, 3000 logs I get 250 _jsonparsefailure!
My log format is nested JSON.
Here is my Logstash config file:

  input {
      file {
          path => "/home/kasra/test/*.log"
          codec => json { charset => "UTF-8" }
      }
  }



filter {
    json {
        source => "message"
#        skip_on_invalid_json => "true"
    }

    geoip {
        source => "sourceIpAddress"
        target => "GeoLocation"
        fields => ["city_name", "country_name", "region_name", "location"]
    }

    date {
        match => ["eventTime", "ISO8601"]
        locale => "en"
        timezone => "Asia/Tehran"
        target => "@timestamp"
    }
}

output {
    elasticsearch {
        hosts => ["X.X.X.X:9200"]
        index => "ali-%{+YYYY.MM.dd}"
        user => "XXXX"
        password => "XXXX"
    }
}

And here is my sample log, which usually gets corrupted and broken by Logstash into 2 or 3 events:

[{"acsRegion":"test","additionalEventData":{"Scheme":"https"},"apiVersion":"2016-11-11","eventId":"CA6D8ACD-884D-461A-AFBE-35B81D43036C","eventName":"AttachPolicy","eventSource":"resourcemanager-share","eventTime":"2020-01-12T08:43:36Z","eventType":"ApiCall","eventVersion":"1","referencedResources":{"Policy":["test"]},"requestId":"CA6D8ACD-884D-461A-AFBE-35B81D43036C","requestParameters":{" charset":"UTF-8","PrincipalName":"test.com","AcceptLanguage":"en-US","SDSDSD":"asdasd","PolicyName":"testAccess","ResourceGroupId":556422749542138042,"RequestId":"CA6D8222ACD-884D-4261A-AFBE-35B81D43036C","HostId":"test2.com","PrincipalType":"IMSUser","PolicyType":"System"},"serviceName":"ResourceManager","sourceIpAddress":"y.x.z.z","userAgent":"Apache-HttpClient/4.5.7 (Java/1.8.0_152)","userIdentity":{"accountId":"556474922222542","principalId":"556433954213","sessionContext":{"attributes":{"creationDate":"2020-01-12T08:43:35Z","mfaAuthenticated":"false"}},"type":"test-account","userName":"test"}},{"acsRegion":"test","additionalEventData":{"Scheme":"https"},"apiVersion":"2016-11-11","eventId":"2BBED315-ABA2E-44E3EE-99335A-96sss265A69904C","eventName":"ListPolicyAttachments","eventSource":"resourcemanager-shsdsdsd.com","eventTime":"2020-01-12T08:43:40Z","eventType":"ApiCall","eventVersion":"1","requestId":"2BBED315-AB2E-44EE-995A-965904C","requestParameters":{" charset":"UTF-8","AcceptLanguage":"en-US","SDSDSDSVVV":"asd","RequestId":"2BBED3-ASSB2E-43334EE-995A-96265A69904C","HostId":"resourcemanager-sdsdsdds.com"},"serviceName":"ResourceManager","sourceIpAddress":"xx.xx.x.xx","userAgent":"Apache-HttpClient/4.5.7 (Java/1.8.0_152)","userIdentity":{"accountId":"5232323232323","principalId":"5353535353535353535","sessionContext":{"attributes":{"creationDate":"2020-01-12T08:43:39Z","mfaAuthenticated":"false"}},"type":"test-account","userName":"test"}},{"acsRegion":"test","additionalEventData":{"Scheme":"https"},"apiVersion":"2016-11-11","eventId":"BEsdsdsdsd-4B1C-BDED-50ssdsdSDSDSDECE5","eventName":"AttachPolicy","eventSource":"resourcemanager-shsdsdsdsd.com","eventTime":"2020-01-12T08:44:00Z","eventType":"ApiCall","eventVersion":"1","referencedResources":{"Policy":["AdministratorAccess"]},"requestId":"BE1DSDSD6079-49AE-4B1C-BDED-50SDSDSD1DE5","requestParameters":{" charset":"UTF-8","PrincipalName":"test@test.com","AcceptLanguage":"en-US","AFFSF":"asdasd","PolicyName":"AdministratorAccess","ResourceGroupId":556474954213823232323042,"RequestId":"BE1D6SDSDSD079-41213239AE-4B1sdsdSDSDC-BDED-501DA82EECE5","HostId":"resourcemanager-shsdsdsdsd.com","PrincipalType":"AASCUser","PolicyType":"System"},"serviceName":"ResourceManager","sourceIpAddress":"XX.XX.X.X","userAgent":"Apache-HttpClient/4.5.7 (Java/1.8.0_152)","userIdentity":{"accountId":"5562323232542138042","principalId":"556423232338042","sessionContext":{"attributes":{"creationDate":"2020-01-12T08:43:59Z","mfaAuthenticated":"false"}},"type":"test-account","userName":"test"}},{"acsRegion":"test","additionalEventData":{"Scheme":"http"},"apiVersion":"2017-04-30","eventId":"CEBFE232323E06-F7C4-4B64-8FBF-7A6D61F47780","eventName":"ListGroups","eventSource":"Aas-shasdasdasdasd.com","eventTime":"2020-01-12T08:44:10Z","eventType":"ApiCall","eventVersion":"1","requestId":"CEBFESDSDSD23E06-F7SD2C4-4SD23B64-8FSD23BF-7A6SD23D61F47780","requestParameters":{"MaxItems":1000," 
charset":"UTF-8","AcceptLanguage":"en-US","AAAP":"rsd","RequestId":"CEBFSDSDSDEE06-FSDSD7C4-4SDSDB64-8FSDSDBF-7A6D232323SDSD610","HostId":"ssds-shasdsdsd.com"},"serviceName":"Asd","sourceIpAddress":"X.XX.X.X","userAgent":"Apache-HttpClient/4.5.7 (Java/1.8.0_152)","userIdentity":{"accountId":"556232323232342","principalId":"556343434343434042","sessionContext":{"attributes":{"creationDate":"2020-01-12T08:44:10Z","mfaAuthenticated":"false"}},"type":"test-account","userName":"test"}},{"acsRegion":"test","additionalEventData":{"Scheme":"http"},"apiVersion":"2017-04-30","eventId":"1E01123235FB8-32323530-4E232321-B12323EB-523239C0DB7AEAA1","eventName":"GetUser","eventSource":"asd-shasdsdsdd.com","eventTime":"2020-01-12T08:44:48Z","eventType":"ApiCall","eventVersion":"1","referencedResources":{"User":["testsssdsdsdaax.com"]},"requestId":"1E02323232315FB8-353232323230-4E2321-B1EB-59C0ASDASD23DDVB7AEAA1","requestParameters":{" charset":"UTF-8","AcceptLanguage":"en-US","SDS":"asdsd","RequestId":"1E023232315FB8-35232330-42323E21-B12323EB-59C02323DB7AEAA1","HostId":"sdsd-shasdsdsdsd.com","UserPrincipalName":"testttt.com"},"serviceName":"asd","sourceIpAddress":"XX.XX.XX.XX","userAgent":"Apache-HttpClient/4.5.7 (Java/1.8.0_152)","userIdentity":{"accountId":"55647424443434","principalId":"556455555343434","sessionContext":{"attributes":{"creationDate":"2020-01-12T08:44:47Z","mfaAuthenticated":"false"}},"type":"test-account","userName":"test"}},{"acsRegion":"testtttt","additionalEventData":{"Scheme":"https"},"apiVersion":"2016-11-11","eventId":"4sdsdsdds57-633434-VF0E-AADA-80C4D5155F","eventName":"ListPolicyAttachments","eventSource":"resourcemanager-shasdasdasd.com","eventTime":"2020-01-12T08:44:50Z","eventType":"ApiCall","eventVersion":"1","requestId":"45023232323DE9B7-6402323234-4A0E-A2DA-82323232323155F","requestParameters":{" charset":"UTF-8","PrincipalName":"test@test.com","AcceptLanguage":"en-US","ASDDSX":"asd","RequestId":"45232323230DE9B7-6232323404-4A5550E-A2DA-80C4D995155F","HostId":"resourcemanager-sharsdasdasd.com","PrincipalType":"ISDSD"},"serviceName":"ResourceManager","sourceIpAddress":"XX.XX.XX.XX","userAgent":"Apache-HttpClient/4.5.7 (Java/1.8.0_152)","userIdentity":{"accountId":"4444444444444444555","principalId":"5555555554433","sessionContext":{"attributes":{"creationDate":"2020-01-12T08:44:49Z","mfaAuthenticated":"false"}},"type":"test-account","userName":"test"}}]

Therefore I get lots of ERROR entries in Logstash showing Json::ParserError; it tries to parse broken fragments which obviously aren't valid JSON!

If you have both a json codec and a json filter I would expect you to get an error for every event unless the JSON contains a message field that has quoted JSON nested inside it.
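
For illustration (a made-up event, not taken from your logs), the only kind of input line that survives both stages is doubly-encoded JSON, something like:

    {"message": "{\"eventName\":\"AttachPolicy\",\"eventType\":\"ApiCall\"}"}

The codec parses the outer object into a message field, and the filter then parses the quoted JSON nested inside it. Anything else fails at one of the two stages.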

@Badger
I've tested both:
When I omit the codec from the input, parsing gets worse: out of 300 sample logs, nothing got parsed.
When I omit the json filter, out of 300 logs half got parsed and half got a _jsonparsefailure.

And FYI: I tested my logs with JSON validators; they are valid JSON!

That JSON is an array, so a json filter will require the target option to be set. A json codec will split the array into multiple events. Any time you get a _jsonparsefailure the filter will log an error indicating why the failure occurred. Show us that message.
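
For reference, a minimal sketch of the filter with target set (the target name events here is an arbitrary choice):

    filter {
        json {
            source => "message"
            target => "events"    # required because the document root is a JSON array
        }
    }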


Can you provide samples of the parse failure?

Maybe the message is too big...


Here is the Logstash error log:

[2020-01-18T01:44:12,294][ERROR][logstash.codecs.json ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected character (':' (code 58)): Expected space separating root-level values
at [Source: (String)"46:34Z","eventType":"ApiCall","eventVersion":"1","requestId":"12341234F-452F-4ED7-8537-AE876DF4CDF5","requestParameters":{"RecordId":1234123412341234,"RequestId":"12341234-1234-4ED7-8537-AE871234","HostId":"test.com"},"serviceName":"Alidns","sourceIpAddress":"X.X.X.X","userAgent":"xxm-v0.10.0","userIdentity":{"accessKeyId":"asdasdasd","accountId":"123123123123123","principalId":"123123123123123123123","sessionContext":{"attributes":{"creationDate":"2020-"[truncated 105 chars]; line: 1, column: 4]>, :data=>"46:34Z","eventType":"ApiCall","eventVersion":"1","requestId":"123123123123-123123-4ED7-8537-123123123","requestParameters":{"RecordId":123123123123123,"RequestId":"123123123123-123-123123-8537-AE123123123123","HostId":"test.com"},"serviceName":"zxc","sourceIpAddress":"x.x.x.x","userAgent":"asdasdasd-v0.10.0","userIdentity":{"accessKeyId":"asdasdasdasd","accountId":"123123123123123","principalId":"123123123123123123123","sessionContext":{"attributes":{"creationDate":"2020-01-11T15:46:34Z","mfaAuthenticated":"false"}},"type":"test-user","userName":"asdasd-adsasd-asd"}}]"}
[2020-01-18T01:44:12,304][ERROR][logstash.codecs.json ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unrecognized token 'csRegion': was expecting ('true', 'false' or 'null')
at [Source: (String)"csRegion":"cn-hangzhou","additionalEventData":{"Scheme":"http"},"apiVersion":"2015-01-09","eventId":"123123123-123123-123123-123123-123123123","eventName":"test","eventSource":"test.com","eventTime":"2020-01-07T12:51:07Z","eventType":"ApiCall","eventVersion":"1","requestId":"ASDASD-234234234-23E8-2323-123123123123","requestParameters":{"Lang":"en","PageNumber":1,"UserClientIp":"x.x.x.x","PageSize":100,"Starmark":true,"RegionId":"test","RequestId":"123123123-2323-123123-2323"[truncated 418 chars]; line: 1, column: 9]>, :data=>"csRegion":"cn-hasdsd","additionalEventData":{"Scheme":"http"},"apiVersion":"2015-01-09","eventId":"123123123-123-4234234-2323-asdaqas123123","eventName":"asdasdasd","eventSource":"test.com","eventTime":"2020-01-07T12:51:07Z","eventType":"ApiCall","eventVersion":"1","requestId":"asdasdasd-23232-2323-2323-23232323","requestParameters":{"Lang":"en","PageNumber":1,"UserClientIp":"x.x.x.x","PageSize":100,"Starmark":true,"RegionId":"test","RequestId":"asdasdasd-2323-asdasd-sdsd-3453453","ServiceCode":"test","userClientIp":"x.x.x.x","HostId":"test.com"},"serviceName":"test","sourceIpAddress":"x.x.x.x","userAgent":"Java/1.8.0_172","userIdentity":{"accountId":"234234234234234","principalId":"234234234234","sessionContext":{"attributes":{"creationDate":"2020-01-07T12:51:07Z","mfaAuthenticated":"false"}},"type":"test-user","userName":"test.test"}}]"}
[2020-01-18T01:44:27,423][ERROR][logstash.codecs.json ][main] JSON parse error, original data now in message field {:error=>#<LogStash::Json::ParserError: Unexpected end-of-input: was expecting closing quote for a string value
at [Source: (String)""}}]"; line: 1, column: 9]>, :data=>""}}]"}

Another weird thing is that when I use

  input {
      file {
          path => "/home/kasra/test/*.log"
          codec => json { charset => "UTF-8" }
      }
  }

  filter {
      json {
          source => "message"
          skip_on_invalid_json => "true"
      }

      geoip {
          source => "sourceIpAddress"
          target => "GeoLocation"
          fields => ["city_name", "country_name", "region_name", "location"]
      }

      date {
          match => ["eventTime", "ISO8601"]
          locale => "en"
          timezone => "Asia/Tehran"
          target => "@timestamp"
      }
  }

  output {
      stdout { codec => "rubydebug" }
  }

I don't receive any _jsonparsefailure in my console!

Here is how Kibana shows that Logstash broke the log into 3 documents:

{
  "_index": "test-2020.01.18",
  "_type": "_doc",
  "_id": "axxlt28B9BJBUoK4T2oO",
  "_version": 1,
  "_score": null,
  "_source": {
    "tags": [
      "_jsonparsefailure",
      "_geoip_lookup_failure"
    ],
    "@timestamp": "2020-01-18T06:44:27.425Z",
    "message": "\"}}]",
    "host": "ElastAlert",
    "@version": "1",
    "path": "/home/kasra/test/test-log#Logs#Actiontrail#cn-test#2020#01#05#Actiontrail_cn-test_20200105135927_1002_1_484_f135ced8f760c8902299a777f0e3b99f.log"
  },
  "fields": {
    "@timestamp": [
      "2020-01-18T06:44:27.425Z"
    ]
  },
  "sort": [
    1579329867425
  ]
}


{
  "_index": "test-2020.01.18",
  "_type": "_doc",
  "_id": "-hxlt28B9BJBUoK4FDCX",
  "_version": 1,
  "_score": null,
  "_source": {
    "tags": [
      "_jsonparsefailure",
      "_geoip_lookup_failure"
    ],
    "@timestamp": "2020-01-18T06:44:12.307Z",
    "message": "csRegion\":\"test\",\"additionalEventData\":{\"Scheme\":\"http\"},\"apiVersion\":\"2015-01-09\",\"eventId\":\"123123-123123-2323-9D20-23232323237\",\"eventName\":\"DescribeDomains\",\"eventSource\":\"testtest.com\",\"eventTime\":\"2020-01-07T12:51:07Z\",\"eventType\":\"ApiCall\",\"eventVersion\":\"1\",\"requestId\":\"testt-2323-232323-232323-234234234\",\"requestParameters\":{\"Lang\":\"en\",\"PageNumber\":1,\"UserClientIp\":\"x.x.x.x\",\"PageSize\":100,\"Starmark\":true,\"RegionId\":\"test\",\"RequestId\":\"123123123-123123-123123-2323-45675767567\",\"ServiceCode\":\"test\",\"userClientIp\":\"x.x.x.x\",\"HostId\":\"test.test.com\"},\"serviceName\":\"test\",\"sourceIpAddress\":\"x.x.x.x\",\"userAgent\":\"Java/1.8.0_172\",\"userIdentity\":{\"accountId\":\"123123123123123\",\"principalId\":\"234234234234234\",\"sessionContext\":{\"attributes\":{\"creationDate\":\"2020-01-07T12:51:07Z\",\"mfaAuthenticated\":\"false\"}},\"type\":\"test-user\",\"userName\":\"test.test\"}}]",
    "host": "ElastAlert",
    "@version": "1",
    "path": "/home/kasra/test/test-log#testLogs#Actiontrail#cn-test#2020#01#07#Actiontrail_cn-test_20200107124840_1002_2_602_6634462db9e123f92f36b739883f41af.log"
  },
  "fields": {
    "@timestamp": [
      "2020-01-18T06:44:12.307Z"
    ]
  },
  "sort": [
    1579329852307
  ]
}

{
  "_index": "test-2020.01.18",
  "_type": "_doc",
  "_id": "_Rxlt28B9BJBUoK4FDDW",
  "_version": 1,
  "_score": null,
  "_source": {
    "tags": [
      "_jsonparsefailure",
      "_geoip_lookup_failure"
    ],
    "@timestamp": "2020-01-18T06:44:12.297Z",
    "message": "46:34Z\",\"eventType\":\"ApiCall\",\"eventVersion\":\"1\",\"requestId\":\"232323-232F-232323-1223-AE876DF4CDF5\",\"requestParameters\":{\"RecordId\":123123123123123123,\"RequestId\":\"123123123-asd-1231231-asdasd-123123123123\",\"HostId\":\"test.test.com\"},\"serviceName\":\"test\",\"sourceIpAddress\":\"x.x.x.x\",\"userAgent\":\"test-test-v0.10.0\",\"userIdentity\":{\"accessKeyId\":\"asdasdasdasd\",\"accountId\":\"123123123123123\",\"principalId\":\"123123123123123\",\"sessionContext\":{\"attributes\":{\"creationDate\":\"2020-01-11T15:46:34Z\",\"mfaAuthenticated\":\"false\"}},\"type\":\"test-user\",\"userName\":\"test-test-test-test\"}}]",
    "host": "ElastAlert",
    "@version": "1",
    "path": "/home/kasra/test/test-log#testLogs#Actiontrail#cn-test#2020#01#11#Actiontrail_cn-test_20200111154616_1002_2_600_2f36ed009701e9d60711ddedcf1e4d30.log"
  },
  "fields": {
    "@timestamp": [
      "2020-01-18T06:44:12.297Z"
    ]
  },
  "sort": [
    1579329852297
  ]
}

That suggests that the JSON is split across multiple lines. You may be able to use a multiline codec on the file input to ingest a complete JSON object as a single event.
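
Something along these lines (an untested sketch; the pattern assumes each JSON document starts with [{):

    input {
        file {
            path => "/home/kasra/test/*.log"
            codec => multiline {
                pattern => '^\[\{'
                negate => true              # lines that do NOT start a new document...
                what => "previous"          # ...are joined to the previous event
                auto_flush_interval => 2    # flush the buffered event after 2s of inactivity
            }
        }
    }

Note that the multiline codec holds the current event in a buffer until it sees the start of the next one, so without auto_flush_interval the last event may not show up until the pipeline restarts.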

But my log format doesn't contain multiple lines (multiple \n); it just has a newline at the end of each log.
I tested the multiline codec; it doesn't work (no event gets matched and shipped to my ES), and there is no error in Logstash.
Another weird thing is that it doesn't show anything in Kibana until I restart the Logstash service! Then suddenly a bunch of logs get shipped and appear in Kibana, and it stops receiving or showing new logs until I restart the Logstash service again!
By the way, setting that issue aside, when I do see logs in Kibana my problem remains the same as before.
Here it is:

input {
    file {
        path => "/home/kasra/test/*.log"
        codec => multiline {
#            pattern => '\[{"acsRegion'
            pattern => '\[\{"acsRegion'
            negate => "false"
            what => "previous"
        }
    }
}

filter {
    json {
        source => "message"
        target => "message2"
    }
}

And here is the problem: as before, logs get broken into multiple documents, which corrupts my JSON log format.
{
  "_index": "alicloud-2020.01.19",
  "_type": "_doc",
  "_id": "OKaKvG8B9BJBUoK4vyzY",
  "_version": 1,
  "_score": null,
  "_source": {
    "tags": [
      "_jsonparsefailure",
      "_geoip_lookup_failure"
    ],
    "message": "faAuthenticated\":\"false\"}},\"type\":\"test-user\",\"userName\":\"test.testtttttttt\"}}]",
    "path": "/home/kasra/test/test-log#Logs#Actiontrail#cn-hangzhou#2019#12#31#Actiontrail_cn-hangzhou_20191231190924_1002_1_494_41cf3979ee68a8e5efc194e365393280.log",
    "host": "ElastAlert",
    "@timestamp": "2020-01-19T06:43:26.864Z",
    "@version": "1"
  },
  "fields": {
    "@timestamp": [
      "2020-01-19T06:43:26.864Z"
    ]
  },
  "sort": [
    1579416206864
  ]
}

There should be a reason for this. At first I thought my logs were not in JSON format, but I double-checked them with JSON validator sites: they are valid JSON, and the only \n in the logs is at the end of each one, not between the key-value pairs.
Any idea?

@grumo35 sorry for the delay in replying.
Actually, when I tested it using stdin and stdout as input and output, without any filter, I found out it doesn't accept my whole log! It omits part of the end.

When I send my log via stdin, I see it is not identical to message!

message only contains 4096 characters; the rest is omitted:
stdin => my whole log
"message" => "(partial log)"

@Badger
Actually, it is not multiline.

I found out some of the logs seem to be too big for Logstash.

I tested via stdin and stdout, without any filter, using debug mode, and I can see that my stdin (which is my intact log) is not identical to the "message" => "..." it shows as output: message only contains 4096 characters, and the rest is omitted.

Finally I found the root cause.
I've created another issue (topic); I would appreciate your help there.

By the way, I was not sure if it was OK to create a new topic for it or to keep discussing it in this one. If it is not OK, please let me know and I can close this.

That's what I meant by "maybe the message is too big": the actual limit is 4k.

You can change the limit, though, in elastic.yml.


Yes, you were right: the message was too big, so Logstash just read the first 4096 characters...

How would elastic.yml change this limitation? Isn't it related to Logstash?!
As it is hard-coded in Logstash (https://github.com/logstash-plugins/logstash-input-stdin/blob/v3.2.2/lib/logstash/inputs/stdin.rb#L17), I think there is no way to change it...
Anyway, I used grok for those logs which are too big (using if "_jsonparsefailure" in [tags]); at least I have the important fields from them now.
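
Roughly this shape (the grok pattern here is just a placeholder, not my real one):

    filter {
        json { source => "message" }

        if "_jsonparsefailure" in [tags] {
            grok {
                # placeholder pattern: recover a couple of the important fields from the truncated JSON
                match => { "message" => '"eventName":"%{DATA:eventName}".*"sourceIpAddress":"%{IP:sourceIpAddress}"' }
            }
        }
    }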

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.