Filebeat is not able to parse JSON from files where newline-separated JSON lines (one event per line) are written. It sometimes misses records and sometimes reports errors when traffic is high.

I am using the BELK stack for log analytics.

Versions:
Elasticsearch 5.1.1 (hosted on the AWS Elasticsearch service)
Logstash 5.3.0, Filebeat 5.3.0 and Kibana 5.1.1

Filebeat is running on the server where the CDRs are generated.
Logstash runs on a separate server and sends the data to a date-based rolling index in Elasticsearch.

We are using Java Logback (logback.xml) to write single-line JSON events to a .log file, which is rotated based on size and time.

<file>${CDR_NEW_LOG_PATH}//cdr.log</file>
<fileNamePattern>${CDR_NEW_LOG_PATH}//cdr-${IP_ADD}.%d{yyyy-MM-dd-HH-mm}.%i.log</fileNamePattern>
<maxFileSize>${CDR_NEW_LOG_MAX_FILE_SIZE}</maxFileSize>


In filebeat.yml,

filebeat.prospectors:

- input_type: log
  paths:
    - /root/cdrs/*-*.log
  json.keys_under_root: true
  json.add_error_key: true

  ignore_older: 24h
  close_inactive: 12h
  scan_frequency: 30s
  clean_inactive: 48h
  clean_removed: true
  close_removed: true
  close_eof: true

output.logstash:
  # The Logstash hosts
  hosts: ["VALID IP:5043"]
  fields_under_root: false


Logstash Config

input {
  beats {
    port => 5043
    client_inactivity_timeout => 86400
  }
}

filter {
  grok {
    match => { "g2uEvent" => "%{TIMESTAMP_ISO8601:g2uEventTime}" }
  }

  date {
    match => ["g2uEventTime", "ISO8601"]
    target => "@timestamp"
  }
}

output {
  amazon_es {
    hosts => ["VALID ELASTIC SEARCH END POINT"]
    index => "d2c-%{+YYYY-MM-dd}"
  }

  stdout { codec => rubydebug }
}

Issues occur randomly when there is traffic. If the same file is fed to Filebeat again, all data is correctly sent to ES.

Common errors that I get in filebeat:
ERR Error decoding JSON: invalid character '}' looking for beginning of value

ERR Error decoding JSON: json: cannot unmarshal string into Go value of type map[string]interface {}

ERR Error decoding JSON: EOF

ERR Error decoding JSON: invalid character '\n' in string literal

ERR Error decoding JSON: invalid character 'm' in literal true (expecting 'r')

Sample CDR events
{"g2uEvent":{"g2uEventName":"CAMPAIGN_USER_LOGIN_MOBILE_DATA_EVENT","g2uEventTime":"2017-11-10T05:22:32.270Z","g2uEventDate":"2017-11-10"},"g2uCountry":{"g2uCountryIsoname":"IN","g2u_analytic_country_name":"IND"},"g2uOperatorInfo":{"g2uOperatorName":"Idea","g2uOperatorAnalyticsname":"IDEA"},"g2uLanguage":"en","g2uUser":{"g2uUserMsisdn":"91xxxxxxxx","g2uUserChannel":"mobileData","g2uUserStatus":"STATUS_EXPIRED"},"g2uGames":{"g2uGamesId":"","g2uGamesName":"","g2uGamesCategory":""},"g2uGamesVendor":{"g2uGamesVendorName":""},"g2uGamesUtm":{"g2uUtmSource":"test","g2uUtmMedium":"18333108","g2uUtmCampaign":"ADS"}}
{"g2uEvent":{"g2uEventName":"CAMPAIGN_USER_ADVERTISER_CALLBACK_BLOCKED_EVENT","g2uEventTime":"2017-11-09T14:21:45.918Z","g2uEventDate":"2017-11-09"},"g2uCountry":{"g2uCountryIsoname":"MY","g2uAnalyticCountryName":""},"g2uOperatorInfo":{"g2uOperatorName":"Maxis","g2uOperatorAnalyticsname":""},"g2uUser":{"g2uUserMsisdn":"60xxxxxxx","g2uUserChannel":"MobileData","g2uUserStatus":"STATUS_SUBSCRIBED"},"g2uGames":{"g2uGamesId":"","g2uGamesName":"","g2uGamesCategory":""},"g2uGamesVendor":{"g2uGamesVendorName":""},"g2uGamesUtm":{"g2uUtmSource":"test1","g2uUtmMedium":"18290630","g2uUtmCampaign":"ADS"}}

Please help me resolve this.

You only have problems when traffic is high? This could be an effect of back-pressure.

When/how fast do you rotate files? I see close_removed and close_eof set to true. That is, Filebeat might close files that are not yet completely processed, because the file is removed by log rotation while the outputs are still dealing with back-pressure from the Logstash or Elasticsearch service.
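For illustration only, here is a minimal sketch of a prospector that keeps a rotated file open until it has been fully read instead of closing it at EOF or on removal. The option names are standard Filebeat 5.x settings already present in the config above; the changed values are an assumption for experimentation, not a recommendation taken from this thread, and should be tuned to the actual rotation interval:

filebeat.prospectors:
- input_type: log
  paths:
    - /root/cdrs/*-*.log
  json.keys_under_root: true
  json.add_error_key: true
  # Sketch: keep the harvester draining the file even after rotation or EOF,
  # so back-pressure from Logstash does not lead to the file being closed early.
  close_eof: false        # do not close as soon as EOF is reached
  close_removed: false    # keep reading a file that was renamed/removed by rotation
  clean_removed: false    # clean_removed must be disabled when close_removed is disabled
  close_inactive: 12h
  clean_inactive: 48h
  ignore_older: 24h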

Hi Steffens,

Yes, when traffic is continuous and high, I see the issue for a few files that contain the same JSON format with different values.

I am rotating files using Logback.
For example, the Java process writes cdr.log; logback.xml rotates cdr.log to cdr-IP-yyyy-mm-dd-hh-mm.1.log after 1 minute or when the size of cdr.log reaches 5 KB.

In the Filebeat prospectors I only pick up the rotated *-*.log files, so there should be no issue with reading files that are not yet fully rotated.

Hm... you rotate by time and file size. That makes me wonder if some JSON document gets cut short when rotating by size. I guess during off-peak times the files always rotate by time, and only during peak times are they rotated by size?

Hi steffens,
I have not found even a single case of incomplete JSON in any of the rotated files.
When I check a file that produced an error, it contains only complete, single-line JSON events.
And if I copy the same file back onto the path that the Filebeat prospectors watch, its events are uploaded without failure.

I'm curious how the log lines are written to the files. Could it be that you have partial lines? Are you using a shared drive? (Based on the config I assume not.)

Hi Ruffin,

Log lines are written by java using logback.

There is no shared drive.

Lines in log files are complete JSON and not partial at all.

We were not serializing the POJOs developed for the event. After adding serialization, the variety of errors in Filebeat has gone down over the last 2 days.

Below are the kinds of errors I am getting now.

ERR Error decoding JSON: invalid character '}' looking for beginning of value

ERR Error decoding JSON: json: cannot unmarshal string into Go value of type map[string]interface {}

I have also noticed that the number of times an error occurs in the logs has dropped drastically, even during high and continuous traffic.

I didn't mean that the final lines are partial, but that perhaps a line is flushed in two parts. This could be the case for larger lines. For example, logback first flushes {"test":"1" and then , "name":"ruflin"}\n, and Filebeat would already try to read the first part. One thing that contradicts this theory, I just realised, is that Filebeat only sends lines terminated by \n to the JSON processor.

To double check: if you take the exact same file which caused an error and read it again later with Filebeat, you don't get an error?

I don't know much about logback, but could you share your full config? Perhaps something there pops out.

Hi Ruffin,

If I transfer the same file again, it is processed successfully without any error.

Filebeat only picks up files that have already been rotated.

Filebeat config:

In filebeat.yml,

filebeat.prospectors:

- input_type: log
  paths:
    - /root/cdrs/*-*.log
  json.keys_under_root: true
  json.add_error_key: true

  ignore_older: 24h
  close_inactive: 12h
  scan_frequency: 30s
  clean_inactive: 48h
  clean_removed: true
  close_removed: true
  close_eof: true

output.logstash:
  hosts: ["VALID IP:5043"]
  fields_under_root: false

////////////////////////////////////////////////////////////////////////
Logstash Config:

input {
  beats {
    port => 5043
    client_inactivity_timeout => 86400
  }
}

filter {
  grok {
    match => { "g2uEvent" => "%{TIMESTAMP_ISO8601:g2uEventTime}" }
  }

  date {
    match => ["g2uEventTime", "ISO8601"]
    target => "@timestamp"
  }
}

output {
  amazon_es {
    hosts => ["VALID ELASTIC SEARCH END POINT"]
    index => "d2c-%{+YYYY-MM-dd}"
  }

  stdout { codec => rubydebug }
}

For the full config, I was thinking of the logback config :slight_smile: Sorry for the misunderstanding. What kind of disks and volumes are you using? (Trying to find some more clues that could help us figure this one out.)

Hi Ruffin,

Filebeat and the CDR-writing module (Java Spring) are deployed on an AWS EC2 instance of type c3.large with a 200 GB EBS volume.

Below is the logback.xml

Look for SIZE_AND_TIME_BASED_FILE and SIZE_AND_TIME_BASED_LOG appenders.

The ${...} placeholders are filled in from log.properties with the following parameters:
CDR_LOG_PATH=/u2opia/cdrsgames2u
CDR_LOG_MAX_FILE_SIZE=5kb
CDR_NEW_LOG_PATH=/u2opia/cdrsgames2ud2c
CDR_NEW_LOG_MAX_FILE_SIZE=5KB
IP_ADD=10.141.102.60
These values are used in logback.xml

///logback.xml
<?xml version="1.0" encoding="UTF-8"?>

<configuration debug="true">

        <property file="${GAMES2U_ADS_CONFIG_PATH}/log.properties" />
<property file="${GAMES2U_ADS_CONFIG_PATH}/persistence.properties" />
        <!-- Send debug messages to System.out -->

        <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">

                <!-- By default, encoders are assigned the type ch.qos.logback.classic.encoder.PatternLayoutEncoder -->
                <encoder>
                        <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level
                                %logger{36}.%M\(%line\) - %msg%n</pattern>
                </encoder>

        </appender>


        <!-- Send debug messages to a file at "c:/jcg.log" -->

        <appender name="FILE"
                class="ch.qos.logback.core.rolling.RollingFileAppender">

                <file>${LOG_PATH}//games2u_adwords.log</file>

                <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">

                        <Pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level
                                %logger{36}.%M\(%line\) - %msg%n</Pattern>

                </encoder>

                <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">

                        <FileNamePattern>${LOG_PATH}games2u_adwords.%i.log.zip</FileNamePattern>

                        <MinIndex>1</MinIndex>

                        <MaxIndex>10</MaxIndex>

                </rollingPolicy>



                <triggeringPolicy
                        class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">

                        <MaxFileSize>${LOG_MAX_FILE_SIZE}</MaxFileSize>

                </triggeringPolicy>

        </appender>

         <appender name="SIZE_AND_TIME_BASED_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
                <file>${CDR_LOG_PATH}//cdrgames2uAdwords.log</file>
                <!-- <file>D:\javaworkspace\games2u\cdrs\cdr.log</file> -->

        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
                <fileNamePattern>${CDR_LOG_PATH}//cdrgames2uAdwords-${IP_ADD}.%d{yyyy-MM-dd-HH-mm}.%i.log</fileNamePattern>
                <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
                        <maxFileSize>${CDR_LOG_MAX_FILE_SIZE}</maxFileSize>
                </timeBasedFileNamingAndTriggeringPolicy>
        </rollingPolicy>

        <encoder>
                <!-- <pattern>%relative [%thread] %-5level %logger{35} - %msg%n</pattern> -->
                <pattern>%msg%n</pattern>
        </encoder>
        </appender>

         <appender name="SIZE_AND_TIME_BASED_LOG" class="ch.qos.logback.core.rolling.RollingFileAppender">
                <file>${CDR_NEW_LOG_PATH}//cdr.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
                <fileNamePattern>${CDR_NEW_LOG_PATH}//cdr-${IP_ADD}.%d{yyyy-MM-dd-HH-mm}.%i.log</fileNamePattern>
                <timeBasedFileNamingAndTriggeringPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
                        <maxFileSize>${CDR_NEW_LOG_MAX_FILE_SIZE}</maxFileSize>
                </timeBasedFileNamingAndTriggeringPolicy>
        </rollingPolicy>
        <encoder>
                <pattern>%msg%n</pattern>
        </encoder>
        </appender>

        <logger name="sizeAndTimeBased" level="INFO">
                <appender-ref ref="SIZE_AND_TIME_BASED_FILE" />
        </logger>
                <logger name="sizeAndTimeBasedLog" level="INFO">
                <appender-ref ref="SIZE_AND_TIME_BASED_LOG" />
        </logger>

        <!-- Logger Filters for dependencies implemented -->
        <logger name="org.springframework" level="WARN"></logger>

        <logger name="com.ning" level="WARN"></logger>

        <logger name="org.hibernate" level="WARN"></logger>
        <logger name="org.hibernate.type" level="${org.hibernate.type.log.level}"></logger>
        <logger name="org.hibernate.SQL" level="${org.hibernate.SQL.log.level}"></logger>

        <logger name="com.jolbox" level="WARN"></logger>

        <logger name="commons-codec" level="WARN"></logger>

        <logger name="org.apache" level="WARN"></logger>


        <!-- By default, the level of the root level is set to DEBUG -->

        <root level="${LOG_LEVEL}">
                <appender-ref ref="STDOUT" />
                <appender-ref ref="FILE" />
        </root>

</configuration>

A few things popped out at me:

  • It's a Windows machine (should have seen that earlier). Mainly good to know.
  • The file pattern is based on an IP address. Do multiple machines write to the same volume?
  • Are the tool that writes the log files and Filebeat running on the same machine?

Hi Ruffin,

The OS is Linux (CentOS 6.3).

No, only one machine writes on the volume.

Yes, the portal Java code running in Tomcat writes the events/CDRs on the same machine on which Filebeat is running.

How come the path looks like //cdr.log? That looks like a remote volume to me. Is that because of EBS?

Hi Ruffin,

We have changed the file rotation logic from size-and-time based to purely time based (rotation at every one-minute interval).

Also, we found that another Java process was running and writing to the same file name on the same path (a file that Filebeat was reading with a running harvester).

The issue is resolved.

Thanks Ruffin and Steffens for your support and the pointers on where to look.

Regards
Aman Sehgal

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.