Single Event Not Parsed Correctly


#1

Hello, I currently have FB 5.6.5 running and performing multiline log matching for json with something like the following in filebeat.yml:

multiline.pattern: '^{'
multiline.negate: true
multiline.match:  after

This matches fine on multiline events beginning with "{ and ending in "}", except for one single occurrence, when the log file gets updated every 5 minutes, and Filebeat reads in the entries. At this point, all but one event gets parsed correctly (I've looked at file, and it adheres to events being enclosed with the above curly braces), however, the single event that comes through, into ES and Kibana is clipped and does not contain the whole log message (this is evident that it is clipped from Filebeat, as the Logstash log complains when it is fed in). The log file I am using gets replaced every 5 minute by a python script that downloads logs from an API, copies the file to a tmp dir, then once it is done writing, copies it for Filebeat to read. Could there be some race condition b/w the file getting replaced and Filebeat still reading an event?

Any assistance/advice would be appreciated.

Thanks,
Cappy


(Steffen Siering) #2

Is the multiline events clipper or split? multiline.timeout is 5s by default. Some log-writers use buffered IO without flush timeouts. This does not cope well with the multiline timeout. Setting multiline.timeout:0, disables the timeout. The multiline.max_lines setting is 500 lines by default. If you event takes more lines, it will be clipped. The max_bytes setting is 10MB by default. If you raw json takes more bytes, it will be clipped again.

Filebeat keeps files open. On replace, the new file will have a different inode. While the old file is not available for other tools, it is kept open by filebeat for some more time. That is, upon replace filebeat will have the old harvester still active and start a new harvester for the new file. But given on timing and settings in filebeat, replacing can indeed race. But assuming default settings, I'd assume the is due to max_bytes or max_lines settings.


#3

Thanks, Steffen. The event gets clipped, I should say -- not split, as we
don't get the other part of the event. My events are not larger than 500
lines, so I don't believe that size is an issue. Do you think this could
also be assisted with increasing close_inactive or using close_eof with
a value of true?. Here is a sample of the filebeat log whenever the new
log write occurs:

2018-01-26T21:00:07Z INFO Harvester started for file: /var/log/mylog.log
2018-01-26T21:00:07Z ERR Failed to publish events (host:
localhost:5044:10200), caused by: EOF
2018-01-26T21:00:07Z INFO Error publishing events (retrying): EOF
2018-01-26T21:00:12Z INFO File is inactive: /var/log/mylog.log. Closing
because close_inactive of 5m0s reached.
2018-01-26T21:00:16Z INFO Non-zero metrics in the last 30s:
filebeat.harvester.closed=1 filebeat.harvester.started=1
filebeat.prospector.log.files.truncated=1
libbeat.logstash.call_count.PublishEvents=3
libbeat.logstash.publish.read_bytes=12
libbeat.logstash.publish.read_errors=1
libbeat.logstash.publish.write_bytes=17064
libbeat.logstash.published_and_acked_events=140
libbeat.logstash.published_but_not_acked_events=139
libbeat.publisher.published_events=140 publish.events=142
registrar.states.update=142 registrar.writes=2

As you can see it is truncating a single entry:

filebeat.prospector.log.files.truncated=1

Thanks,
Cappy


(Steffen Siering) #4

Can you share your complete prospector configs?

Please format logs, configs and terminal input/output using the </>-Button or markdown code fences. This forum uses Markdown to format posts. Without proper formatting, it can be very hard to read your posts.

close_eof: true closes the file when hitting EOF. If filebeat hits EOF while writing, it closes the file immediately. If the file gets deleted before filebeat gets a chance to re-open the file, you have data-loss, indeed. Does clipping mean no more 'to be expected' contents from same file?
Where do you create the temporary file? Is there a chance filebeat picking up the temporary file before the replace-operation?


#5

This is the entirety of it:

`filebeat.prospectors:`

# Each - is a prospector. Most options can be set at the prospector level, so
# you can use different prospectors for various configurations.
# Below are the prospector specific configurations.

- input_type: log

  # Paths that should be crawled and fetched. Glob based paths.
  paths:
    - /var/log/mylog.log
  multiline.pattern: '^{'
  multiline.negate: true
  multiline.match:  after

When the log file is written (script pulls info from API every 5 mins and writes it), first it is written to a tmp file (in /tmp). After the log file is finished being written, it is moved to /var/log/mylog.log (copied over existing file, replacing it).

The log event will get clipped, like:

pproved",
  "eventtype": "authentication",
  "location": {
      "city": "Shytown",
      "state": "GA",
      "country": "US"
  },
  "factor": "Duo Push",
  "device": "111-111-1111",
  "result": "SUCCESS"
}

When it should be like:

{
    "username": "udaman",
    "access_device": {
        "flash_version": "uninstalled",
        "java_version": "uninstalled",
        "os_version": "7",
        "browser_version": "63.0.3239.132",
        "os": "Windows",
        "browser": "Chrome"
    },
    "timestamp": 1517263774,
    "new_enrollment": false,
    "ip": "111.111.111.111.",
    "integration": "My Integration", 
    "host": "api.duosecurity.com",
    "reason": "User approved",
    "eventtype": "authentication",
    "location": {
        "city": "Shytown",
        "state": "GA",
        "country": "US"
    },
    "factor": "Duo Push",
    "device": "111-111-1111",
    "result": "SUCCESS"
}

Does this help to clarify?

Furthermore, the log file is written like this, one event after the other:

{
    "username": "udaman",
    "access_device": {
        "flash_version": "uninstalled",
        "java_version": "uninstalled",
        "os_version": "7",
        "browser_version": "63.0.3239.132",
        "os": "Windows",
        "browser": "Chrome"
    },
    "timestamp": 1517263774,
    "new_enrollment": false,
    "ip": "111.111.111.111.",
    "integration": "My Integration", 
    "host": "api.duosecurity.com",
    "reason": "User approved",
    "eventtype": "authentication",
    "location": {
        "city": "Shytown",
        "state": "GA",
        "country": "US"
    },
    "factor": "Duo Push",
    "device": "111-111-1111",
    "result": "SUCCESS"
}
{
    "username": "udaman",
    "access_device": {
        "flash_version": "uninstalled",
        "java_version": "uninstalled",
        "os_version": "7",
        "browser_version": "63.0.3239.132",
        "os": "Windows",
        "browser": "Chrome"
    },
    "timestamp": 1517263774,
    "new_enrollment": false,
    "ip": "111.111.111.111.",
    "integration": "My Integration", 
    "host": "api.duosecurity.com",
    "reason": "User approved",
    "eventtype": "authentication",
    "location": {
        "city": "Shytown",
        "state": "GA",
        "country": "US"
    },
    "factor": "Duo Push",
    "device": "111-111-1111",
    "result": "SUCCESS"
}

Thanks,
Cappy


(Steffen Siering) #6

Do you have tail_files enabled? This kind of "clipping" at the end of an event should not happen otherwise.

Also makes me wonder if inode is re-used. Files are identified by inode. Filebeat keeps a registry file, matching inode to last-read offset. If inode is the same, but new file is bigger, filebeat will 'continue' reading from last offset.

Can you check inodes between rotations? Adding -i to ls should print the inode.

Instead of replacing, consider using timestamps. Like: create a new file with current timestamp and have 'cleanup' deleting older files from older runs. This will give you some kind of log-rotation + ensure you always have a fresh inode. Using the rotation strategy, you can introduce a more custom retention-policy: e.g. download runs every 5mins, but drop json logs, older 1h. This way you can investigate issues, even if filebeat can not publish due to output not being available.


#7

I do not have tail_files enabled, unless it is enabled by default.

I will check the inode to see if it keeps getting re-used, and investigate a new strategy for populating/managing the logfile and get back to you with my results.

Thanks,
Cappy


#8

I just checked the inode, and it seems to be changing with each pull (tested manually), with the exception of a couple recent pulls, where it appears to keep going back and forth between two different inodes (2895/3098)

However, even if I explicitly delete the file, then run the script to replace the file, I still get the same old inode of 2895.

I'm wondering if clean_inactive could help here.

Thanks,
Cappy


(Steffen Siering) #9

The keep-going-back thing might indeed be a problem. The registry keeps the inode->offset mappings for a while. That is, there is a chance an inode being re-used (even a few minutes later), might still use an old offset. Reason for keeping the mapping is async processing/cleanup + file being potentially unavailable for users having some kind network based storage in place. As you always replace files, but do not append to a file anymore, try with close_eof and set clean_inactive as low as possible. With clean_inactive >= scan_frequency + ignore_older.


#10

Thanks Steffen, I'll try setting the following and see how it goes.

clean_inactive: 30s
close_eof: true

Thanks,
Cappy


#11

While these options seemed to help to a certain degree, ultimately, it appears that updating Filebeat to the latest version appeared to resolve the issue.

Thanks,
Cappy


(system) #12

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.