Winlogbeat maximum size of event?


(Thomas Widhalm) #1

Hi!

I'm curious whether there is a maximum size of events winlogbeat can forward. While I need to know this for current versions I also need this information about version 1.2 or 1.3.

Background: I plan to use winlogbeat on new hosts where custom applications log into eventlog and they produce huge events.

On the other hand I have some older hosts where some events seem to be missing. More precisely they don't contain a "message" field, just an "message_error". I found a thread in this forum where this can be explained with winlogbeat running on a system which just forwards the events (some Windows mechanism is used to forward the events some hosts to the one running winlogbeat) but this is not the case in this setup.

Could you just tell me if there is a maximum size and maybe what this maximum is. This would help me with deciding whether I have to update to 5.x or search somewhere else for an error.

Thanks in advance and Cheers,
Thomas


(Andrew Kroh) #2

What is the content of the message_error?

Why don't you want to upgrade to Winlogbeat 5.x?

What operating system version and arch are we talking about?

I'll check to see if there is any artificial size limit in Winlogbeat. But I think any limit would be coming from Windows. There was a recent bug that was fixed that affected large messages. See the 5.1.2 changelog.


(Thomas Widhalm) #3

The content of message_error is "The substitution string for insert index (%1) could not be found." which seems to be a Windows error message. The original event is far bigger.

Updating from 1.3 to 5.x is definitely a thing to do but in larger companies major release upgrade often take lots of extra efforts (analyzing, testing, etc.) and replacing is not always possible using centralized software management. In short: It will take some time or even lots of time.

This is why I need to know if upgrading to 5.x would fix this issue and there should be put more effort into it or if it's just some sort of configuration issue.

I need to know this for all Windows server versions currently supported but I can ask which one is currently showing this particular error.

Thanks so far!


(Andrew Kroh) #4

In 5.x I don't see any limitations imposed by Winlogbeat on message size.

In 1.x, the buffer size used for receiving messages from Windows is fixed at 16KiB. If this is reached then the message_error should be "The data area passed to a system call is too small." (for ERROR_INSUFFICIENT_BUFFER).

These responses assume Windows Vista or newer.

I recommend checking the event in the Windows Event Viewer and looking at the XML content. Check the message and the parameters to see if they are correct. It would also be interesting to test 5.x on the same machine to see if it has the same problem.


(Thomas Widhalm) #5

Thanks a lot! I will forward this information to the admin in charge and I will let you know what the outcome is.

Cheers,
Thomas


(Thomas Widhalm) #6

Hi.

I have bad news. We changed the settings in winlogbeat to the following:

event_logs:
- name: Application
batch_read_size: 10
ignore_older: 5m
- name: Security
ignore_older: 5m

This does nothing to fix the issue. We see some error messages occuring regularly which seem to replace the actual message. To be more exact: The event has no messagefield, but only message_error with one of 3 different short strings which seem to be Windows error messages.

Is there anything else I could provide you with to point a finger on where the problem hides?

Cheers,
Thomas


(Andrew Kroh) #7

So what does Windows Event Viewer show for the same event? You could add add include_xml: true to aid in debugging on 5.x.

event_logs:
- name: Application
  include_xml: true

(Thomas Widhalm) #8

I got one event that hit this problem from the admin in charge. It shows a very, very long structured Window error message. Starting with Database redundancy health check failed. it is an Exchange error message, about 6,4KB when copied to a textfile.

Would there be a compatibility issue between Winlogbeat 5.2 and Logstash 1.5 or Logstash 2.x? We are sending events to Logstash to parse them before putting them into Elasticseach.

Thanks a lot for your help so far!


(Andrew Kroh) #9

I think it would be useful to compare the XML content from the Windows Event Viewer with the complete JSON event produced by Winlogbeat when include_xml: true is enabled for the event log in question.

You can also enable the file output in Winlogbeat if you want to rule out any issues with downstream processing.

So use something similar to this for the config:

event_logs:
- name: Application    # Or whatever event log is has the issue
  include_xml: true

output.file:
  path: 'c:/ProgramData/winlogbeat/output'
  filename: winlogbeat.json
output.logstash:
  hosts: ['xxxx:5044']

Then find the line with message_error in the winlogbeat.json file and compare it to the data in Event Viewer. In the event viewer when you find the matching event based on the record ID you can right-click on it and do Copy -> Copy Details as Text. This grabs the rendered text and the XML content which you can drop into a file. It would be great if you could share JSON event and the Event Viewer output.


(Thomas Widhalm) #10

Hi!

I still don't have the XML output, but I have one of the events directly from Eventlog and it's counterpart from a fileoutput of winlogbeat. I tried to anonymize hostnames and IP addresses.

{
    "@metadata": {
        "beat": "winlogbeat",
        "type": "wineventlog"
    },
    "@timestamp": "2017-02-09T11:36:04.000Z",
    "beat": {
        "hostname": "******",
        "name": "******"
    },
    "category": "Service",
    "computer_name": "*******.***.***",
    "count": 1,
    "event_id": 4113,
    "level": "Error",
    "log_name": "Application",
    "message_error": "The substitution string for insert index (%1) could not be found.",
    "record_number": "3880646",
    "source_name": "MSExchangeRepl",
    "type": "wineventlog"
}

and what it looks like in eventlog. For psychic sanity I refuse to indent all the following lines. :slight_smile:

Level	Date and Time	Source	Event ID	Task Category
Error	2/9/2017 12:36:04 PM	MSExchangeRepl	4113	Service	"Database redundancy health check failed.
Database copy: ******
Redundancy count: 1

Error: There were database redundancy check failures for database '*******' that may be lowering its redundancy and putting the database at risk of data loss. Redundancy Count: 1. Expected Redundancy Count: 2. Detailed error(s): 


        ******:
        Passive database copy '*******\*******' has an unhealthy status 'DisconnectedAndResynchronizing' for duration 7.02:15:40.7035932. [SuspendComment: None specified.] [ErrorMessage: The Microsoft Exchange Replication service was unable to perform an incremental reseed of database copy '******\******' due to a network error. The database copy status will be set to Disconnected. Error Microsoft.Exchange.Cluster.Replay.NetworkCommunicationException: An error occurred while communicating with server 'PECXMXP01B'. Error: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond [fe80::759a:96ef:9d8a:97d6%12]:64327 ---> System.Net.Sockets.SocketException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond [fe80::759a:96ef:9d8a:97d6%12]:64327
   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)
   at Microsoft.Exchange.Cluster.Replay.TcpClientChannel.TryOpenChannel(NetworkPath netPath, Int32 timeoutInMs, TcpClientChannel& channel, NetworkTransportException& networkEx)
   --- End of inner exception stack trace ---
   at Microsoft.Exchange.Cluster.Replay.NetworkManager.OpenConnection(NetworkPath& actualPath, Int32 timeoutInMsec, Boolean ignoreNodeDown)
   at Microsoft.Exchange.Cluster.Replay.LogCopyClient.OpenChannel(Boolean useScavenge)
   at Microsoft.Exchange.Cluster.Replay.LogCopyClient.QueryEndOfLog()
   at Microsoft.Exchange.Cluster.Replay.FailoverPerformanceTrackerBase`1.RunTimedOperation(TOpCode opCode, Action operation)
   at Microsoft.Exchange.Cluster.Replay.IncrementalReseeder.IsIncrementalReseedRequired(Action checkAbortRequested, Int64& highestLogGenCompared, Boolean& e00IsEndOfLogStream)
].
        

============================================================================
 Database Copy Health Transition Information (sorted by most recently Active)
============================================================================

PECXMXP01B.FUN.PEC:
LastCopyStatusTransitionTime: 01/01/0001 00:00:00
CopyFoundInAD:	CurrentState: Success;	LastActiveTransition: 2016-05-24T08:57:20.6185890Z;	LastInactiveTransition: 2016-05-24T07:52:19.6142746Z
CopyStatusActive:	CurrentState: Success;	LastActiveTransition: 2017-01-12T04:17:23.2315157Z;	LastInactiveTransition: 2017-01-12T03:09:28.2394162Z
CopyStatusHealthy:	CurrentState: Failure;	LastActiveTransition: 2017-01-12T03:16:22.7817665Z;	LastInactiveTransition: 2017-01-12T04:17:23.2315157Z
CopyStatusFailed:	CurrentState: Failure;	LastActiveTransition: 2016-12-20T10:10:23.4764938Z;	LastInactiveTransition: 2016-12-20T12:11:19.0701248Z

******.***.***:
LastCopyStatusTransitionTime: 02/02/2017 09:20:23
CopyFoundInAD:	CurrentState: Success;	LastActiveTransition: 2016-04-28T10:51:22.3217591Z;	LastInactiveTransition: 2016-04-28T09:16:20.5311349Z
CopyStatusActive:	CurrentState: Failure;	LastActiveTransition: 2017-01-12T03:09:28.2394162Z;	LastInactiveTransition: 2017-01-12T04:17:23.2315157Z
CopyStatusHealthy:	CurrentState: Failure;	LastActiveTransition: 2017-01-12T04:17:23.2315157Z;	LastInactiveTransition: 2017-01-13T03:16:53.3065243Z
CopyStatusFailed:	CurrentState: Failure;	LastActiveTransition: 2017-02-02T08:07:23.0529294Z;	LastInactiveTransition: 2017-02-02T09:21:23.8592385Z



================
Full Copy Status
================

----------------
Database Copy : ******\*******
----------------
WorkerProcessId              : 32692
ActiveDatabaseCopy           : ******
ActivationPreference         : 2
CopyStatus                   : DisconnectedAndResynchronizing
Viable                       : True
ActivationSuspended          : False
ErrorEventId                 : 2058
LastStatusTransitionTime     : 2/2/2017 9:20:23 AM
StatusRetrievedTime          : 2/9/2017 11:35:34 AM
InstanceStartTime            : 2/9/2017 11:34:48 AM
LowestLogPresent             : 57247
LastLogInspected             : 57306
LastLogReplayed              : 57260
LastLogCopied                : 57306
LastLogGenerated             : 60018
LastLogGeneratedTime         : 2/9/2017 11:35:34 AM
LastInspectedLogTime         : 1/13/2017 3:03:16 AM
LastReplayedLogTime          : 1/12/2017 8:13:58 PM
LastCopiedLogTime            : 1/13/2017 3:03:16 AM
LastLogInfoFromClusterTime   : 2/9/2017 11:35:11 AM
LastLogInfoFromCopierTime    : 1/1/0001 12:00:00 AM
LastLogInfoIsStale           : False
ActivationDisabledAndMoveNow : False
AutoActivationPolicy         : Unrestricted
HAComponentOffline           : False
NodeStatus                   : Up
ReplayLagEnabled             : Disabled
ReplayLagPlayDownReason      : None
LogReplaySuspendReason       : None
ReplayLagPercentage          : 0
----------------
Database Copy : ****\****
----------------
WorkerProcessId              : 11720
ActiveDatabaseCopy           : *****
ActivationPreference         : 1
CopyStatus                   : Mounted
Viable                       : False
ActivationSuspended          : False
ErrorEventId                 : 0

(Thomas Widhalm) #11

(I had to cut some lines - they look like the last few lines - because of restrictions in this forum.)


(Thomas Widhalm) #12

Hi!

Did you have time to have a look at the data I provided?

Cheers,
Thomas


(Andrew Kroh) #13

The output looks normal except for the error in Winlogbeat of course. Which Beat version was that from?

The XML output from Windows Event Viewer and Winlogbeat with include_xml will be the most helpful diagnosing the problem.

I was researching and found this which indicates that on the writing side of things that applications are limited to 64k bytes of data in an event. Writing is through the TDH (trace data helper) methods. So I wouldn't be surprised that 1.x fails on large messages, but I would expect the error to be "The data area passed to a system call is too small." rather than the one you are seeing.


(system) #14

This topic was automatically closed after 21 days. New replies are no longer allowed.


(Andrew Kroh) #15

(system) #16