Winlogbeat and ECS

Is there any current or completed effort to convert Windows event log fields to comply with ECS?

I've parsed the schema for the security-audit logs and made an initial pass through them, converting the relevant fields to what I think are the appropriate ECS objects (user, destination, source, network, device, etc.). I should have enough done to create a reasonably reliable mapping, and I should have ~95% of the ECS conversion right (some odd/seldom-triggered events, like IPSec-related events, may be wrong).

I would be happy to move that and potentially other Windows event provider schemas into a shareable format, but I don't know if I'm duplicating effort, reinventing the wheel, or moving in the wrong direction. Can anyone advise?


Hi @thegrockq,

We are not dedicating a specific effort to migrating Winlogbeat event log fields to ECS, so I wouldn't say at the moment that this is going to be a duplicated effort. Indeed, we'd be very interested in knowing more about what you have been doing. Could you give more details? Feel free to open a PR, share a link with what you have done, or just post a more detailed description here :slight_smile:

Thanks for this!

So far, what I've done is dump the schema for the Windows Security event log using this: (get-winevent -listprovider microsoft-windows-security-auditing).events. Then, I parsed the XML results into columns of field name, input data type and output data type (e.g. SubjectUserName, win:UnicodeString, xs:String), and did the equivalent of a "sort | uniq -c | sort" to determine which were the most common (and probably the most important to get right).
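
For reference, the dump-and-count step was roughly along these lines (a sketch from memory, untested; some events have empty templates, hence the filter):

# Dump the event definitions published by the Security auditing provider.
$events = (Get-WinEvent -ListProvider Microsoft-Windows-Security-Auditing).Events

# Pull the <data> elements out of each event's XML template and count how often
# each field name / output type combination appears.
$events |
  Where-Object { $_.Template } |
  ForEach-Object { ([xml]$_.Template).template.data } |
  ForEach-Object {
    [pscustomobject]@{
      Name    = $_.name
      InType  = $_.inType
      OutType = $_.outType
    }
  } |
  Group-Object Name, OutType |
  Sort-Object Count -Descending |
  Select-Object Count, Name |
  Export-Csv .\security-schema-fields.csv -NoTypeInformation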

I used the output data types to make assumptions about how to map each field and, when I encountered multiples of the same field, to determine which should be multi-fields.

Then I went through each and identified field names that should map to current ECS top-level fields. For example, what Winlogbeat would currently call "event_data.ProcessName," "event_data.LogonProcessName," and "event_data.NewProcessName" should probably all be renamed to process.name. Some of these are fully qualified paths and others are just the process name, so these should probably be both text and keyword fields.
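
To illustrate the kind of thing I mean (the pipeline name and field choices here are just placeholders, not anything Winlogbeat ships today), the rename could be done with an ingest processor:

PUT _ingest/pipeline/winlogbeat-security-ecs-sketch
{
  "description": "Sketch: rename a Windows field to its ECS equivalent",
  "processors": [
    {
      "rename": {
        "field": "event_data.NewProcessName",
        "target_field": "process.name",
        "ignore_missing": true
      }
    }
  ]
}

and, on the mapping side, process.name could be a keyword with a text sub-field so it's indexed both ways, roughly:

"name": {
  "type": "keyword",
  "fields": {
    "text": { "type": "text" }
  }
}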

When making assumptions about how and when to map fields to ECS, I tried to reference example logs and the 'common names' from the nicely formatted, non-XML version of the event log that everyone is more familiar with.

Last, I don't know if it's necessary, but I converted the Windows field names to all-lowercase, underscore-separated field names, which is more consistent with how fields are named in ECS. So TargetLinkedLogonId can be renamed to target_linked_logon_id. Again, it may not be worth the overhead to convert these, but it's there.

Currently I have a spreadsheet with all 600+ fields mapped this way, and the plan is to write a script to generate a mapping and a filter that will do the renaming. I assume there's a better way to do this, and if there's interest in doing something like this for ECS, I'd love to see Winlogbeat logs normalized and would be happy to redirect some of my efforts in a way that's more congruent with how Elastic is implementing ECS in other Beats.

The end-goal is to map every field to its appropriate data type, internally normalize Windows event log fields among themselves, and normalize fields to their respective ECS fields as well. And then, hopefully, not have Elastic come up with something that replaces it in 12 months. :slight_smile:

Aside from what I've written above, I found some use for a few additional top-level fields when dealing with Windows events, whether to normalize Windows event log fields with themselves or more generally. One was SID and the other was LDAP. I can also dump schemas for just about any other event provider and do something similar. Sysmon is probably next up, though.

Probably a bit more than you were asking for and I don't have anything ready for a PR yet. I probably need a bit of direction/guidance to get to that point (wrt what's helpful to others, is it better to just publish mappings and filters, or go the ECS route, etc).


(Just realized I didn't tag @jsoriano in the above post.)

This sounds like an awesome project, and I would like to explore how we can incorporate your work into Winlogbeat. Let me start with a few questions/comments to clarify my understanding.

So if the same field name was used in two events but was assigned different data types, you would set up a multi-field so that it could be indexed both ways?

My initial thought is that I would not do the all-lowercase, underscore-separated renaming. Maybe if it's fully automated by the ingestion pipeline it would be OK, but I think it's probably not worth the extra effort. It moves the field names even further from what users see in Windows, and it also increases the number of changes they have to deal with if they already have event data indexed by Winlogbeat. And if it's not automated, then it creates a maintenance burden in that we must update the ingest pipelines to know about all fields in advance in order to rename them.

100% agree with the goal here. :+1:

We are working on integrating the ECS mappings into the index template that ships with all Beats. Then we can begin the conversion of some of our existing fields to ECS for the next major release of Beats - 7.0.

To do the processing we are using Elasticsearch Ingest Node. The pipeline can be one or more ingest node pipelines. So I think it would be really interesting to see if you can do the processing with Ingest Node or find out what limitations you hit so we can help address them.

I'd like to find a way to incorporate this into Winlogbeat such that when someone enables the Security log they get the normalization and conversion to ECS. This needs some more discussion about the mechanics of it in Winlogbeat (like adding modules similar to Filebeat).

I think that if you built a pipeline that worked in a setup similar to this, then it would be easy to add to Winlogbeat no matter how we decide to integrate it.

winlogbeat.event_logs:
- name: Security

output.elasticsearch:
  pipelines:
    - pipeline: winlogbeat-security
      when.equals:
        log_name: "Security"

PUT _ingest/pipeline/winlogbeat-security
{
  "description" : "Windows Security Event Log normalization and ECS conversion",
  "processors" : [
    {
      "set" : {
        "field": "foo",
        "value": "bar"
      }
    }
  ]
}

Thanks for the awesome response @andrewkroh. This is helpful.

Exactly - if the same field name shows up with different data types, I'd set up a multi-field so it can be indexed both ways.

Good points - I completely agree. To further your point, I've only dumped the schema from a Win10 Enterprise box, so it's quite possible I'm missing fields that may exist in other versions.

Do you still think it's okay to attempt to explicitly map all known fields (those from the MS schemas) according to the data types claimed/reported by the Windows schema? This can be automated fairly simply and should be pretty easy to do for nearly every event provider (not just security logs). But this might exceed 1,000 fields. Are mappings that size problematic?
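
For what it's worth, I assume the relevant knob is the index.mapping.total_fields.limit setting, which defaults to 1000; if so, I'm guessing it could be raised per index (or in the index template) with something like:

PUT winlogbeat-*/_settings
{
  "index.mapping.total_fields.limit": 2000
}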

I'd like to pursue moving in the direction of the pipeline suggestion you've made.

My only question/concern is about the use of an Elasticsearch ingest node as an output, instead of Kafka (or similar) or LS. We currently do some enrichment in LS that we couldn't do in ES (particularly for Winlogbeat, but for other Beats as well), and we are planning a move to a Kafka output model in the near future (we currently output to LS).

Can we define a pipeline in either the LS or Kafka outputs? If we do some processing in LS and then normalize in ES, it seems like it might be a bit clunky: users would have to use field names in LS that don't match what gets indexed or what they see in Kibana. (Or am I missing something? If so, please forgive my newb-ness.) This seems like it could be confusing.

That's definitely a valid concern. I'm going to discuss the concept of modules for Winlogbeat with some of the devs soon. It would be good to know exactly what things you do in LS that you cannot do with ES, so we can account for them.

It is possible to use an Elasticsearch Ingest Node pipeline with Logstash. You have to add some additional configuration to the elasticsearch output in order to route the pipeline metadata all the way to ES.
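
As a sketch (the host, index, and pipeline names below are just placeholders), the Logstash elasticsearch output accepts a pipeline option, so it could look something like:

output {
  elasticsearch {
    hosts    => ["localhost:9200"]
    index    => "winlogbeat-%{+YYYY.MM.dd}"
    # Hard-coded here; the pipeline name could also be resolved from event
    # metadata, e.g. pipeline => "%{[@metadata][pipeline]}", if it is attached
    # to the event upstream.
    pipeline => "winlogbeat-security"
  }
}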

Sorry for the delayed response. I've been traveling and was in training all last week.

Off the top of my head, we do the following:

  1. Enrich Windows event logs with AD data. For example, if a username is in the logs, we might add group memberships for that user. We do this by periodically querying AD for all objects and indexing them in a second index.

  2. Another thing we do is keep a flat-file dictionary where we perform look-ups to fill in field values (roughly like the translate sketch after this list). For example, we have a handful of hosts that either don't have a PTR record, or ELK can't hit the DNS server that hosts the PTR record. So we do those 'lookups' manually.

  3. We've been interested in using the JDBC input for Logstash as well, but haven't implemented it yet. (Don't know if this is possible with an ingest node, but we see a lot of use for this.)

  4. We like to configure LS to output to the console for troubleshooting. (I'm not sure if this is possible with ingest nodes or not either.)
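
To make item 2 a bit more concrete, those flat-file lookups are roughly along the lines of a Logstash translate filter (the field names and dictionary path below are made up):

filter {
  translate {
    # Hypothetical field names and dictionary path.
    field           => "source_ip"
    destination     => "resolved_hostname"
    dictionary_path => "/etc/logstash/manual_ptr_records.yml"
    fallback        => "unknown"
  }
}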

This may be outside the scope of what you asked, but as a whole we haven't given much thought to moving input/parsing/output to ingest nodes, since it seems easier and faster to create Logstash configs than to deal with both the learning curve of the API syntax and the pitfalls of generating valid JSON by hand. Is there a benefit to moving from LS to ingest nodes?
