I'm normalizing hostnames and host name fields across multiple data sources and indices for security use cases.
I'm struggling to find consistent usage of the hostname/domain fields in ECS. Consider aggregation: I want to count the number of security incidents per host across multiple indices. For that, I need a field in which the host is always represented the same way, either always as a hostname or always as an FQDN. For hostnames, host.hostname comes to mind. For FQDNs, however, there is no proper field: host.name may hold any representation of a host name. Even host.id is not supposed to be a unique identifier.
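To illustrate why mixed representations break aggregation, here is a minimal sketch using hypothetical sample events (the index names and host values are made up). It counts events per host.name as-is, then again after a hypothetical `short_hostname` normalizer that lower-cases the value and keeps only the first label. This is one possible workaround, not an ECS-defined behavior.

```python
import ipaddress
from collections import Counter

# Hypothetical events from three indices; each source fills host.name
# differently (short name, upper-case FQDN, lower-case FQDN).
events = [
    {"_index": "edr-alerts", "host": {"name": "web01"}},
    {"_index": "winlogbeat", "host": {"name": "WEB01.corp.example.com"}},
    {"_index": "firewall", "host": {"name": "web01.corp.example.com"}},
    {"_index": "edr-alerts", "host": {"name": "db02"}},
]


def short_hostname(name: str) -> str:
    """Hypothetical normalizer: lower-case and keep only the first label.

    IP addresses are returned unchanged so "10.0.0.1" does not become "10".
    Caveat: hosts with the same short name in different domains get merged.
    """
    name = name.strip().lower()
    try:
        return str(ipaddress.ip_address(name))
    except ValueError:
        return name.split(".", 1)[0]


def count_by(events, key):
    """Count events per host, keyed by key(host.name)."""
    return Counter(key(e["host"]["name"]) for e in events)


naive = count_by(events, lambda n: n)
normalized = count_by(events, short_hostname)

print(naive)       # one bucket per spelling of web01, plus db02
print(normalized)  # web01 collapsed into a single bucket
```

The merge caveat in the docstring is exactly the gap described above: without a field that reliably holds the FQDN, collapsing to short names is the only consistent key, and it loses the domain.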
For the source fieldset (and, by extension, destination and client/server), the semantics are entirely different:
- There is no field for either the hostname or the FQDN alone. There is source.address, which can contain a hostname, an FQDN, or even an IP. This is great for searching but terrible for aggregations.
- source.domain may contain an FQDN, but this is only documented indirectly, under source.address, which states that x.address "should be duplicated to .domain, depending on which one it is." This is inconsistent with host.domain, which stores the network domain or Active Directory domain of a host.
- When it contains an FQDN or a hostname, source.domain is a very confusingly named field.
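The duplication rule for source.address can be sketched as follows. This assumes that any value parsing as an IPv4/IPv6 address goes to source.ip and everything else goes to source.domain; unix sockets and other address forms are ignored, and `split_address` is a hypothetical helper, not part of any Elastic tooling. The last step shows that aggregating on source.domain still produces one bucket per spelling of the same host.

```python
import ipaddress
from collections import Counter


def split_address(address: str) -> dict:
    """Hypothetical helper: copy a raw source.address into source.ip or source.domain.

    Assumption: anything that parses as an IP address is an IP;
    everything else is treated as a name and copied to source.domain.
    """
    source = {"address": address}
    try:
        source["ip"] = str(ipaddress.ip_address(address))
    except ValueError:
        source["domain"] = address
    return source


# Hypothetical raw address values as they might arrive from different sources.
addresses = ["10.1.2.3", "web01", "web01.corp.example.com", "fe80::1"]
docs = [split_address(a) for a in addresses]

# Aggregating on source.domain: the same host shows up as two distinct buckets,
# because the field holds a short hostname in one event and an FQDN in another.
domain_buckets = Counter(d["domain"] for d in docs if "domain" in d)

for d in docs:
    print(d)
print(domain_buckets)
```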
I think that if we are to move towards a SIEM with UEBA capabilities, we must have very clear semantics and universal field names across all fieldsets for storing hostnames, FQDNs, domain names, aliases, and other names for things.