Clarity on ECS 1.4 service field group

After digging into ECS 1.4 and its different sub field groupings, I'm still struggling to understand the difference between service.id and service.type.

For service.id the documentation states

Unique identifier of the running service. If the service is comprised of many nodes, the service.id should be the same for all nodes.

This id should uniquely identify the service. This makes it possible to correlate logs and metrics for one specific service, no matter which particular node emitted the event.

Note that if you need to see the events from one specific host of the service, you should filter on that host.name or host.id instead.
example: d37e5ebfe0ae6c4972dbe9f0174a1637bb8247f6

For service.type the documentation states

The type of the service data is collected from.

The type can be used to group and correlate logs and metrics from one service type.

Example: If logs or metrics are collected from Elasticsearch, service.type would be elasticsearch.

type: keyword

example: elasticsearch

Based on the example service.id, it seems to me that it is more akin to an ephemeral ID, like a Docker container ID. But the documentation states that if the service is comprised of many nodes they should all match, which throws me off a bit.

The only situation where I could see these two fields differing is a multi-node instance of a service, i.e. a sharded or distributed service. Even then, if I have a web API with a separate storage layer that is clustered and scalable, I would want every instance of my web API to have a unique service.id per node, because in my opinion the instances have no relation to each other beyond sharing a storage layer: a specific node could fail without blowing up the service cluster, and I would just group on service.type + service.name to see my web API cluster's health, roughly as sketched below.
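(To make the grouping idea concrete, here's a rough sketch of what I mean, assuming the official Elasticsearch Python client, an illustrative "logs-*" index, a made-up service.type of "webapi", and ECS's event.outcome for the error filter. It's only meant to show the shape of the aggregation, not a definitive query.)

```python
# Rough sketch: count events and failures per service.name within one
# service.type. Index pattern, service.type value, and the failure filter
# are illustrative assumptions, not prescribed by ECS.
from elasticsearch import Elasticsearch  # official Python client

es = Elasticsearch("http://localhost:9200")

query = {
    "size": 0,
    "query": {"term": {"service.type": "webapi"}},          # hypothetical service.type
    "aggs": {
        "by_service": {
            "terms": {"field": "service.name"},
            "aggs": {
                "errors": {"filter": {"term": {"event.outcome": "failure"}}}
            },
        }
    },
}

resp = es.search(index="logs-*", body=query)
for bucket in resp["aggregations"]["by_service"]["buckets"]:
    print(bucket["key"], bucket["doc_count"], bucket["errors"]["doc_count"])
```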

Some clarity on this would be appreciated. It seems to me that in most use cases these two fields will map one-to-one, and only in more advanced distributed-computing use cases would they begin to differ. If that is the case, it was difficult to tell from the documentation that this is in fact the situation where service.id comes into play.

Another thing I wanted to discuss was the idea of a service.state field.

This seems like a computed field that is the composition of a given service's collection of events. It seems bizarre to me to attempt to log the state of the service in a multi-threaded, multi-process environment, where a cross-thread/cross-process state store would be required to compare successes and failures just to record a state in one log event. The performance hit alone of maintaining a multi-process-aware data store is a pain, not to mention the logic needed to determine the different "states" of a service in real time from recent requests and responses, when I can do that very thing easily in ELK without adding complexity to my application's event logging.

It just seems extremely backwards to have this field be included in the service field group when I would most likely determine the health of my service based on metrics and the collection of events logged by it in ELK.

So as it stands, I would likely have three states, "starting", "running", and "stopping", where 99% of events would be in the "running" state. In my opinion that isn't of significant value, since I already record lifecycle events, but I can see the argument for including them in simpler applications.

Hey @Gregory_Zimmers,

The easiest way to understand the "service" field set is from the perspective of monitoring a service of some kind from the outside. Note that it's not about something internal like a Windows service that automatically starts on login and performs internal tasks. Service here is in the sense of a service this machine provides to others: to other machines in the case of a database, or to humans in the case of a web service.

Perhaps we should align the service.id field's definition with that of service.type: where service.type would be "elasticsearch" for an ES cluster, service.id would be the cluster UUID (and service.name the cluster name, e.g. "logging-prod").
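To illustrate (a purely made-up sketch; the UUID and node names are invented), two events emitted by different nodes of the same Elasticsearch cluster would share service.type, service.id, and service.name, and differ only in fields like host.name:

```python
# Illustrative only: two events from different nodes of one cluster.
# service.id (cluster UUID) and service.name (cluster name) are identical
# across nodes; host.name identifies which node emitted the event.
event_from_node_1 = {
    "service": {
        "type": "elasticsearch",
        "id": "a1b2c3d4-made-up-cluster-uuid",   # hypothetical cluster UUID
        "name": "logging-prod",
    },
    "host": {"name": "es-node-01"},
}

event_from_node_2 = {
    "service": {
        "type": "elasticsearch",
        "id": "a1b2c3d4-made-up-cluster-uuid",   # same cluster, same service.id
        "name": "logging-prod",
    },
    "host": {"name": "es-node-02"},
}
```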

In cases where the processes making up a service are independent, such as web application servers, the usage of these fields is a bit looser. But if you named a service "public-blog" when configuring APM on this web app, then APM would put that value in service.name for all of its events. Some people prefer to name their service with a domain name; that's also fine if the web app serves only one domain, e.g. service.name:www.example.com.
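By way of illustration (again a hypothetical example of mine, with "webapp" as an assumed service.type), two independent instances of that blog would share service.name, and you would narrow down to one instance with host.name or host.id, as the service.id docs quoted above suggest:

```python
# Two independent web application processes tagged with the same
# service.name; host.name distinguishes the individual instances.
blog_event_a = {
    "service": {"type": "webapp", "name": "public-blog"},  # service.type is my assumption
    "host": {"name": "web-01"},
}
blog_event_b = {
    "service": {"type": "webapp", "name": "public-blog"},
    "host": {"name": "web-02"},
}

# e.g. to see only one instance (illustrative KQL):
#   service.name : "public-blog" and host.name : "web-01"
```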

As for service.state, once again you may want to think about it more in terms of monitoring from the outside. If it's onerous to capture this from within the application for any reason, you may decide not to populate it from there. If however you're monitoring it from the outside with Elastic Uptime or an open source tool such as Sensu or Nagios, then when checking its state, service.state is where you would record its status at the time of a check.
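A minimal sketch of that idea, assuming a hypothetical /health endpoint and state strings of my own choosing (this is not how Elastic Uptime or Heartbeat actually populate the field, just the concept of an external checker recording service.state at check time):

```python
# External check: probe a service over HTTP and record its state in an
# ECS-style event. Endpoint URL and state values are illustrative.
import json
import urllib.request
from datetime import datetime, timezone

def check_service(url: str, name: str) -> dict:
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            state = "running" if resp.status == 200 else "degraded"
    except OSError:
        state = "unreachable"
    return {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        "service": {"name": name, "state": state},
    }

print(json.dumps(check_service("http://localhost:8080/health", "public-blog")))
```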
