Structured Logging with Serilog & Elasticsearch – How to Avoid Mapping Explosion?

Hi,

I'm using Serilog with Elasticsearch in a .NET application, and I'm writing logs to Elasticsearch data streams using a basic setup like this:

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
    {
        DataStream = new DataStreamName("logs", "app", "myservice"),
        IlmPolicy = "logs-app-ilm-policy",
        MinimumLevel = LogEventLevel.Information
    })
    .Enrich.FromLogContext()
    .CreateLogger();

I often enrich log entries with context properties like this:

using (LogContext.PushProperty("userId", userId))
{
    Log.Information("User with role {userRole} performed an action", userRole);
    Log.Information("User did an order for id: {OrderId}", orderId);
}

My Questions:
Since I’m using structured logging, properties like userRole, userId, and orderId are automatically indexed.
But what if I don’t need to search on all of them? For example, userRole is only used for display purposes in Kibana; is it a problem that it still gets indexed? I don't want to remove it from the structured log event, because I still think it's good that these fields are structured.

What are the best practices to avoid mapping explosion in Elasticsearch with structured logs?

  • Should I limit which fields get indexed? Right now every property of the structured log event is indexed automatically. Should I maybe use index templates for this?
  • Can I prevent certain properties from being indexed from the .NET code?
  • Is it better to flatten data instead of using the @ destructuring operator?

Are there specific Elasticsearch settings or Serilog features you recommend for managing this better?

Thanks! Would appreciate advice on how to keep logs structured but avoid indexing overhead.

Can you share an example of what exactly the document you are sending to Elasticsearch looks like?

It is not clear what the document that Elasticsearch receives looks like.

Hi @leandrojmp
Thanks for your reply.
I'm not manually sending documents to Elasticsearch; I'm using Serilog with the Elasticsearch sink in a .NET application, configured like this:

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
    {
        DataStream = new DataStreamName("logs", "app", "myservice"),
        IlmPolicy = "logs-app-ilm-policy",
        MinimumLevel = LogEventLevel.Information
    })
    .Enrich.FromLogContext()
    .CreateLogger();

Here’s an example log statement I use in code:

using (LogContext.PushProperty("userId", userId))
{
    Log.Information("User with role {userRole} performed an action", userRole);
}

This ends up being indexed in Elasticsearch like this (simplified for privacy, with some values replaced):

{
  "_index": ".ds-logs-app-myservice-2025.05.15-000001",
  "_id": "random-id",
  "_version": 1,
  "_score": 0,
  "_source": {
    "agent": {
      "type": "Elastic.CommonSchema.Serilog",
      "version": "8.11.0"
    },
    "process": {
      "pid": 4567,
      "name": "MyApp",
      "title": "",
      "thread.id": 5
    },
    "metadata": {
      "userId": 1234
    },
    "log": {
      "logger": "Elastic.CommonSchema.Serilog"
    },
    "message": "User with role consumer performed an action",
    "labels": {
      "MessageTemplate": "User with role {userRole} performed an action",
      "userRole": "consumer"
    },
    "@timestamp": "2025-05-15T13:33:51.573Z",
    "ecs.version": "8.11.0",
    "service": {
      "name": "Elastic.CommonSchema.Serilog",
      "type": "dotnet",
      "version": "8.11.0"
    },
    "host": {
      "hostname": "USER-PC",
      "ip": ["0.0.0.0"],
      "os": {
        "platform": "Windows",
        "version": "0.0.0"
      }
    },
    "log.level": "Information",
    "event": {
      "created": "2025-05-15T13:33:51.573Z",
      "severity": 2
    },
    "user": {
      "id": "1234",
      "name": "user",
      "domain": "MYDOMAIN"
    }
  },
  "fields": {
    "labels.userRole": ["consumer"],
    "metadata.userId": [1234],
    "log.level": ["Information"],
    "user.id": ["1234"],
    "user.name": ["user"],
    "host.hostname": ["USER-PC"],
    "host.os.platform": ["Windows"],
    "host.os.version": ["0.0.0"],
    "process.pid": [4567],
    "process.name": ["MyApp"],
    "process.title": [""],
    "process.thread.id": [5],
    "event.severity": [2],
    "event.created": ["2025-05-15T13:33:51.573Z"],
    "agent.type": ["Elastic.CommonSchema.Serilog"],
    "agent.version": ["8.11.0"],
    "service.name": ["Elastic.CommonSchema.Serilog"],
    "service.type": ["dotnet"],
    "service.version": ["8.11.0"],
    "@timestamp": ["2025-05-15T13:33:51.573Z"]
  }
}

So a single structured log line results in a fairly large document, with many fields auto-populated by the Serilog ECS formatter (e.g. agent, process, event, host). The custom fields from the structured log line (e.g. userRole) are also indexed.

My questions:

  1. I want to keep using structured logging (e.g. {userRole}), so fields like labels.userRole are extracted and stored separately. But if I don't actually need to search by userRole, is there a cost to indexing it anyway? Can/should I prevent indexing while still keeping the structure? And what about all these auto-populated fields?
  2. I'm concerned about mapping explosion, since I have many types of logs with different properties. Is there a way to limit indexing to certain fields only? And at roughly how many fields should I start to worry about indexing them from structured logging?
  3. Are there best practices or guidelines when using logging frameworks (like Serilog) with ECS and data streams to:
  • Avoid unnecessary mappings
  • Reduce storage usage
  • Still allow meaningful search and dashboards?

I want to keep logs structured and usable in Kibana, but avoid bloating the index or hitting limits on mapping fields.

Thanks again for your help!

Hi,
I’d really appreciate it if anyone could share some knowledge on this topic. Any help would be greatly appreciated!

Thanks in advance!

Hi @a_mandel,

Those are very reasonable questions. I'm afraid I can't provide comprehensive, generically applicable guidelines, but here are some thoughts that might point you in helpful directions:

There certainly is a cost to indexing, both in terms of storage and compute resources. Elasticsearch is constantly evolving to optimize both. Whether they represent a problem in your case depends on the volume of log data you're ingesting and the performance constraints of the environment you're deploying to. One way to approach this would be to evaluate questions like "What is the limiting factor for the deployment?" or "What are the possible queries/aggregations I need for my use-case?" and derive the appropriate trade-offs from there.

One way to handle fields like labels, which can contain a larger number of arbitrary key-value pairs, without completely ignoring them, could be to use the flattened field type. That way it remains a single labels field in the mapping, but its contents can still be queried like keywords.
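As a rough sketch (the index name my-logs-example is just a placeholder; where exactly to put such a mapping for a data stream is covered further down), mapping labels as flattened could look like this:

PUT my-logs-example
{
  "mappings": {
    "properties": {
      "labels": {
        "type": "flattened"
      }
    }
  }
}

A document with "labels": { "userRole": "consumer" } would then not add a new mapped field for every distinct label key, but you could still filter on labels.userRole : "consumer" in Kibana.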

Elasticsearch has some safety mechanisms built in. For example, it has index settings like index.mapping.total_fields.limit, which defaults to 1000, beyond which Elasticsearch ignores additional fields. There are many others too. Generally I wouldn't be too concerned as long as you stay below these limits.
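For reference, that limit can be changed per index; the value 2000 below is just an arbitrary example, not a recommendation, and for data streams you would normally set it via a template instead (see further down):

PUT my-logs-example/_settings
{
  "index.mapping.total_fields.limit": 2000
}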

To achieve very selective mapping/indexing you could configure the index mapping to not be dynamic and only explicitly map the fields you're interested in.
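A minimal sketch of such a mapping, assuming @timestamp, message and log.level are the only fields you care about (any other field would then still be kept in _source, but would be neither mapped nor searchable):

PUT my-logs-example
{
  "mappings": {
    "dynamic": false,
    "properties": {
      "@timestamp": { "type": "date" },
      "message": { "type": "text" },
      "log": {
        "properties": {
          "level": { "type": "keyword" }
        }
      }
    }
  }
}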

You can also control whether a mapped field is indexed or just stored.
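For example (just a sketch, and an alternative to the flattened approach above), your labels.userRole field could be mapped as a keyword that is kept in _source and shown in Kibana but gets no inverted index:

PUT my-logs-example
{
  "mappings": {
    "properties": {
      "labels": {
        "properties": {
          "userRole": {
            "type": "keyword",
            "index": false
          }
        }
      }
    }
  }
}

Note that keyword fields keep doc_values by default, so such a field can still be used in aggregations even without an inverted index.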

One good practice that comes to mind is to make sure data with different sets of fields are ingested into different indices or data streams. That way the mapping for each stays focused on the fields used in the respective set of documents and is less likely to balloon into the union of all existing fields.

I hope that helps somewhat.

Hi @weltenwort,

Thank you very much for your response, this is really helpful.

I have some questions about it.

When using Elasticsearch together with Serilog, structured logging automatically creates and indexes fields based on the log message. For example:

using (LogContext.PushProperty("userId", userId))
{
    Log.Information("User with role {userRole} performed an action", userRole);
}

In this case, userRole is automatically extracted and indexed, and Elasticsearch sets the type based on the value. The only setup I am doing is:

Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
    {
        DataStream = new DataStreamName("logs", "app", "myservice"),
        IlmPolicy = "logs-app-ilm-policy",
        MinimumLevel = LogEventLevel.Information
    })
    .Enrich.FromLogContext()
    .CreateLogger();

I’m wondering if there’s a way to control which fields get indexed when logging, and whether it’s possible to define their type (e.g. using flattened as you mentioned), since it’s difficult to change the mapping after data is already in the index.

Is there a way to configure this behavior, either in Serilog or on the Elasticsearch side? I’m using data streams, so it would be ideal if this setup applies automatically when new backing indices are rolled over.

Also, regarding the other options you mentioned, such as limiting the number of mapping fields, how can I apply those settings when using data streams, especially since the backing indices are created automatically?

Thanks again!

Thanks for explaining why it might be hard to predict how Serilog adds new fields. I took a look at the code of the Elastic Serilog library and to me it looks like this:

  • When the name of a field in your log message matches a well-known ECS field, then it's added to the document as the appropriate field.
  • Otherwise, if it is a boolean or string type value, it is added as a property to the labels property.
  • Otherwise it is added to the metadata property.

Is that accurate? If so, it sounds like your override would probably be to map labels as a flattened field.

What is mapped, and how it is indexed, is controlled via the mappings of an index template (more on that below). The default mappings know about the ECS fields (mixed in via the ecs@mappings component template) and map all "other" fields as keyword using dynamic mappings. You can override that by defining your own (dynamic) mappings. The guide section "Configure a data stream with an index template" tries to explain that. Within the composed_of list I'd recommend adding a logs-myservice@custom component template with those overrides.
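A sketch of what such a component template could look like, assuming you want labels mapped as flattened and a higher field limit (the name and the values are assumptions based on this thread, so adjust them to your setup):

PUT _component_template/logs-myservice@custom
{
  "template": {
    "settings": {
      "index.mapping.total_fields.limit": 2000
    },
    "mappings": {
      "properties": {
        "labels": {
          "type": "flattened"
        }
      }
    }
  }
}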

When creating a new index (such as during rollover), Elasticsearch tries to match an index template to the index name to determine the settings and mappings to apply. The default index template that ships with Elasticsearch for data streams following the official naming scheme is called logs.

As mentioned in the guide linked above, the canonical approach is to create your own index template matching the specific name of your data stream and to customize the ...@custom template.
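As a rough sketch, such an index template for the logs-app-myservice data stream could look like the following. The composed_of entries are assumptions: the exact names of the built-in component templates depend on your Elasticsearch version, and the priority just needs to be higher than that of the built-in logs template:

PUT _index_template/logs-app-myservice
{
  "index_patterns": ["logs-app-myservice*"],
  "data_stream": {},
  "priority": 200,
  "composed_of": [
    "logs@mappings",
    "logs@settings",
    "ecs@mappings",
    "logs-myservice@custom"
  ]
}

New backing indices created on rollover then pick up these settings and mappings automatically, which should address your earlier question about applying this to data streams.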