Mapper-Size Plugin Breaks Parsing

Hello,
We want to enable the mapper-size plugin on our cloud clusters, and have already done so on a single test cluster. We then enabled _size so that we can get ingest metrics for the cluster, but it breaks parsing. As soon as our data streams started using _size, parsing broke for those data streams. I thought _size was a metadata field and should not affect any other parsing or ingest?

Can you show what is breaking? A screenshot or a log error?

This is a metadata field, if I'm not wrong, so I would not expect it to break anything.

Since it is metadata, we also assumed it wouldn't break anything. Any log for the corresponding data stream fails to parse. The minute it's disabled and we roll the data stream over, it works fine. Maybe it's an issue with how we implemented it? We did so like this:

That's probably the issue: you created an index template with a higher priority that matches logs-*. This prevents the built-in template from being used, which breaks all of those data streams.

You need to add the setting to the logs@custom component template, not create a new index template that matches the logs-* data streams.

Something like this I think:

PUT _component_template/logs@custom
{
  "template": {
    "mappings": {
      "_size": {
        "enabled": true
      }
    }
  }
}

That makes sense... I will give this a go. Thank you!

Is there a special way to apply this for Fleet-managed data streams, which is all of them in our case? It seems the logs@custom template is not used, even though it is a component of the 'logs' index template.

The way is to use logs@custom. Did it not work?

Unless there was any issue, it should work when an index is created or rolled over.
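
One way to check which index template a data stream actually resolves to is sketched below. The data stream name logs-example.dataset-default is a hypothetical placeholder, so substitute one of your own; the requests themselves are the standard get data stream, simulate index, and rollover APIs.

```
# Shows the data stream, including the name of its matching index template ("template")
GET _data_stream/logs-example.dataset-default

# Shows the settings and mappings a new backing index would get once all component templates are merged
POST _index_template/_simulate_index/logs-example.dataset-default

# Rolls over so that the next backing index picks up the current templates
POST logs-example.dataset-default/_rollover
```

If _size does not appear under the mappings in the simulate response, the component template that was edited is probably not part of the index template that the data stream matches.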

There is no issue I can see. I set the component template logs@custom to include the _size mapping (as you recommended), and confirmed that 'logs@custom' is contained within the index template 'logs', which is supposed to apply to all 'logs-*-*' data streams. When I roll over a data stream, I expect the logs@custom template to be applied and _size to start being added, but it simply isn't. To be sure, I checked one of the backing indices of a data stream I rolled over (in this case one of .ds-logs-sentinel_one.group-default-2025.02.04-XXXXX), and it does not contain the _size field. When I mistakenly did this before with an index template, it did work. The only thing I can think of is that these are Fleet-managed. But that doesn't explain why it worked with an index template (although it obviously broke everything else) yet does not work with a component template.

Yeah, I would expect it to work.

Since you are using Elastic Cloud, I would open a support ticket to understand what is happening.

So it turns out Fleet data streams don't use 'logs'. They do, however, use 'logs@settings', so I was able to enable _size in there. The issue now is that, unlike when I enabled it via an index template, I can't search the data. This used to work, for instance, to look at the last hour of data:

GET logs-*/_search
{
  "size": 0,
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1h/h",
        "lt": "now/h"
      }
    }
  },
  "aggs": {
    "by_data_stream": {
      "terms": {
        "field": "data_stream.dataset.keyword",
        "size": 100
      },
      "aggs": {
        "sum_size": {
          "sum": {
            "script": {
              "source": """
                if (doc['_size'].size() == 0) {
                  return 0;
                } else {
                  return doc['_size'].value;
                }
              """
            }
          }
        }
      }
    }
  }
}

it now just outputs this:

{
  "took": 55,
  "timed_out": false,
  "_shards": {
    "total": 1490,
    "successful": 1490,
    "skipped": 1462,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "by_data_stream": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

For whatever reason, now that _size is enabled through the component template instead of the index template, the 'field' value needs to be changed to data_stream.dataset; the .keyword suffix must be dropped. Not sure why the field mapping differs between the component template and the index template, but there you have it.
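
A possible explanation, which is an assumption on my part and not something confirmed in this thread: the Fleet-managed templates map data_stream.dataset as constant_keyword, which has no .keyword sub-field, whereas under the custom index template the field was probably mapped dynamically as text with a .keyword sub-field. The field capabilities API shows how the field is actually mapped, and the terms aggregation then only needs the plain field name:

```
# Lists the mapped type(s) of data_stream.dataset and any sub-fields across logs-*
GET logs-*/_field_caps?fields=data_stream.dataset*

# Same aggregation as before, with the .keyword suffix dropped
GET logs-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-1h/h", "lt": "now/h" } }
  },
  "aggs": {
    "by_data_stream": {
      "terms": { "field": "data_stream.dataset", "size": 100 },
      "aggs": {
        "sum_size": {
          "sum": {
            "script": {
              "source": "doc['_size'].size() == 0 ? 0 : doc['_size'].value"
            }
          }
        }
      }
    }
  }
}
```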