Metricbeat bug: UTC timestamp in @timestamp and local date in _index

Unlike Filebeat, Metricbeat uses a UTC timestamp in the @timestamp field but the local date in _index. This causes records to go into the wrong index when using the Elasticsearch output.

See the _index and @timestamp fields below:

{
  "_index": "metricbeat-2017.06.21",
  "_type": "metricsets",
  "_id": "AVzNHb12ptjSSXq_Rno-",
  "_score": null,
  "_source": {
    "@timestamp": "2017-06-22T00:04:56.463Z",
    "beat": {
      "hostname": "host.example.com"
    },
    "metricset": {
      "name": "diskio"
    },
    "system": {
      "diskio": {
        "io": {
          "time": 610108590
        },
        "name": "sda3",
        "read": {
          "bytes": 16276817162240,
          "count": 196619690,
          "time": 2506927927
        },
        "serial_number": "***",
        "write": {
          "bytes": 371595212800,
          "count": 48685993,
          "time": 554740745
        }
      }
    },
    "type": "metricsets"
  }
}

Filebeat, on the other hand, correctly uses the same timezone for both the timestamp and the index (again, see the _index and @timestamp fields below):

{
  "_index": "filebeat-20170622",
  "_type": "log",
  "_id": "AVzNHeLqptjSSXq_Rrcv",
  "_score": null,
  "_source": {
    "@timestamp": "2017-06-22T00:04:58.929Z",
    "offset": 530213,
    "beat": {
      "hostname": "host.example.com"
    },
    "source": "/log/2017/06/22/example.log",
    "fields": {},
    "message": "Test message."
  }
}

What versions of Filebeat and Metricbeat are you using? Please share the configs that you are using for both.

Is the data being written directly to Elasticsearch?

I use version 5.0.2 for Elasticsearch, Filebeat and Metricbeat.

The data is written directly into Elasticsearch.

I solved the problem by configuring the following Elasticsearch ingest pipeline for Metricbeat, which builds the index name from the date part of the original Metricbeat UTC timestamp:

{
  "description" : "A pipeline for Metricbeat.",
  "processors" : [
    {
      "date_index_name" : {
        "field" : "@timestamp",
        "date_formats": ["ISO8601"],
        "index_name_prefix" : "metricbeat-",
        "date_rounding" : "d",
        "index_name_format": "yyyyMMdd"
      }
    }
  ]
}

So I guess it is Elasticsearch that uses the local date for the index when no date_index_name processor is involved.

That does look like a problem in 5.0.2. Have you tried any newer versions, like 5.4 or the 6.0 snapshot builds?

I have not tried later versions.

I spent some time grepping through the Beats code but could not find where the _index field is set. I did find where all the other fields are set, though, including @timestamp.

So I suspect _index is set in Elasticsearch. Someone with knowledge of Elasticsearch internals should be able to confirm that.

I looked into it, and it seems to affect both 5.x and 6.0: https://github.com/elastic/beats/issues/4569

We had addressed this issue in 1.x, but in 5.x we made a change to allow custom index formats, and that's when the issue came back.

I think the fix should be to normalize all time.Time values to UTC at this point in the code:

case Time:
  return v.UTC(), nil
case []Time:
  out := make([]Time, len(v)) // convert each one to UTC
  for i, t := range v {
    out[i] = t.UTC()
  }
  return out, nil

Do I understand correctly that Metricbeat publishes @timestamp without a timezone designator, so that Elasticsearch treats it as a timestamp in its local timezone?

No, the @timestamp value sent in the event is always in UTC (for all Beats).

The problem requires a bit of knowledge of how Go stores time values. A Go time.Time value holds both an instant and a timezone (a *time.Location). So when we run a date formatter on the time value to create the index name string (e.g. metricbeat-2017.06.28), it formats the date in the timezone stored inside the time value. When you create a new time.Time using time.Now(), the timezone defaults to the host machine's local timezone.

To fix the issue, we need to convert the time.Time value to UTC before passing it to the date formatter. This is done with the time.Time UTC() method (t.UTC()).

Some Beats may already do this when they construct the event (Filebeat probably does), but there is no guarantee. We want to ensure that all time.Time objects contained in events are in UTC. All events published by a Beat pass through a normalization phase in libbeat that fixes types, drops nulls, etc. This would be a good place to convert time values to UTC. That code is here.

I was under the impression that it is Elasticsearch that creates the _index field, not Metricbeat.

Are you saying that it is Metricbeat that creates _index, and that it does so before converting the local time to UTC?

The index is specified by the client making the _bulk API request (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html) to Elasticsearch.

Yes

Hi, @andrewkroh.

Can I change @timestamp from UTC to local time?

Thanks.