Issues with @timestamp vs timestamp

Hey folks,

We are currently using the ELK stack given below for managing access logs from our web servers.

Packages:

  1. elasticsearch-8.6.1-1
  2. logstash-8.6.1-1
  3. kibana-8.6.1-1
  4. filebeat-8.6.1-1

The index size for this stack is huge [size: 108Gi (326Gi) / docs: 214,699,296 (644,097,888)], and processing logs is not real time due to multiple factors like resource limitations, network disruptions, failures in various services on the stack, etc.

OK, here are some of the challenges we are trying to address.

We are unable to query or determine the exact time of an event from Kibana, as by default documents are indexed based on the field "@timestamp".
(To the best of my knowledge, the @timestamp field is injected by default at the time of document creation on Elasticsearch and cannot be removed; please correct me if I am wrong.)

We, however, need the data based on the field "timestamp". How can we address this?

Whenever there is a network disruption, or log processing is halted or delayed for some reason, there is an issue with the reports that depend on the timestamp field.

We tried to make it work through queries but have not been able to get the desired results.

Kindly guide us in the appropriate direction on this.

Regards

I don't recognize that as the "by default" behavior. It's possible you are doing that; we'd need to dig a bit to find out.

My presumption is that you start with Filebeat watching some web log files, feeding into Logstash, which is injecting into Elasticsearch, and you are looking at that data using Kibana.

Can you share a couple of example log lines from one of those web logs, and the corresponding documents as retrieved from Elasticsearch? And the Logstash configuration, please.

Kibana does rely on this timestamp by default, so not having it may complicate using Kibana. It is however not Elasticsearch that adds it by default but rather Filebeat or Logstash.

If you want to use the timestamp field you may be able to do so in Kibana, but I suspect the safer and easier way would be to copy this into @timestamp (e.g. using an ingest pipeline) so you can rely on all the default behaviours.
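For example (just a sketch; we haven't seen your data yet, so the field name, date format and pipeline name here are assumptions), an ingest pipeline with a date processor could do that copy:

PUT _ingest/pipeline/copy-log-timestamp
{
  "description": "copy the parsed log time into @timestamp",
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "formats": ["dd/MMM/yyyy:HH:mm:ss Z"],
        "target_field": "@timestamp"
      }
    }
  ]
}

You could then set it as the index's default_pipeline so it runs on every document at index time.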

Maybe, but we don't yet know where @timestamp is coming from. It might be getting added by an ingest pipeline already for all we know.

And if @timestamp is maybe better thought of as (e.g.) added-by-logstash-timestamp rather than the actual-log-event-timestamp, it would be best to keep all those fields (possibly renaming them) somehow. We don't yet know if every document in the index even has a timestamp field; if not, that would complicate matters a bit.

How are you sending the data into Elasticsearch?

You said that you have both Filebeat and Logstash, but how is this working? Does Filebeat get the logs and send them to Logstash? Or does Filebeat send the logs directly to Elasticsearch and you use Logstash to get other logs? You need to provide context.

But normally what you do is parse your document in Logstash and use a field from the document to replace the @timestamp field; in Logstash this is done using a date filter.
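As a rough sketch (the field name and date format are assumptions here, since we haven't seen your config or log format yet; adjust them to whatever your parsing actually produces):

filter {
  date {
    # parse the application's own time field and write it into @timestamp
    match  => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}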

Elasticsearch does not add any @timestamp field, but Logstash will add one when it receives the document.

Thanks for your valuable feedback. I will shortly share the configurations for my ELK stack, along with example logs and documents, after ensuring that I've censored the necessary details about production.

@leandrojmp @Christian_Dahlqvist
(Kindly note that I've removed some sensitive information from the config and logs and replaced it with random words.)

Thanks in advance.

Just to give you an idea, this is how logs are populated on the ELK cluster.

At the source we have Filebeat servers running, which forward their logs to an array of Logstash servers; from Logstash these logs are forwarded on to Elasticsearch.

Based on your questions and comments, I believe the field "@timestamp" gets added at the Logstash layer.

I am attempting to attach the config files to this post; hopefully I am allowed to do so considering their size. If it doesn't work, I will try to use classic pastebin to share the templates of my config files.

Given below is the content of filebeat.yml:

filebeat.inputs:
- type: log
  paths:
    - /var/log/nginx/foo.access.log
  fields:
    log_type: foo-access
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
output.logstash:
  hosts: ["host01:5044", "host02:5044", "host03:5044"]
  loadbalance: true
  index: foo-access
processors:
  - add_host_metadata:
      when.not.contains.tags: forwarded
  - add_cloud_metadata: ~
  - add_docker_metadata: ~
  - add_kubernetes_metadata: ~

This is what my logstash.yml looks like:

path.data: /var/lib/ELK/logstash
pipeline.workers: 16
queue.type: persisted
queue.page_capacity: 1024mb
queue.max_bytes: 40gb
path.logs: /var/log/logstash

logstash-filter.conf:

filter {
  if [fields][log_type] == "foo-access" {
    grok {
      patterns_dir => "/etc/logstash/patterns"
      match => { "message" => "%{FOO_GROK}" }
    }

    mutate {
      convert => [ "[response]", "integer" ]
      convert => [ "[bytes_read]", "integer" ]
      convert => [ "[request_duration]", "float" ]
      convert => [ "[upstream_request_duration]", "float" ]
    }
    mutate { add_field => { "hostname" => "%{[host][hostname]}" } }
    mutate { add_field => { "src_log_file" => "%{[log][file][path]}" } }
    mutate { add_field => { "src_log_type" => "%{[fields][log_type]}" } }

    if ![api_param] {
      mutate { add_field => { "api_param" => " " } }
    }

    if ![request] {
      mutate { add_field => { "request" => "%{api}%{api_param}" } }
    }

    prune {
      whitelist_names => ["x_forwarder_for","x_forwarded_for","xforwardedfor","auth","user","timestamp","method","request","httpversion","response","bytes_read","request_header_referer","request_header_agent","clientip","request_duration","upstream_request_duration","message","hostname","host_ip","timestamp","Country","source","host.hostname","src_log_file","_type_new","src_log_type","api","api_param"]
    }
  }
}

This is what my GROK pattern looks like:

FOO_GROK (%{IPORHOST:clientip}|-)\s+(%{NGUSER:ident}|-)\s+(%{NGUSER:auth}|%{USERNAME:user}|-)\s+\[%{HTTPDATE:timestamp}\]\s+"(?:%{WORD:method})\s+((?<api>\/([^\ \?]*))(?<api_param>(\?.+?)|-|))(\s+HTTP/%{BASE10NUM:httpversion}|)"\s+%{NUMBER:response}\s+(%{NOTSPACE:bytes_read}|-)\s+"(%{DATA:request_header_referer}|-)"\s+"(%{DATA:request_header_agent}|-)"\s+"(?<x_forwarder_for>:?%{IP}(,\s+%{IP})*|-)"\s+(?:%{BASE10NUM:request_duration}|-)\s+(?:%{BASE10NUM:upstream_request_duration}|-)

Finally, the output of a document fetched from Kibana (@RainTown):

{
  "_index": "foo-access-2025.03.07",
  "_id": "76t7g7gkjgi76tiuh8yoih",
  "_version": 1,
  "_score": 0,
  "_ignored": [
    "message.keyword"
  ],
  "_source": {
    "@timestamp": "2025-03-07T11:14:45.126Z",
    "request_header_agent": "-",
    "response": 200,
    "auth": "-",
    "httpversion": "1.1",
    "api_param": "?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof",
    "hostname": "21-175-dc.foo.com",
    "api": "/foo_srv/jd_user.php",
    "upstream_request_duration": 0.057,
    "method": "GET",
    "src_log_type": "foo-access",
    "x_forwarder_for": "10.0.31.107",
    "clientip": "10.0.20.201",
    "timestamp": "07/Mar/2025:16:44:44 +0530",
    "message": "10.0.20.201 - - [07/Mar/2025:16:44:44 +0530] \"GET /foo_srv/jd_user.php?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof HTTP/1.1\" 200 673 \"-\" \"-\" \"10.0.31.107\" 0.057 0.057 \"DSP : 10.0.20.201:9001\" .",
    "request_duration": 0.057,
    "request_header_referer": "-",
    "request": "/foo_srv/foo_user.php?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof",
    "bytes_read": 673,
    "src_log_file": "/var/log/nginx/foo.access.log"
  },
  "fields": {
    "request_header_agent": [
      "-"
    ],
    "request": [
      "/foo_srv/jd_user.php?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof"
    ],
    "auth": [
      "-"
    ],
    "src_log_file.keyword": [
      "/var/log/nginx/foo.access.log"
    ],
    "timestamp.keyword": [
      "07/Mar/2025:16:44:44 +0530"
    ],
    "api.keyword": [
      "/foo_srv/jd_user.php"
    ],
    "hostname": [
      "21-175-dc.foo.com"
    ],
    "request_duration": [
      0.057
    ],
    "httpversion.keyword": [
      "1.1"
    ],
    "clientip": [
      "10.0.20.201"
    ],
    "method.keyword": [
      "GET"
    ],
    "api": [
      "/foo_srv/jd_user.php"
    ],
    "request_header_referer": [
      "-"
    ],
    "timestamp": [
      "07/Mar/2025:16:44:44 +0530"
    ],
    "src_log_file": [
      "/var/log/nginx/foo.access.log"
    ],
    "method": [
      "GET"
    ],
    "x_forwarder_for.keyword": [
      "10.0.31.107"
    ],
    "src_log_type": [
      "foo-access"
    ],
    "request.keyword": [
      "/foo_srv/jd_user.php?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof"
    ],
    "bytes_read": [
      673
    ],
    "message": [
      "10.0.20.201 - - [07/Mar/2025:16:44:44 +0530] \"GET /foo_srv/jd_user.php?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof HTTP/1.1\" 200 673 \"-\" \"-\" \"10.0.31.107\" 0.057 0.057 \"DSP : 10.0.20.201:9001\" ."
    ],
    "request_header_agent.keyword": [
      "-"
    ],
    "@timestamp": [
      "2025-03-07T11:14:45.126Z"
    ],
    "api_param.keyword": [
      "?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof"
    ],
    "response": [
      200
    ],
    "upstream_request_duration": [
      0.057
    ],
    "auth.keyword": [
      "-"
    ],
    "x_forwarder_for": [
      "10.0.31.107"
    ],
    "clientip.keyword": [
      "10.0.20.201"
    ],
    "src_log_type.keyword": [
      "foo-access"
    ],
    "request_header_referer.keyword": [
      "-"
    ],
    "httpversion": [
      "1.1"
    ],
    "hostname.keyword": [
      "21-175-dc.foo.com"
    ],
    "api_param": [
      "?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof"
    ]
  },
  "ignored_field_values": {
    "message.keyword": [
      "10.0.20.201 - - [07/Mar/2025:16:44:44 +0530] \"GET /foo_srv/jd_user.php?e=P&mid=9262007&ownerinfo=1&jdlite=&meowmeow=&wap=1&module=&version=1537&source=1&meow=1&app_version=&udid=&meowsc=007m1991prof HTTP/1.1\" 200 673 \"-\" \"-\" \"10.0.31.107\" 0.057 0.057 \"DSP : 10.0.20.201:9001\" ."
    ]
  }
}

Once again, thank you, and let me know if you need any additional input from my end.

I will try to dig into the Logstash documentation to see if I can find anything related to this.

Regards

Have you seen this thread? Based on this I believe both Filebeat and Logstash will set the @timestamp field to the current time if this field does not already exist.

So FOO_GROK extracts various fields from message and puts part of it into a field called timestamp, which is the actual nginx timestamp of the HTTP GET/PUT/whatever.

And it looks like a timestamp, but it's not completely clear here whether it's indexed as a date field; it might be indexed as text. You can check the mapping. Note the timezone (only in the sense that it's not UTC):

"timestamp": "07/Mar/2025:16:44:44 +0530"

@timestamp is added elsewhere and is added in UTC.

"@timestamp": "2025-03-07T11:14:45.126Z"

... and is approximately one second later in this case. Note this @timestamp also has sub-second precision, which the original timestamp didn't, though that wasn't the point of your question.
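If you want to check how that timestamp field was actually mapped, the field mapping API will tell you (index name taken from your example document):

GET foo-access-2025.03.07/_mapping/field/timestamp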

So personally I'd prefer @timestamp to be the time from the original nginx log entry, and what is now @timestamp to become some other field, like logstash_added_timestamp / filebeat_added_timestamp (you could thereafter also measure lags).

So maybe a one-character change in the grok, changing timestamp to @timestamp, might work for the first part, but you would then need to do other work to get the other timestamp(s). Or, as was suggested, just copy timestamp into @timestamp in Logstash. Given the format differences it might not be as trivial as I wrote, AND you have existing data with a different setup to consider.
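A rough sketch of that copy-and-keep-both idea in Logstash (the new field name is just an example, and you would also need to add it to your prune whitelist):

# keep the pipeline-added time under its own name before overwriting it
mutate {
  add_field => { "logstash_added_timestamp" => "%{@timestamp}" }
}
# then overwrite @timestamp with the parsed nginx time from your grok
date {
  match  => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  target => "@timestamp"
}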

A change-very-little approach is to create a separate/duplicate data view using a different field for the time - currently you are likely using @timestamp - so you can easily do some testing without changing much. But if you have lots of stuff (implicitly/explicitly) dependent on the existing data view, that is an easy test but it just kicks the actual work down the road.

Good luck.

Well, that is quite a close description of the scenario I am facing with my setup. I will try playing around with the timestamp field and its data type; let's see what happens.

Once again, thanks for all your inputs.

Regards

@all Thank you for your inputs,
I think I have likely found a solution for this issue.

I have been tweaking my Logstash configuration, and finally came to the configuration given below.

I have added the block given below under filter
(below the grok block in the logstash-filter.conf shared earlier in this thread).

date {
  match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  target => "@timestamp"
}

I still need to test a couple of things on this, like adding bulk logs, injecting duplicates, etc.

Once again thank you all for your suggestions.

Good news!

One other thing to be careful about is if the destination index/data stream receives documents from any other flows, or even from this flow without the original log having a timestamp present (for whatever reason). What you want to try to avoid is the @timestamp field in the destination index possibly referring to different things.
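One way to guard against that (just a sketch, based on the filter you already have) is to only run the date filter when the field is actually present; events where parsing fails are also tagged with _dateparsefailure by default, which you can monitor:

if [timestamp] {
  date {
    match  => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    target => "@timestamp"
  }
}
# events without a timestamp field (or where parsing failed) keep the Logstash-added @timestamp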

Well, this has solved just the initial requirement of having the timestamp from the access logs be the primary field (a kind of index) for sorting and working with the data in the index.

Now we need to see how this will affect other aspects of index management, like the daily rotation of the index; earlier that used to happen on the basis of UTC, so I'm wondering how it will work now.

We also need to see how back-dated data will be handled when data is older than 24 hrs (when data belongs to the previous day, or the date has changed on the index).

Kindly let me know if you have any insights on the above.

In the meanwhile I will try to set up a dummy index on production; I'm not sure if the ticket will remain open for that long.

Regards