Timestamps seem lost after I stop all 3 Logstash nodes and then start them again


(Keith Tt) #1

Logstash version: 5.4.1

There are 3 Logstash nodes in my ELK stack, and I have configured load balancing in Filebeat.

I use BELK to collect and analyze nginx access logs, and I changed the log format to JSON.

Once, I stopped all 3 Logstash nodes and started them again a few minutes later. All the old data ended up together in a single histogram bar, while the new data looks normal.

How can I prevent this issue? Do I need to set any parameters in the configuration file?


(Magnus Bäck) #2

It sounds like you're timestamping the events with the arrival/pickup time rather than parsing the timestamp from the message payload, but without additional details about your setup I can only speculate.
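
To make the suspected effect concrete, here is a minimal Python sketch (not Logstash code) of what a per-minute histogram would look like in each case. The event times, the restart time, and the helper name `bucket_per_minute` are all made up for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta


def bucket_per_minute(timestamps):
    """Count events per minute, similar to a date histogram with a 1m interval."""
    return Counter(ts.replace(second=0, microsecond=0) for ts in timestamps)


# Hypothetical events: one log line per minute while the pipeline was down.
outage_start = datetime(2017, 7, 17, 9, 0, 0)
event_times = [outage_start + timedelta(minutes=i) for i in range(5)]

# Hypothetical pickup: all five lines are shipped within a few seconds
# once the pipeline is back up.
restart = datetime(2017, 7, 17, 9, 10, 0)
arrival_times = [restart + timedelta(seconds=i) for i in range(5)]

by_event_time = bucket_per_minute(event_times)
by_arrival_time = bucket_per_minute(arrival_times)

print(len(by_event_time))    # 5 buckets, one event each
print(len(by_arrival_time))  # 1 bucket holding all five events
```

If the events are stamped with their arrival time, the whole backlog collapses into the bucket of the restart, which would match the symptom described in the first post.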


(Keith Tt) #3

Yes, it seems that I should use grok to collect the logs.

In my case, I changed the nginx access log format to JSON:

log_format json '{"@timestamp":"$time_iso8601",'
                  '"host":"$server_addr",'
                  '"clientip":"$remote_addr",'
                  '"remote_user":"$remote_user",'
                  '"request":"$request",'
                  '"http_user_agent":"$http_user_agent",'
                  '"size":$body_bytes_sent,'
                  '"responsetime":$request_time,'
                  '"upstreamtime":"$upstream_response_time",'
                  '"upstreamhost":"$upstream_addr",'
                  '"http_host":"$host",'
                  '"url":"$uri",'
                  '"domain":"$host",'
                  '"xff":"$http_x_forwarded_for",'
                  '"referer":"$http_referer",'
                  '"status":"$status"}';

And I do not use grok or any other filter in Logstash.

So, in this situation, is there a way to preserve the timestamp when all the Logstash nodes are down?


(Magnus Bäck) #4

Hmm. So you're saying that Logstash doesn't use the timestamp found in the JSON payload's @timestamp field but picks up the rest of the JSON payload? That's a bit surprising. Is there anything in the Logstash log that indicates why this is happening? What you could do is have NGINX store the timestamp in a timestamp field and then use the date filter to parse that.

I can't recommend this way of making NGINX produce JSON logs. What if one of the string fields contains a double quote?
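
A minimal Python sketch of the double-quote concern, assuming a simplified two-field template in the spirit of the log_format above. The template, the helper names, and the sample values are hypothetical, and the sketch does not model nginx's own escaping rules. It only shows why assembling JSON by string substitution is fragile compared with using a JSON serializer.

```python
import json

# Hypothetical, simplified template modelled on the log_format above.
TEMPLATE = '{"clientip":"%s","http_user_agent":"%s"}'


def render_naive(clientip, user_agent):
    """Insert the values verbatim, with no JSON escaping."""
    return TEMPLATE % (clientip, user_agent)


def render_safe(clientip, user_agent):
    """Let a JSON serializer take care of the escaping."""
    return json.dumps({"clientip": clientip, "http_user_agent": user_agent})


def is_valid_json(line):
    """Return True if the line parses as JSON, False otherwise."""
    try:
        json.loads(line)
        return True
    except json.JSONDecodeError:
        return False


agent = 'Mozilla "test" agent'  # made-up value containing double quotes
naive_line = render_naive("203.0.113.7", agent)
safe_line = render_safe("203.0.113.7", agent)

print(is_valid_json(naive_line))  # False
print(is_valid_json(safe_line))   # True
```

A line that fails to parse cannot have its fields, including the timestamp, extracted by a JSON codec.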


(Keith Tt) #5

This is different from a grok pattern, so double quotes are not an issue. Here is one document from my Kibana:

{
  "_index": "zixun-nginx-access-2017.07.17",
  "_type": "zixun-nginx-access",
  "_id": "AV1P7cOGrY9BXe0OYOPM",
  "_version": 1,
  "_score": null,
  "_source": {
    "request": "GET /v1/top_news?uid=01a0351f072840c397f94ddc3960cd07 HTTP/1.0",
    "referer": "-",
    "offset": 92976439,
    "input_type": "log",
    "source": "/usr/local/nginx/logs/zixun.oupeng.com.access.log",
    "type": "zixun-nginx-access",
    "http_host": "zixun.oupeng.com",
    "url": "/v1/top_news",
    "http_user_agent": "-",
    "tags": [
      "beats_input_codec_json_applied"
    ],
    "remote_user": "-",
    "upstreamhost": "192.168.10.110:80",
    "@timestamp": "2017-07-17T09:42:47.918Z",
    "size": 623,
    "clientip": "183.165.108.89",
    "domain": "zixun.oupeng.com",
    "host": "117.119.33.239",
    "@version": "1",
    "beat": {
      "hostname": "uy05-12",
      "name": "uy05-12",
      "version": "5.5.0"
    },
    "responsetime": 0.006,
    "xff": "-",
    "upstreamtime": "0.006",
    "status": "200"
  },
  "fields": {
    "@timestamp": [
      1500284567918
    ]
  },
  "sort": [
    1500284567918
  ]
}

So you mean I still need to use a filter plugin and the date filter to deal with the logs?

But I already store the time in the nginx log format, in the corresponding field "@timestamp":"$time_iso8601". I think it should be read directly whenever, even if the Logstash pipeline breaks for a while.
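
For reference when reading the Kibana document above: the `fields` and `sort` values are milliseconds since the Unix epoch and denote the same instant as `@timestamp`. A small Python sketch to check this; the helper name `epoch_millis_to_iso` is made up.

```python
from datetime import datetime, timedelta, timezone


def epoch_millis_to_iso(millis):
    """Convert milliseconds since the Unix epoch to an ISO 8601 UTC string."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    instant = epoch + timedelta(milliseconds=millis)
    # Keep three fractional digits, as in the @timestamp shown by Kibana.
    return instant.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (instant.microsecond // 1000)


print(epoch_millis_to_iso(1500284567918))  # 2017-07-17T09:42:47.918Z
```

The value has a non-zero millisecond part, whereas nginx's `$time_iso8601` has whole-second resolution, which may be worth keeping in mind when checking where this timestamp came from.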


(Magnus Bäck) #6

So you mean I still need to use a filter plugin and the date filter to deal with the logs?

Maybe, yes. What does the original timestamp look like? Is there anything in the logs about why Logstash can't store that timestamp in the @timestamp field?

I think it should be read directly whenever

The @timestamp field is a bit special in that it isn't actually a string but a timestamp, so any value that you assign to that field must be parsed as a timestamp.
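
As a rough illustration of what "parsed as a timestamp" involves, here is a minimal Python sketch (not Logstash code) that turns an ISO 8601 string with a UTC offset, the shape `$time_iso8601` produces, into a UTC instant. The helper names `to_utc_instant` and `format_utc_millis` are made up, and the output format is only an approximation of how such values are displayed.

```python
from datetime import datetime, timezone


def to_utc_instant(iso_string):
    """Parse an ISO 8601 string carrying a UTC offset and normalize it to UTC."""
    local = datetime.fromisoformat(iso_string)
    if local.tzinfo is None:
        raise ValueError("timestamp has no UTC offset: %r" % iso_string)
    return local.astimezone(timezone.utc)


def format_utc_millis(instant):
    """Render a UTC datetime with millisecond precision and a trailing Z."""
    return instant.strftime("%Y-%m-%dT%H:%M:%S.") + "%03dZ" % (instant.microsecond // 1000)


ts = to_utc_instant("2017-07-19T01:46:13+08:00")
print(format_utc_millis(ts))  # 2017-07-18T17:46:13.000Z
```

The string and the instant are different things: the string keeps its local offset, while the parsed value is a point in time that can be shown in UTC.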


(Keith Tt) #7

Here is one entry from my nginx access log:

{"@timestamp":"2017-07-19T01:46:13+08:00","host":"117.119.33.239","clientip":"218.104.251.146","remote_user":"-","request":"GET /v2/fetch_newest?uid=e4c245b92185443d83d996ae3cdaf644&top=1&category=tuijian HTTP/1.1","http_user_agent":"-","http_header":"-","size":1945,"responsetime":0.031,"upstreamtime":"0.031","upstreamhost":"192.168.10.111:80","http_host":"zixun.oupeng.com","url":"/v2/fetch_newest","domain":"zixun.oupeng.com","xff":"-","referer":"-","status":"200"}

(Magnus Bäck) #8

I can't reproduce with Logstash 5.4.1.

$ cat test2.config 
input { stdin { codec => json } }
output { stdout { codec => rubydebug } }
filter { }
$ cat data2
{"@timestamp":"2017-07-19T01:46:13+08:00","host":"117.119.33.239","clientip":"218.104.251.146","remote_user":"-","request":"GET /v2/fetch_newest?uid=e4c245b92185443d83d996ae3cdaf644&top=1&category=tuijian HTTP/1.1","http_user_agent":"-","http_header":"-","size":1945,"responsetime":0.031,"upstreamtime":"0.031","upstreamhost":"192.168.10.111:80","http_host":"zixun.oupeng.com","url":"/v2/fetch_newest","domain":"zixun.oupeng.com","xff":"-","referer":"-","status":"200"}
$ logstash -f test2.config < data2
Sending Logstash's logs to /home/magnus/logstash/logstash-5.4.1/logs which is now configured via log4j2.properties
[2017-07-18T20:22:13,056][INFO ][logstash.pipeline        ] Starting pipeline {"id"=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>5, "pipeline.max_inflight"=>1000}
[2017-07-18T20:22:13,084][INFO ][logstash.inputs.stdin    ] Automatically switching from json to json_lines codec {:plugin=>"stdin"}
[2017-07-18T20:22:13,103][INFO ][logstash.pipeline        ] Pipeline main started
[2017-07-18T20:22:13,152][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}
{
            "request" => "GET /v2/fetch_newest?uid=e4c245b92185443d83d996ae3cdaf644&top=1&category=tuijian HTTP/1.1",
            "referer" => "-",
          "http_host" => "zixun.oupeng.com",
                "url" => "/v2/fetch_newest",
    "http_user_agent" => "-",
        "remote_user" => "-",
       "upstreamhost" => "192.168.10.111:80",
         "@timestamp" => 2017-07-18T17:46:13.000Z,
               "size" => 1945,
           "clientip" => "218.104.251.146",
             "domain" => "zixun.oupeng.com",
               "host" => "117.119.33.239",
           "@version" => "1",
       "responsetime" => 0.031,
                "xff" => "-",
        "http_header" => "-",
       "upstreamtime" => "0.031",
             "status" => "200"
}
[2017-07-18T20:22:16,114][WARN ][logstash.agent           ] stopping pipeline {:id=>"main"}

(Keith Tt) #9

I am sorry, I do not quite follow you.

What do you think the problem is, given that you "can't reproduce" it?


(Keith Tt) #10

I resolved this problem by setting the queue.type to persisted in /etc/logstash/logstash.yml.

I referred to this page:
https://www.elastic.co/guide/en/logstash/5.5/persistent-queues.html#configuring-persistent-queues


(Magnus Bäck) #11

What do you think the problem is, given that you "can't reproduce" it?

The filter that you're having problems with worked fine on my machine with Logstash 5.4.1 so it's probably not a change in Logstash that caused your problem. You need to look elsewhere.

I resolved this problem by setting the queue.type to persisted in /etc/logstash/logstash.yml.

I can't imagine that that had anything to do with solving your problem.


(Keith Tt) #12

I am sorry. That was a misunderstanding and a coincidence.

This problem still exists if Logstash is down for a long time.

