Tomcat logs are not appearing in order in the Discover section in Kibana

Hi Team,

I have a Tomcat application running on two servers, sending logs to Logstash through Filebeat. When checking the logs in the Discover section of Kibana, I am not seeing the exact sequence of logs in Kibana as on the server.

Filebeat config:

name: app_server1
 
filebeat.inputs:
    - type: log
      fields_under_root: true
      fields:
         log_type:  app_server1
         app_id: app
      multiline.pattern: ^[[:space:]]+(at|\.{3})\b|^Caused by:|^java|^...|^-
      multiline.negate: true
      multiline.match: after
      paths:
        - /opt/apache-tomcat/logs/catalina.out
 
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
 
setup.dashboards.enabled: true
setup.kibana:
  host: "http://kibanaserver1:5601"
  username: elastic
  password: ${es_pwd}
 
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
 
output.logstash:
  hosts: ['logstashserver1:5044', 'logstashserver2:5044']
  loadbalance: true

Logstash config:

input {
  beats {
    port => 5044
  }
}
 
filter {
  if [log_type] == "app_server1" and [app_id] == "app" {
    mutate {
      gsub => ["message", "\|", " "]
    }
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "message" => "%{MY_DATE_PATTERN:timestamp}%{SPACE}%{LOGLEVEL:level}%{SPACE}%{UUID:ConsentID}%{SPACE}%{WORD:TraceID}%{SPACE}%{WORD:TransactionID}%{SPACE}%{GREEDYDATA:messagetext}" }
    }
    mutate {
      replace => {
        "[type]" => "app_server1"
      }
    }
  }
  if [log_type] == "app_server2" and [app_id] == "app" {
    mutate {
      gsub => ["message", "\|", " "]
    }
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "message" => "%{MY_DATE_PATTERN:timestamp}%{SPACE}%{LOGLEVEL:level}%{SPACE}%{UUID:ConsentID}%{SPACE}%{WORD:TraceID}%{SPACE}%{WORD:TransactionID}%{SPACE}%{GREEDYDATA:messagetext}" }
    }
    mutate {
      replace => {
        "[type]" => "app_server2"
      }
    }
  }
}
output {
  if [log_type] == "app_server1" {
    elasticsearch {
      hosts => ['http://es1:9200', 'http://es2:9200', 'http://es3:9200']
      index => "app"
      template_name => "app"
      template_overwrite => false
      user => elastic
      password => "${es_pwd}"
    }
  }
  if [log_type] == "app_server2" {
    elasticsearch {
      hosts => ['http://es1:9200', 'http://es2:9200', 'http://es3:9200']
      index => "app"
      template_name => "app"
      template_overwrite => false
      user => elastic
      password => "${es_pwd}"
    }
  }
  elasticsearch {
    hosts => ['http://es1:9200', 'http://es2:9200', 'http://es3:9200']
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM}"
    user => elastic
    password => "${es_pwd}"
  }
}

From the server, I can see the below last logs on the App1 server.

On Kibana, the last line is different, not the same as the above server logs. (I have filtered on the Tomcat logs and selected the App1 server only.)

If I try to search for the last line of the server log (i.e. consentStatus: accept) in Kibana (I did Ctrl+F to search on the page), I find it somewhere in the middle, not on the first line (i.e. as the latest).

Similarly, if I try to search for another log line (i.e. subject: 1000000....), which is the 4th line from the end in the server log, in Kibana it is not the 4th line from the end.

Sorry for all the screenshots and highlighted marks.

Why are the logs not appearing in sequence, and how can we fix this?

Thanks,

Hi All,

Can someone please reply on why the logs are not appearing in sequence in Kibana?

Thanks,

From the screenshot, it's clear that there is an issue with the multiline config (a few DB-related statements appear as individual lines).

But I am not sure why the logs are not appearing in sequence in Kibana.

Thanks,

Or here, if you see:

The last two lines on the server are:

..... ldap client connected
..... ldap credentials bound

but in Kibana they appear in exactly the opposite order (assuming the last line of the server log should appear first in Kibana, i.e. as the very latest; by that logic, the ldap credentials bound line should be at the top in Kibana):

.... ldap client connected 
.... ldap credentials bound
.
.
.

Is this normal behaviour, with no solution?

Thanks,

Your multiline pattern is wrong and you will need to fix it, but this has no relation to the ordering of the events.

About the ordering: you are sending your logs from Filebeat to Logstash, and since Logstash processes the messages using multiple cores, the order is not guaranteed.

You can force Logstash to use just one core by setting pipeline.workers to 1 for this specific pipeline; you change that in the pipelines.yml file. If you are running Logstash from the command line you can pass the -w 1 parameter.

This will make Logstash use only one worker, but depending on the rate of events this can impact ingestion performance.
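
For reference, a minimal pipelines.yml sketch of what that could look like (the pipeline id and config path here are placeholders, adjust them to your setup):

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 1   # single worker, so events keep their arrival order

Or, when starting Logstash from the command line (path is again a placeholder):

bin/logstash -f /etc/logstash/conf.d/pipeline.conf -w 1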

Hi @leandrojmp,

Thanks for your reply.

I was looking into the documentation to learn more about what you said and found the below.

The doc says:

When maintaining event order is important, use a single worker and set *pipeline.ordered ⇒ true*

So do I also need to set pipeline.ordered => true in logstash.yml along with pipeline.workers: 1 in pipelines.yml?

Thanks,

I don't think you need to change pipeline.ordered, as its default value is auto, which enables ordering when pipeline.workers is set to 1.
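
If you want to be explicit anyway, a sketch of the same pipelines.yml entry with the ordering spelled out (redundant when workers is 1, since auto already enables it):

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
  pipeline.workers: 1
  pipeline.ordered: true   # auto (the default) already enables this when pipeline.workers is 1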

Can someone change the tag of this discussion from kibana to logstash?

Thanks @leandrojmp,

Is there no other option to ensure ordering without making this change? This option can impact performance when Logstash has a huge volume of events to process.

Thanks,

Unfortunately no, the only way to try to keep the logs ordered is to use only one worker.

Hi @leandrojmp,

Thanks for the reply.

Can you please check the last comment in the discussion below (I am not sure how to change the above config to make those changes). Do you think it might solve the problem?

Thanks,

The timestamp of your documents is the timestamp generated by Logstash when the event enters the pipeline; you are not using the timestamp from your logs.

Even if you used the timestamp from your logs it would not solve the ordering issue, because it does not have enough precision for this: you have millisecond precision and multiple events happening at the same time, so they would end up with the same time in Logstash.

Can someone please change the tag from kibana to logstash?

@leandrojmp

I have added pipeline.workers: 1 in /etc/logstash/pipelines.yml on both Logstash servers and restarted Logstash.

- pipeline.id: main
  path.config: "/etc/logstash/conf.d/*.conf"
 
  pipeline.workers: 1

I restarted the application service, but when checking today's logs in Kibana Discover, I still see them out of order.

App1 (5 log entries in total in application.log on the server):

Oct 20 10:38:26 <hostname> portal-api: {"level":"info","time":1634715506825,"pid":7107,"hostname":"<hostname>","config":{"appName":"portal-api","appHost":"https://devapi.com","port":"88","tenantName":"ob","psd2ApiSpec":"next","bankingApi":"https://api.com","consentsAuth":"https://login.","apidocsWebsite":"https://dev.com","tenantFintechName":"onefintech","datafeedServerCallback":"https://orufin/callback","pispWebsiteCallback":"https://pisp.orufin/buy","enableSwaggerPreview":true,"pdUri":"ldap://pd.:13","pdBindDn":"cn=directory","pdBindPassword":"***","pdBaseDn":"dc=bank,dc=com","pdBankTppDn":"TP","pdBankOauthClientsDn":"Clients","passwordChangeFailureAttempts":"***","passwordChangeFailureLockTime":"***","validation":{"usernameMinLength":"6","usernameMaxLength":"48","passwordMinLength":"***","passwordMaxLength":"***"},"apic":{"baseUri":"https://apiman","gatewayHost":"gw.","catalog":"next","organization":"quat","client":{"clientId":"portal-consumer-toolkit","clientSecret":"***"},"apiPlan":"default-plan","apim":{"baseUri":false,"realm":false,"admin":{"login":false,"password":"***"}}},"ping":{"restApiUri":"https://login./pf-ws/rest","revokeToken":"https://login./as/revoke_token.oauth2","refreshToken":"https://login./as/token.oauth2","exchangeToken":"https://login./as/token.oauth2","createDisabled":false,"credentials":{"oauth":{"login":"devportal","password":"***"},"devPortal":{"login":"dev-portal","password":"***"}}},"mail":{"options":{"host":"smtp.","port":"25"},"contact":{"to":"banking@.com","from":"Devportal@.com","subject":"Developer Portal Contact","template":"<p><strong>[[name]]</strong> &lt;[[email]]&gt;</p><p>[[message]]</p>"},"events":{"tracking":"account:registration, account:activation","to":"banking@.com","from":"Devportal-Test@.com","subject":"Developer Portal [[eventName]]","templates":{"default":"<p><strong>[[eventName]]</strong></p><p>[[username]]</p>","account:registration":"<p><strong>[[eventName]]</strong></p><p>New Account registered with [[username]] / [[organization]]</p>","account:activation":"<p><strong>[[eventName]]</strong></p><p>Account registered with [[username]] / [[organization]] has been activated</p>"}}},"recaptcha":{"enabled":false,"verifyUri":"https://www.google.com/recaptcha/api/siteverify","secret":"***"},"statics":"../data/public","PRODUCTION":true},"msg":"server config","v":1}

Oct 20 10:38:26 <hostname> developer-portal-api: {"level":"info","time":1634715506836,"pid":7107,"hostname":"<hostname>","msg":"server started on port 88","v":1}
Oct 20 10:38:27 <hostname> developer-portal-api: {"level":"info","time":1634715507338,"pid":7107,"hostname":"<hostname>","msg":"ldap connecting ldap://p:13","v":1}
Oct 20 10:38:27 <hostname> developer-portal-api: {"level":"info","time":1634715507347,"pid":7107,"hostname":"<hostname>","msg":"ldap client connected","v":1}
Oct 20 10:38:27 <hostname> developer-portal-api: {"level":"info","time":1634715507707,"pid":7107,"hostname":"<hostname>","msg":"ldap credentials bound","v":1}

In the screenshot below, for today, there are only 4 log events, and they are not in order (the first big log event above is not even showing here).

The log line Oct 20 10:38:26 ...... server started on port 8877 is correct on both sides, but the remaining three log lines from Oct 20 10:38:27... are not in the same sequence as in the server log.

Thanks,

As I said, Logstash does not guarantee event order; setting pipeline.workers to 1 can sometimes help with ordering, but it is not guaranteed to always work.

If the events are still not ordered even with pipeline.workers set to 1, you will need to see if you can use something from the event itself to order them.

In the case of these last messages you have a field that can be used: the time field, which holds an epoch time.

You will need to parse the message to extract that field and use the date filter to make it the timestamp.

But those messages are completely different from the first one. Are you parsing the json part of this message? How many different message formats do you have in this log?

@leandrojmp, Thanks for replying.

Can we not use/extract the time which is there in the message field (the first thing in the message field)? That is the exact same message as in the server log file, and if we can order the messages based on it, the ordering will be the same as in the server logs.

Is the Time field which we see at the left side of the above Kibana screenshot something Kibana adds?
There is a difference between the two times; the Time on the left is a bit ahead of the time in the message field.

There is also a @timestamp field under Available fields at the left side of the above screenshot, but I think it is the same as Time.


About the first big message log:

I don't see any multiline setting for this application log in filebeat.yml (like I have for the Tomcat logs):

- type: log
  fields_under_root: true
  fields:
    log_type: portal_app_server
    app_id: node
  paths:
    - /var/log/portal-app/server.log
    - /var/log/portal-app/server-err.log

In logstash.conf, I have the below for this log:

filter {
  if [log_type] == "portal_app_server" and [app_id] == "node" {
    grok {
      match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:json_message}" }
    }
    json {
      source => "json_message"
    }
    mutate {
      replace => {
        "[type]" => "portal_app_server"
      }
    }
  }
}

The Time column in Discover uses the time field you chose when you created the index pattern; normally that is the @timestamp field when you ingest using Logstash.

This @timestamp field is created by Logstash when an event enters the pipeline, using the current time at that moment.

You can use a date filter in your Logstash pipeline to override the @timestamp field with the time from your logs. The config you shared doesn't show any date filter, so your @timestamp field is being created by Logstash.

In these example messages from your log you have two pieces of time information that you can extract into fields:

Oct 20 10:38:27 <hostname> developer-portal-api: {"level":"info","time":1634715507338,"pid":7107,"hostname":"<hostname>","msg":"ldap connecting ldap://p:13","v":1}
Oct 20 10:38:27 <hostname> developer-portal-api: {"level":"info","time":1634715507347,"pid":7107,"hostname":"<hostname>","msg":"ldap client connected","v":1}
Oct 20 10:38:27 <hostname> developer-portal-api: {"level":"info","time":1634715507707,"pid":7107,"hostname":"<hostname>","msg":"ldap credentials bound","v":1}

The first one is at the beginning of the log line: Oct 20 10:38:27

If you want to order your events you cannot use this one, as it has only seconds precision. All those 3 events happened at different times, but if you used this time as the @timestamp they would appear to have happened at the same time, and the order would not be guaranteed.

The other piece of time information is in the json part of the message:

{"level":"info","time":1634715507707,"pid":7107,"hostname":"<hostname>","msg":"ldap credentials bound","v":1}

The time key holds a unix epoch time in milliseconds that you can use in the date filter.

If you are parsing the message and have the time field in your document, then you can use the date filter:

date {
    match => ["time", "UNIX_MS"]
}

This filter will use the value of the time field as the value for @timestamp; maybe this can help you order the events, as you have millisecond precision in that field.
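
As a worked example, using the ldap credentials bound line above: time is 1634715507707, which is 2021-10-20T07:38:27.707 UTC, so after the date filter the document would get:

"@timestamp" => 2021-10-20T07:38:27.707Z

Note that @timestamp is stored in UTC, so it can look a few hours off from the local-time prefix in the raw log (Oct 20 10:38:27, apparently UTC+3 here) even though it is the same instant.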

Hi @leandrojmp,

Thanks for the reply.

I got your point about using the epoch time from the logs for ordering: it has milliseconds, which might help with ordering, unlike the other time field, which only goes down to seconds; there can be many events within a single second, so those events might not appear in order.

I do have the time field in my documents, right? I mean, I am referring to the same epoch time in the above log lines (e.g. "time":1634715507338) that we are talking about.

If yes,

I just checked whether that time field is in each line of the log or not:

[root@ip-~]# cat  /var/log/portal-app/server.log |wc -l
4062
[root@ip- ~]# cat  /var/log/portal-app/server.log |grep '"time":'| wc -l
4062
[root@ip- ~]#

It is there in every log line.

If the above understanding is correct, is the below config in logstash.conf the correct way to include the date filter in the filter plugin?

filter {
  if [log_type] == "portal_app_server" and [app_id] == "node" {
    date {
      match => ["time", "UNIX_MS"]
    }
    grok {
      match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:json_message}" }
    }
    json {
      source => "json_message"
    }
    mutate {
      replace => {
        "[type]" => "portal_app_server"
      }
    }
  }
}

I think I still need to add UNIX_MS somewhere else to get it indexed properly, which I am missing above.

Thanks,

@warkolm,

Sorry to add you directly. I think you have the rights (I have seen this in another discussion); can you please change the tag from kibana to logstash?

Your date filter needs to be after the parsing, not before.

You are adding it before you parse your message; the time field will only be available after the message is parsed.

Put your date filter after the json filter.

Your conditional should look something like this:

if [log_type] == "portal_app_server" and [app_id] == "node" {
    grok { 
        match => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:json_message}"  } 
    } 
    json { 
        source =>  "json_message" 
    }
    date {
        match => ["time", "UNIX_MS"]
    }
    mutate {
        replace => {
            "[type]" => "portal_app_server"
        }
    }
}
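
If you want to verify the result before checking Discover, one option is to temporarily add a stdout output with the rubydebug codec, which pretty-prints every event; a minimal sketch:

output {
  stdout {
    # pretty-print each parsed event so you can check that @timestamp
    # now carries the millisecond value from the time field
    codec => rubydebug
  }
}

For the ldap credentials bound event you should then see @timestamp as 2021-10-20T07:38:27.707Z.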

Thank you @leandrojmp

I will give it a try and let you know.

Please use the Flag button at the bottom of the topic in future; it reaches all the admins :slight_smile: