Filebeat to Logstash: how to keep line order


(xavier) #1

Hello,

I use a Filebeat instance to read some log files and send them to a Logstash instance on another server, which writes them to disk with the file output.

Line order is not critical for one of the logs, so I had not run into this issue before. For the new log type I am sending, however, line order is very important, and the timestamp is only accurate to the second (while many events occur within the same second).

Here is an example of the initial log read by filebeat :

[root@opaxvpol1 log]# cat log_prd_2018_10_30.log | grep "QyjYW/COKwE"
opaxvgw4,       QyjYW/COKwE            20181030 104539 102 NET   I CONN_IND  (477937) incoming connection indication   [src_add="xxx/63227",dest_add="xxx/6321"]
opaxvgw4,       QyjYW/COKwE            20181030 104539 106 SECS  I SES_INIT  Server(1843946) Net profile GW_SFTP selected [src_add="xxx/63227"]  [dest_add="xxx/6321"]
opaxvgw4,       QyjYW/COKwE            20181030 104539 111 SECS  I SES_INIT  Server(1843946) SSH profile GATEWAY_SSH_SRV selected
opaxvgw4,       QyjYW/COKwE            20181030 104539 103 NET   I CONN_RESP (477937) incoming connection response     [resp_add=""]
opaxvgw4,       QyjYW/COKwE            20181030 104539 114 SECS  I SIGN_OK   Server(1843946) DSS signature achieved with private key GATEWAY_PRIV
opaxvgw4,       QyjYW/COKwE            20181030 104539 003 PCNX  I CONN      (27009770) SFTP Connection Request Received
opaxvgw4,       QyjYW/COKwE            20181030 104539 004 PCNX  I CONN      (27009770) calling addr="xxx/63227", called_addr="xxx/6321"
opaxvgw4,       QyjYW/COKwE            20181030 104539 004 PCNX  I CONN      (27009770) login="MLC01", pwd=""
opaxvgw4,       QyjYW/COKwE            20181030 104539 001 PCNX  I SEL       (27009770) CGate xxx selected
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) template_site = "TSFTP" [GSFTP]
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) root_directory = "/MLC01" [MLC01], home_directory = "/MLC01" [MLC01]
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) route_local_agent = [], route_remote_agent = []
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) route_originator_ident = [], route_destination_ident = []
opaxvgw4,       QyjYW/COKwE            20181030 104539 118 SECS  I SIGN_OK   Server(1843946) DSS signature verification achieved with public key CLT_MLC01
opaxvgw4,       QyjYW/COKwE            20181030 104539 102 SECS  S SES_SUC   Server(1843946) Session established for user MLC01, key exchange algo: dh-group1, public key algo: ssh-dss, cipher algo: aes128-cbc, mac algo: md5, no compression
opaxvgw4,       QyjYW/COKwE            20181030 104539 067 NET   I SSHINFO   (477937) SSH incoming connection from Client SSH-VERSION-STRING:SSH-2.0-JSCH-0.1.54 Local Server SSH-VERSION-STRING:SSH-2.0-XFB.Gateway Unix
opaxvgw4,       QyjYW/COKwE            20181030 104539 019 SFTP  I XFERSND2  GREENTRF(175857) [0] begin sending from , LIST:
opaxvgw4,       QyjYW/COKwE            20181030 104539 023 SFTP  I XENDSND2  GREENTRF(175857) [0] end sending from , LIST:
opaxvgw4,       QyjYW/COKwE            20181030 104539 106 NET   I DISC_IND  (477937) disconnection indication         [reason="Success (0x0)"] [origin="0"]
opaxvgw4,       QyjYW/COKwE            20181030 104539 104 SECS  I SES_END   Server(1843946) Session ended for user MLC01

And here is the file written on the logstash side :

[root@opsgyst1 ~]# cat /opt/application/splunk/logs_data/axv/log_prd_2018_10_30.log | grep "QyjYW/COKwE"
opaxvgw4,       QyjYW/COKwE            20181030 104539 106 SECS  I SES_INIT  Server(1843946) Net profile GW_SFTP selected [src_add="xxx/63227"]  [dest_add="xxx/6321"]
opaxvgw4,       QyjYW/COKwE            20181030 104539 001 PCNX  I SEL       (27009770) CGate xxx selected
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) route_originator_ident = [], route_destination_ident = []
opaxvgw4,       QyjYW/COKwE            20181030 104539 019 SFTP  I XFERSND2  GREENTRF(175857) [0] begin sending from , LIST:
opaxvgw4,       QyjYW/COKwE            20181030 104539 103 NET   I CONN_RESP (477937) incoming connection response     [resp_add=""]
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) root_directory = "/MLC01" [MLC01], home_directory = "/MLC01" [MLC01]
opaxvgw4,       QyjYW/COKwE            20181030 104539 102 SECS  S SES_SUC   Server(1843946) Session established for user MLC01, key exchange algo: dh-group1, public key algo: ssh-dss, cipher algo: aes128-cbc, mac algo: md5, no compression
opaxvgw4,       QyjYW/COKwE            20181030 104539 106 NET   I DISC_IND  (477937) disconnection indication         [reason="Success (0x0)"] [origin="0"]
opaxvgw4,       QyjYW/COKwE            20181030 104539 111 SECS  I SES_INIT  Server(1843946) SSH profile GATEWAY_SSH_SRV selected
opaxvgw4,       QyjYW/COKwE            20181030 104539 003 PCNX  I CONN      (27009770) SFTP Connection Request Received
opaxvgw4,       QyjYW/COKwE            20181030 104539 004 PCNX  I CONN      (27009770) calling addr="xxx/63227", called_addr="xxx/6321"
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) template_site = "TSFTP" [GSFTP]
opaxvgw4,       QyjYW/COKwE            20181030 104539 118 SECS  I SIGN_OK   Server(1843946) DSS signature verification achieved with public key CLT_MLC01
opaxvgw4,       QyjYW/COKwE            20181030 104539 023 SFTP  I XENDSND2  GREENTRF(175857) [0] end sending from , LIST:
opaxvgw4,       QyjYW/COKwE            20181030 104539 102 NET   I CONN_IND  (477937) incoming connection indication   [src_add="10.228.175.75/63227",dest_add="10.117.40.6/6321"]
opaxvgw4,       QyjYW/COKwE            20181030 104539 114 SECS  I SIGN_OK   Server(1843946) DSS signature achieved with private key GATEWAY_PRIV
opaxvgw4,       QyjYW/COKwE            20181030 104539 004 PCNX  I CONN      (27009770) login="MLC01", pwd=""
opaxvgw4,       QyjYW/COKwE            20181030 104539 005 PCNX  I CONN      (27009770) route_local_agent = [], route_remote_agent = []
opaxvgw4,       QyjYW/COKwE            20181030 104539 067 NET   I SSHINFO   (477937) SSH incoming connection from Client SSH-VERSION-STRING:SSH-2.0-JSCH-0.1.54 Local Server SSH-VERSION-STRING:SSH-2.0-XFB.Gateway Unix
opaxvgw4,       QyjYW/COKwE            20181030 104539 104 SECS  I SES_END   Server(1843946) Session ended for user MLC01

Is there an option on either side (Logstash or Filebeat) to ensure that line order is preserved?

Thank you


(Jaime Soriano) #2

Hi @zebu14,

Could you share filebeat and logstash configurations?


(xavier) #3

Sure,

My configuration is almost the default one.

Logstash:

  • logstash.yml ==> default

pipeline.workers: 2
pipeline.batch.size: 125
pipeline.batch.delay: 50
...
queue.type: memory
queue.page_capacity: 64mb
queue.max_bytes: 1024mb

  • pipelines.yml ==> default

  • conf.d/pipeline_axv.conf:

# The # character at the beginning of a line indicates a comment. Use
# comments to describe your configuration.
input {

    beats {
        port => "8754"
    }

}
# The filter part of this file is commented out to indicate that it is
# optional.
filter {

}
output {

    if [logtype] == "axv_log_prd" {
        file {
            path => "/opt/application/splunk/logs_data/axv/log_prd_%{+yyyy_MM_dd}.log"
            codec => line { format => "%{message}" }
        }
    } else if [logtype] == "axv_statlog_prd" {
        file {
            path => "/opt/application/splunk/stats_data/axv/stat_prd_%{+yyyy_MM_dd}.log"
            codec => line { format => "%{message}" }
        }
    } else if [logtype] == "axv_tobeg_prd" {
        file {
            path => "/opt/application/splunk/tobeg_data/axv/to_beg_prd_%{+yyyy_MM_dd}.log"
            codec => line { format => "%{message}" }
        }
    }

}


(xavier) #4

And for filebeat.yml:

> 
> #=========================== Filebeat prospectors =============================
> 
> filebeat.prospectors:
> 
> # Each - is a prospector. Most options can be set at the prospector level, so
> # you can use different prospectors for various configurations.
> # Below are the prospector specific configurations.
> 
> 
> - type: log
>   enabled: true
>   paths:
>     - /opt/application/logs/axway/stat/stat_prd_*.log
>   fields:
>    logtype: axv_statlog_prd
>   fields_under_root: true
> 
> - type: log
>   enabled: true
>   paths:
>     - /opt/application/logs/axway/stat/to_beg_prd_*.log
>   fields:
>    logtype: axv_tobeg_prd
>   fields_under_root: true
> 
> - type: log
>   enabled: true
>   paths:
>     - /opt/application/logs/axway/log/log_prd_*.log
>   fields:
>    logtype: axv_log_prd
>   fields_under_root: true
> 
> 
> 
>   ### Multiline options
> 
>   # Multiline can be used for log messages spanning multiple lines. This is common
>   # for Java Stack Traces or C-Line Continuation
> 
>   # The regexp Pattern that has to be matched. The example pattern matches all lines starting with [
>   #multiline.pattern: ^\[
> 
>   # Defines if the pattern set under pattern should be negated or not. Default is false.
>   #multiline.negate: false
> 
>   # Match can be set to "after" or "before". It is used to define if lines should be appended to a pattern
>   # that was (not) matched before or after or as long as a pattern is not matched based on negate.
>   # Note: After is the equivalent to previous and before is the equivalent to next in Logstash
>   #multiline.match: after
> 
> 
> #============================= Filebeat modules ===============================
> 
> filebeat.config.modules:
>   # Glob pattern for configuration loading
>   path: ${path.config}/modules.d/*.yml
> 
>   # Set to true to enable config reloading
>   reload.enabled: false
> 
>   # Period on which files under path should be checked for changes
>   #reload.period: 10s
> 
> #==================== Elasticsearch template setting ==========================
> 
> setup.template.settings:
>   index.number_of_shards: 3
>   #index.codec: best_compression
>   #_source.enabled: false
> 
> #================================ General =====================================
> 
> # The name of the shipper that publishes the network data. It can be used to group
> # all the transactions sent by a single shipper in the web interface.
> #name:
> 
> # The tags of the shipper are included in their own field with each
> # transaction published.
> #tags: ["service-X", "web-tier"]
> 
> # Optional fields that you can specify to add additional information to the
> # output.
> #fields:
> #  env: staging
> 
> 
> 
> #================================ Outputs =====================================
> 
> # Configure what output to use when sending the data collected by the beat.
> 
> 
> #----------------------------- Logstash output --------------------------------
> output.logstash:
>   # The Logstash hosts
>   hosts: ["xx.xx.xx.xx:8754"]
> 
>   # Optional SSL. By default is off.
>   # List of root certificates for HTTPS server verifications
>   #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]
> 
>   # Certificate for SSL client authentication
>   #ssl.certificate: "/etc/pki/client/cert.pem"
> 
>   # Client Certificate Key
>   #ssl.key: "/etc/pki/client/cert.key"

(Jaime Soriano) #5

@zebu14 can you try to reduce the number of logstash pipeline.workers to 1? Having two workers processing and writing events in parallel could lead to different ordering.
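
A minimal sketch of that change, assuming the rest of logstash.yml stays as you posted it (only the first value differs from what you shared):

```yaml
# logstash.yml (sketch, not a tuned configuration)
pipeline.workers: 1       # was 2: with one worker, batches are no longer processed in parallel
pipeline.batch.size: 125  # unchanged
pipeline.batch.delay: 50  # unchanged
```

With two workers, each one picks up its own batch of events and runs the outputs independently, so lines from two batches can reach the file output interleaved. Logstash needs a restart to pick up changes to logstash.yml. Newer Logstash releases (7.7 and later, as far as I know) also have a pipeline.ordered setting related to this, so it is worth checking the settings reference for your version.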


(xavier) #6

Hello,

Your solution seems to do the job.
How can I make sure that a single worker will still be enough as the load grows over time?


(Jaime Soriano) #7

Yes, as you say, keeping the number of pipeline.workers to one can be a bottleneck depending on the load, but in your scenario I think this is the only way to guarantee the order.
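
If other logs go through the same Logstash, one option is to restrict only the order-sensitive pipeline to a single worker in pipelines.yml and let the others keep more. This is a rough sketch under several assumptions: the pipeline ids, the second config file and its worker count are hypothetical, the conf.d path is a guess at your layout, and each pipeline would need its own beats input on a separate port, which a single Filebeat output cannot feed as-is.

```yaml
# pipelines.yml (sketch; ids and paths are hypothetical)
- pipeline.id: axv_ordered
  path.config: "/etc/logstash/conf.d/pipeline_axv.conf"    # assumed location of your existing file
  pipeline.workers: 1    # order matters here, so a single worker

- pipeline.id: axv_unordered
  path.config: "/etc/logstash/conf.d/pipeline_other.conf"  # hypothetical second pipeline
  pipeline.workers: 2    # order is not critical, so this one can run in parallel
```

Settings given per pipeline here take precedence over the values in logstash.yml, so the global default can stay untouched.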

Have you considered centralizing the logs in Elasticsearch? This would be a more scalable solution, and it offers many more features than plain files :)

