Pipelines stop grok filter in Logstash


(Shine) #1

PROBLEM:
The grok filter is skipped when I run two .conf files using pipelines in Logstash. All the data goes into the correct index, but without being parsed by grok. The grok filter works for the first record Logstash receives, but it is then skipped for new log entries, and it fails for every log entry in the second index created.

RESULT:
If I run each xxx.conf one at a time, each index (site_1_truck, site_1_car) is created and the data is parsed correctly by grok and saved into Elasticsearch.

GOAL:
We have 12 servers (Ubuntu 14.04) with 8 customer websites hosted on each one.
We need to track 5 custom logs for each customer's website.
12 servers
-- 8 customers/server
--- 5 logs_files/customer

WHAT I RULED OUT:
Server_A can connect to our Logstash server (100.2.3.4):
telnet 100.2.3.4 5044
The Logstash server is listening on ports 5044 and 5045:
netstat -nat

MY TEST
filebeat 6.2.3
logstash 6.2.4
Elasticsearch 6.2.4

I installed one instance of Filebeat on a test server.

------- filebeat.yml on Server_A -------

- type: log
  enabled: true
  paths:
    - /usr/local/glassfish/domains/customer_site_1/log/car.log
  fields:
    customer_name: customer_site_1_car
    file_type: car_log

- type: log
  enabled: true
  paths:
    - /usr/local/glassfish/domains/customer_site_1/log/truck.log
  fields:
    customer_name: customer_site_1_truck
    file_type: truck_log

output.logstash:
  hosts: ["100.2.3.4:5044"]

------- Logstash [100.2.3.4] on Ubuntu 16.04 -------
----------------------- Pipelines (/etc/logstash/pipelines.yml) -----------------------

- pipeline.id: process_logs_id
  path.config: "/etc/logstash/conf.d/send_logs.conf"

NOTE: how I tested the config files

  /usr/share/logstash/bin# ./logstash --config.test_and_exit -f /etc/logstash/conf.d/send_logs.conf
  Config Validation Result: OK.

------- Conf Files (/etc/logstash/conf.d/) -------
send_logs.conf

input {
  beats {
    port => 5044
    type => "log"
  }
}
filter {
  if [fields][file_type] == 'car_log' {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp_utc_jvm}\] \[(?:%{DATA:app_server}|)\] \[%{WORD:severity}\] \[(?:%{DATA:glassfish_code}|)\] \[(?:%{JAVACLASS:java_pkg}|)\] \[(?<Thread_Name>[^\]]+)\] \[(?<timeMillis>[^\]]+)\] \[(?<levelValue>[^\]]+)\] (?<log_msg>(?m:.*))" }
    }
  }
  if [fields][file_type] in ['truck_log', 'next_log_type'] {
    grok {
      match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp_utc_jvm}\] %{WORD:severity} %{JAVACLASS:java_pkg} \[(?<Thread_Name>[^\]]+)\] (?<log_msg>(?m:.*))" }
    }
  }
}
output {
  elasticsearch {
    hosts => "your_elastic_search_server.com:9200"
    manage_template => false
    index => "%{[fields][customer_name]}_%{[fields][file_type]}"
  }
  stdout {}
}

#2

Can you give us examples of documents where the filter was not applied?


(Shine) #3

car.log
[2018-05-04T00:00:00,008] INFO c.t.u.s.l.LogReclaim [__ejb-thread-pool11] LogReclaim: remove 21 files

truck.log
[2018-02-23T14:46:15.790-0500] [Payara 4.1] [INFO] [NCLS-CORE-00087] [javax.enterprise.system.core] [tid: _ThreadID=39 _ThreadName=RunLevelControllerThread-1519415173703] [timeMillis: 1519415175790] [levelValue: 800] [[
Grizzly Framework 2.3.31 started in: 96ms - bound to [/0.0.0.0:3700]]]


(Shine) #4

If I run site_1_car_log.conf or site_1_truck_log.conf individually, the logs get parsed correctly.

/usr/share/logstash/bin# ./logstash -f /etc/logstash/conf.d/site_1_car_log.conf
or
/usr/share/logstash/bin# ./logstash -f /etc/logstash/conf.d/site_1_truck_log.conf

So I think the grok is correct. But when both are running at the same time using pipelines.yml, it causes a problem.


#5

For the truck log, the line you showed is not even close to matching the grok pattern. The timestamp matches, but then your log has [Payara 4.1] where your pattern has %{WORD:severity}. A WORD is defined as \b\w+\b, that is, a run of letters, digits, or underscores, so it cannot match a bracket or a space.

The car log should be matching. Does the document in Elasticsearch have a _grokparsefailure tag?
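
To make the mismatch concrete, here is a minimal Python sketch. Python's re module is only a stand-in for grok's regex engine, TIMESTAMP is a simplified, hypothetical substitute for TIMESTAMP_ISO8601, and the two compiled patterns cover only the first few fields of each log format.

```python
import re

# grok's WORD is \b\w+\b; TIMESTAMP is a simplified, hypothetical
# stand-in for TIMESTAMP_ISO8601 that covers the two samples in this thread.
WORD = r"\b\w+\b"
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[.,]\d{3}(?:[+-]\d{4})?"

# "[timestamp] SEVERITY ..." (severity is a bare word)
PLAIN = re.compile(r"\[(?P<ts>" + TIMESTAMP + r")\] (?P<severity>" + WORD + r") ")

# "[timestamp] [app server] [SEVERITY] ..." (every field is bracketed)
BRACKETED = re.compile(
    r"\[(?P<ts>" + TIMESTAMP + r")\] \[(?P<app_server>.*?)\] \[(?P<severity>" + WORD + r")\]"
)

payara_line = ("[2018-02-23T14:46:15.790-0500] [Payara 4.1] [INFO] "
               "[NCLS-CORE-00087] [javax.enterprise.system.core]")
plain_line = ("[2018-05-04T00:00:00,008] INFO c.t.u.s.l.LogReclaim "
              "[__ejb-thread-pool11] LogReclaim: remove 21 files")

# After the timestamp the Payara line continues with "[", which WORD cannot match.
plain_on_payara = PLAIN.match(payara_line)
plain_on_plain = PLAIN.match(plain_line)
bracketed_on_payara = BRACKETED.match(payara_line)
```

Here plain_on_payara is None, while the other two calls return match objects, which mirrors what grok does when the wrong pattern is applied to a line.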


(Shine) #6

I mistakenly posted the wrong grok pattern for 'site_1_truck_log.conf', which I have now updated. (It still has a problem.)

You helped me realize my grok is the problem. There are log entries that don't fit the 'site_1_truck_log.conf' pattern. I'm having a tough time capturing the stack trace when it contains special characters such as () or [] in addition to the opening [[ and closing ]].

How would you write a grok pattern to capture this?

[...info here...] [[
  WELD-000411: Observer method [BackedAnnotatedMethod] processAnnotatedType(@Observes ProcessAnnotatedType) some text here.]]

[...info here...] [[  
[[
  Loading application [sample] at [/sample202]]]

GOAL:
log_message = WELD-000411: Observer method [BackedAnnotatedMethod] processAnnotatedType(@Observes ProcessAnnotatedType) some text here

log_message = Loading application [sample] at [/sample202]
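
One possible approach, sketched in Python rather than grok: skip the opening [[ (allowing it to repeat), capture lazily, and require the closing ]] to sit at the very end of the entry so that inner brackets stay in the message. The name extract_message is hypothetical, and re.DOTALL stands in for grok's (?m:...) so that "." can cross newlines. A grok version would use (?<log_message>...) instead of (?P<log_message>...); that translation is an assumption and has not been run through Logstash.

```python
import re

# Skip one or more "[[" openers, capture lazily, and anchor the closing "]]"
# at the end of the entry so a "]" inside the message is not taken as the end.
BODY = re.compile(r"(?:\[\[\s*)+(?P<log_message>.*?)\]\]\s*\Z", re.DOTALL)


def extract_message(entry):
    """Return the text between the opening [[ and the final ]], or None."""
    match = BODY.search(entry)
    return match.group("log_message") if match else None


weld = ("[...info here...] [[\n"
        "  WELD-000411: Observer method [BackedAnnotatedMethod] "
        "processAnnotatedType(@Observes ProcessAnnotatedType) some text here.]]")
loading = "[...info here...] [[  \n[[\n  Loading application [sample] at [/sample202]]]"

weld_message = extract_message(weld)
loading_message = extract_message(loading)
```

Because the closing ]] is anchored to the end of the entry, this only works if each multi-line entry reaches the filter as a single event.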


(Shine) #7

I figured out the problems. (I updated my code above to reflect changes that work.)

  1. My grok filter was not parsing out blank values correctly

Incorrect
\[%{TIMESTAMP_ISO8601:timestamp_utc_jvm}\] (\[(%{WORD:server_name} %{NUMBER:server_ver})\]) \[%{WORD:severity}\] (\[%{WORD:glassfish_code_1}-(%{WORD:glassfish_code_2})-(?<glassfish_code_3>[0-9]{1,6})\]|\[%{GREEDYDATA}\]) (\[%{JAVACLASS:java_package}\]|\[%{GREEDYDATA}\]) \[(?<Thread_Name>[^\]]+)\] \[(?<timeMillis>[^\]]+)\] \[(?<levelValue>[^\]]+)\] \[(?<log_message>[^\]]+)\]

Updated
\[%{TIMESTAMP_ISO8601:timestamp_utc_jvm}\] (\[(%{WORD:server_name} %{NUMBER:server_ver})\]) \[%{WORD:severity}\] \[(?:%{DATA:glassfish_code}|)\] \[(?:%{JAVACLASS:java_pkg}|)\] \[(?<Thread_Name>[^\]]+)\] \[(?<timeMillis>[^\]]+)\] \[(?<levelValue>[^\]]+)\] (?<log_msg>(?m:.*))
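
A small Python sketch of how the optional groups in the updated pattern behave, covering only the glassfish_code and java_pkg fields. It is an illustration under assumptions: DATA is written out as ".*?", [\w.$]+ is a simplified, hypothetical stand-in for JAVACLASS, and parse_fields is a made-up helper name.

```python
import re

# \[(?:%{DATA:glassfish_code}|)\] \[(?:%{JAVACLASS:java_pkg}|)\] rewritten for
# Python: each bracket pair may hold a value or be completely empty.
OPTIONAL_FIELDS = re.compile(
    r"\[(?:(?P<glassfish_code>.*?)|)\] \[(?:(?P<java_pkg>[\w.$]+)|)\]"
)


def parse_fields(text):
    """Return the captured fields as a dict, or None if the text does not fit."""
    match = OPTIONAL_FIELDS.fullmatch(text)
    return match.groupdict() if match else None


filled = parse_fields("[NCLS-CORE-00087] [javax.enterprise.system.core]")
blank = parse_fields("[] []")
```

Both inputs match. In this sketch an empty field comes back as an empty string or None; as far as I know, grok drops empty captures from the event by default rather than storing them.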

  2. My Logstash config was incorrect.
    I ended up combining many pipelines into one (send_logs.conf) and had send_logs.conf decide which grok filter to use.

  3. Added a "file_type" field in filebeat.yml to tell Logstash which filter to use. (Thanks Badger for the solution!)
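
For anyone who finds this later, the routing idea can be pictured outside Logstash as a lookup keyed on the file_type field. This is only a Python sketch: CAR and TRUCK are simplified, hypothetical stand-ins for the two grok patterns in send_logs.conf, and parse is a made-up helper, not part of any Logstash API.

```python
import re

# Simplified stand-ins for the two grok patterns (not the exact grok expansions).
CAR = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] \[(?P<app_server>[^\]]*)\] \[(?P<severity>\w+)\] (?P<rest>.*)",
    re.DOTALL,
)
TRUCK = re.compile(
    r"\[(?P<timestamp>[^\]]+)\] (?P<severity>\w+) (?P<java_pkg>[\w.$]+) "
    r"\[(?P<thread>[^\]]+)\] (?P<log_msg>.*)",
    re.DOTALL,
)

# Mirrors the conditionals: one pattern for car_log, one for the other two types.
PATTERNS = {"car_log": CAR, "truck_log": TRUCK, "next_log_type": TRUCK}


def parse(event):
    """Apply the pattern selected by fields.file_type, as the filter block does."""
    pattern = PATTERNS.get(event["fields"]["file_type"])
    if pattern is None:
        return event  # no conditional applies, so the event passes through unchanged
    match = pattern.match(event["message"])
    if match is None:
        return {**event, "tags": ["_grokparsefailure"]}  # grok's default failure tag
    return {**event, **match.groupdict()}


line = ("[2018-05-04T00:00:00,008] INFO c.t.u.s.l.LogReclaim "
        "[__ejb-thread-pool11] LogReclaim: remove 21 files")
parsed = parse({"fields": {"file_type": "truck_log"}, "message": line})
mismatched = parse({"fields": {"file_type": "car_log"}, "message": line})
unrouted = parse({"fields": {"file_type": "other_log"}, "message": line})
```

The point of the design is that one Beats input feeds one pipeline, and the field set in Filebeat decides which pattern runs, so an entry is never tested against a pattern meant for a different log format.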

