Trouble parsing apache log using custom grok pattern


(Hans) #1

Hi, I'm new to the Elastic Stack and I'm an intern at a company. One of the managers gave me this project and I need your help. The use case is that they want to categorize and analyse old logs from a variety of devices, operating systems and firewalls, such as Apache, Windows event logs, iptables and so on. I started with Apache logs because they are among the easiest to get hold of. I have experimented a lot but made little progress (yes, I did try the Filebeat modules, but they don't quite fit the given use case). I'm not experienced with any of this and it is well outside my job scope. I'm almost at my wits' end, and nobody here has worked with these tools, so I'm relying 100% on the Internet to get through these problems. I can't see anything wrong with my grok pattern, and I tested it in the Grok Debugger.

P.S. Out of frustration, I deleted all the indices and the output in Kibana, so I can't show the output. Essentially, none of the fields from the pattern show up and the message doesn't get categorized :frowning:

Filebeat.yml

filebeat.inputs:
- type: log
  enabled: false 
  paths:
    - "/home/elk-kpmg/Desktop/logs/apache2/access/*"
  tags: ["apache","apache_access"]
  
- type: log
  enabled: false 
  paths:
    - "/home/elk-kpmg/Desktop/logs/apache2/error/*"
  tags: ["apache","apache_error"]

#-------------------------- Elasticsearch output ------------------------------
#output.elasticsearch:
  # Array of hosts to connect to.
  #hosts: ["localhost:9200"]

  # Optional protocol and basic auth credentials.
  #protocol: "https"
  #username: "elastic"
  #password: "changeme"

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["localhost:5044"]

  # Optional SSL. By default is off.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

Logstash.yml

filter
{
    if [tags] == "apache"
    {
        if [tags] == "apache_access"
        {
            grok
            {
                match => {"message" => ["%{IPORHOST:Client_IP}%{APACHEINFO}\[% {APACHEDATE:Timestamp}\]\s*%{APACHEDATA}","%{IPORHOST:Client_IP}%{APACHEINFO}\[%{APACHEDATE:Timestamp}\]\s*%{APACHEDATA}\s*%{APACHERADATA}"]}
                patterns_dir => ["/etc/logstash/patterns"]
                add_field => [ "received_at", "%{@timestamp}" ]
                add_field => [ "received_from", "%{host}" ]
                tag_on_failure => ["_grokparsefailure"]
            }
            if ["_grokparsefailure"] in [tags]
            {
                drop{}
            }

            date
            {
                match => [ "Timestamp", "dd/MMM/YYYY:H:m:s Z" ]
                target => "Timestamp"
            }

            useragent
            {
                source => "Agent"
                target => "User_Agent"
                remove_field => "Agent"
            }

            geoip
            {
                source => "Client_IP"
                target => "IP_Address"
            }

        }
    }
}

The custom patterns are these:

APACHEINFO (\s*(?:%{USER:Identification}|-)\s*(?:%{USER:Username}|-)\s*)

APACHEDATE %{MONTHDAY}/%{MONTH}/%{YEAR}:%{TIME} %{DATA:Timezone}

APACHEDATA "(?:%{WORD:Method} %{NOTSPACE:Request}(?: HTTP/%{NUMBER:Http_Version})?|%{DATA:Raw_Request})" %{NUMBER:Response_Code} (?:%{NUMBER:Bytes}|-)

APACHERADATA (?:"%{DATA:Referrer}" "%{DATA:Agent}")

(Faulander) #2

If this is your actual configuration, the solution is simple. Read your filebeat.yml file once again and check this line: enabled: false

What do you think it means? :slight_smile:

Additional information:
add_field => [ "received_at", "%{@timestamp}" ] --> Your date filter saves the parsed time in Timestamp, not in @timestamp.
tag_on_failure => ["_grokparsefailure"] --> This is the default, so it happens automatically.
add_field => [ "received_from", "%{host}" ] --> This is also done automatically.
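To illustrate the timestamp remark, here is a minimal sketch of a date filter that writes the parsed time into @timestamp, which is the filter's default target when no target is set. The format string is an assumption based on the usual Apache access log timestamp (for example 10/Oct/2000:13:55:36 -0700), so adjust it if your logs differ.

```
filter {
    date {
        # Parse the value captured by %{APACHEDATE:Timestamp} in the grok above.
        match => [ "Timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
        # No target is set here, so the result is stored in @timestamp (the default).
    }
}
```

With this variant, received_at (copied from @timestamp inside the grok, before the date filter runs) would keep the time Logstash received the event, while @timestamp would carry the time from the log line.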

and then this:
if ["_grokparsefailure"] in [tags]
{
drop{}
}

So if you have ANY problems in your grok, NO message will ever show up in Elastic. I would delete that part until your logstash config works perfectly - and then you don't need it anymore anyway :wink:
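A rough sketch of how the conditionals could look while debugging, not a definitive fix. It assumes the tags arrive as a list, as set in the filebeat.yml above, so membership is tested with "in" instead of "==". The tag name needs_review is made up for illustration. The more specific pattern is listed first because grok stops at the first pattern that matches, and "%{APACHEDATE" is written without a space between "%" and "{".

```
filter {
    # [tags] is a list, so compare with "in" rather than "==".
    if "apache_access" in [tags] {
        grok {
            patterns_dir => ["/etc/logstash/patterns"]
            match => {
                "message" => [
                    "%{IPORHOST:Client_IP}%{APACHEINFO}\[%{APACHEDATE:Timestamp}\]\s*%{APACHEDATA}\s*%{APACHERADATA}",
                    "%{IPORHOST:Client_IP}%{APACHEINFO}\[%{APACHEDATE:Timestamp}\]\s*%{APACHEDATA}"
                ]
            }
        }

        # Keep failed events visible instead of dropping them.
        if "_grokparsefailure" in [tags] {
            mutate { add_tag => ["needs_review"] }   # hypothetical tag name
        }
    }
}
```

Once every line parses cleanly, the failure branch can be removed or turned back into a drop.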


(Hans) #3

Hi faulander,

Sorry for the late reply. In filebeat.yml, enabled was actually set to true during the test. What I posted is from after I disabled it while experimenting with another config (the Filebeat module). So it was a mistake on my part when writing this post.

As for the additional info, now I get it. I got that config from an online post and just modified the grok to suit my needs. The _grokparsefailure part is also from a forum, but when I was testing it, the only logs Kibana received were the ones with the _grokparsefailure tag, while the logs without the tag were somehow dropped :confused:
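One way to check which events get tagged and which disappear is to print every event to the console while testing. This is only a sketch, assuming Logstash is run in the foreground so its standard output is visible; it is added alongside the existing pipeline rather than replacing it.

```
output {
    # Temporary debugging output: prints each event with all its fields and tags.
    stdout { codec => rubydebug }
}
```

Each printed event shows its tags list, which makes it easier to see whether a conditional matched or the grok failed.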

Oh, and thank you for replying to my post and helping me.