Want a Logstash Filter to Parse the Agent Field in Apache Access Log Files


(Brook Hutchinson) #1

Windows 7
Elasticsearch v1.7.0
Logstash v1.5.4
Kibana v4.1.1

I need help defining a filter that I can use to parse the "Agent" Field in an Apache Access Log.

My Apache access logs use the Combined Log Format. I understand that there is an "Agent" field that is populated in the log file. I want Logstash to parse the agent field of the Apache access log and add new fields in Elasticsearch for browser name, operating system, etc. Does anyone have a filter that I can use to parse the "Agent" field and create new, meaningful field names and field values?

I want Elasticsearch to have meaningful browser names like
"Google Chrome vXX.X", "Mozilla Firefox vXX.X", "IE v9.0", "IE v10.0"

I want Elasticsearch to have meaningful operating system names like
"Windows 7", "Windows 8", "Mac OS vXX.X"

Once I have meaningful names in Elasticsearch, I want to create visualizations in Kibana: a simple pie chart showing the total users per browser version, and another showing the total users per operating system.

Here is the filter that I am using so far. This gives me an unparsed "agent" field value.

filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }

  date {
    locale => "en"
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
  }
}


(Magnus Bäck) #2

Have you looked at the useragent filter?


(Brook Hutchinson) #3

Thank you again for your suggestion of using the useragent filter. Your responses are very much appreciated.

I looked at the article that you shared on the useragent filter. I guess I do not understand which names I can use for the %{reserved words} that can be parsed out of the agent field. I believe that the example in your article you provided uses %{host}. What other values (other than host) can I use to populate new fields? Where can I get a complete list of all the values that I can parse out of the agent field (for example, browser name and operating system)?

Do you have a simple example of how I would use the useragent filter? For example, say I want to parse and capture the browser name in a new field named "browser-name".


(Magnus Bäck) #4

I believe that the example in your article you provided uses %{host}

Are you talking about the add_field parameter? That's a generic parameter that's available for all filters. Ignore it.

Do you have a simple example of how I would use the useragent filter? For example, say I want to parse and capture the browser name in a new field named "browser-name".

The key parameter for this filter is the mandatory source parameter which contains the name of the field containing the useragent string to parse. So, assuming you have the useragent string in the browser-name field you can use it like this:

filter {
  useragent {
    source => "browser-name"
  }
}

(Brook Hutchinson) #5

Thank you again for your response. I guess I am not understanding how to use the useragent filter. Sorry to ask again. I really want to understand how this filter works.

My Apache Access Logs are using the CombinedApacheFileFormat. One of the fields that is captured is named "agent". How do I use the useragent filter to parse out the browser name from the agent field and put the browser name value inside a new field named "browser-name"?


(Magnus Bäck) #6

The filter will create a number of different fields containing various pieces of information. The documentation of what fields are created is bad (improvement work tracked in issue #10) so I suggest you try it out and see which field matches your notion of browser name. Then use the mutate filter to rename and/or delete fields to your liking.
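
One way to try it out is a throwaway pipeline that reads a log line from stdin and prints the fully parsed event. This is only a sketch: it assumes the user-agent string ends up in a field named agent, which is what the COMBINEDAPACHELOG pattern and your description suggest.

input {
  stdin { }
}

filter {
  grok {
    match => [ "message", "%{COMBINEDAPACHELOG}" ]
  }

  # Assumption: grok stored the raw user-agent string in "agent".
  useragent {
    source => "agent"
  }
}

output {
  # Prints every field of the event so you can see what useragent added.
  stdout { codec => rubydebug }
}

Paste a single line from your access log into the console and look through the printed event for the fields that were not there before the useragent filter was added.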

You may want to set the useragent filter's target parameter to have it store the parsed values in subfields instead of in the root of the event (possibly overwriting other fields with the same name).
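
A minimal sketch of both suggestions combined, under a few assumptions: the raw string is in the agent field, "ua" is just an illustrative name for the target field, and the parsed browser name lands in a subfield called name. Check the actual subfield names against your own events before relying on this.

filter {
  useragent {
    source => "agent"
    # "ua" is a hypothetical name; parsed values go under [ua] instead of the event root.
    target => "ua"
  }

  mutate {
    # Assumption: the browser name is in [ua][name]; adjust if your events differ.
    rename => { "[ua][name]" => "browser-name" }
  }
}

With a target set, the other parsed values stay grouped under ua, so they cannot collide with fields that grok already created.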

