Extract a field from a URL?

I want to analyze AWS ELB logs, and I need to extract the "instance" segment from the URL into a separate field. Could you please provide an example of how to do this in the filter section of Logstash? Thanks.

logs:
"GET https://cloude.company.com:443/instance HTTP/2.0"
"GET https://cloude.company.com:443/instance/Web/Services/Service.html HTTP/2.0"

Try the following:

filter {
  grok {
    match => { "message" => "%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" }
  }
  mutate {
    split => { "request" => "/" }
    add_field => { "instance_name" => "%{[request][3]}" }
  }
}

With match => [ "message", "%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}" ] I get:

request = https://cloude.company.com:443/instance/App_Themes/Default/PageSpecific/images/warning.png

and after split => { "request" => "/" } the field becomes:

https:, , cloude.company.com:443, instance, App_Themes, Default, PageSpecific, images, warning.png

But when I tried:
add_field => { "path_shortx" => "%{request[4]}" }

[2019-05-20T12:18:40,455][FATAL][logstash.runner] An unexpected error occurred! {:error=>java.lang.IllegalStateException: org.logstash.FieldReference$IllegalSyntaxException: Invalid FieldReference: `request[4]`, :backtrace=>["org.logstash.execution.WorkerLoop.run(org/logstash/execution/WorkerLoop.java:85)", "java.lang.reflect.Method.invoke(java/lang/reflect/Method.java:498)", "org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(org/jruby/javasupport/JavaMethod.java:440)", "org.jruby.javasupport.JavaMethod.invokeDirect(org/jruby/javasupport/JavaMethod.java:304)", "usr.share.logstash.logstash_minus_core.lib.logstash.java_pipeline.start_workers(/usr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:235)", "org.jruby.RubyProc.call(org/jruby/RubyProc.java:295)", "org.jruby.RubyProc.call(org/jruby/RubyProc.java:274)", "org.jruby.RubyProc.call(org/jruby/RubyProc.java:270)", "java.lang.Thread.run(java/lang/Thread.java:748)"]}
[2019-05-20T12:18:40,471][ERROR][org.logstash.execution.WorkerLoop] Exception in pipelineworker, the pipeline stopped processing new events, please check your filter configuration and restart Logstash.org.logstash.FieldReference$IllegalSyntaxException: Invalid FieldReference: request[4]

Make that %{[request][4]} (although I think you want [3], not [4]).

It splits like this:
"request" => [
[0] "https:",
[1] "",
[2] "cloude-company.com:443",
[3] "Instance",
[4] "Web",
[5] "HomePage",
[6] "Widget",
[7] "Data"
]
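The index arithmetic in the listing above can be checked outside Logstash. This is a minimal Python sketch, not Logstash code: it assumes that mutate's split behaves like a plain string split on "/" for this input, and path_segments is a hypothetical helper name.

```python
def path_segments(request):
    """Split a request URL on "/" (assumed to mirror mutate's split here)."""
    return request.split("/")


request = "https://cloude-company.com:443/instance/Web/HomePage/Widget/Data"
parts = path_segments(request)

# "https://" contributes "https:" and an empty string, the host and port
# land at index 2, so the first path segment sits at index 3.
print(parts[2])
print(parts[3])
```

This is why [3], not [4], is the index of the instance name when the full URL is split.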

Note that if I copy one field to another, then split it and add the new field in the same mutate block, it doesn't work:
mutate {
  copy => { "request" => "request_split" }
  split => { "request_split" => "/" }
  add_field => { "instance" => "%{[request_split][3]}" }
}

But if I split and add the new field without copying, it works:
mutate {
  split => { "request" => "/" }
  add_field => { "instance" => "%{[request][3]}" }
}

This also works:
mutate {
  copy => { "request" => "request_split" }
}
mutate {
  split => { "request_split" => "/" }
  add_field => { "instance" => "%{[request_split][3]}" }
}

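A mutate filter performs mutations in a fixed order, which is unlikely to be the order you want. In particular, split is applied before copy, so within a single mutate block request_split does not exist yet when split runs. Putting the copy in its own mutate block avoids that.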
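To make the ordering effect concrete, here is a toy Python model of a mutate block. It is not Logstash code. It assumes only that split is applied before copy inside one block and that add_field runs after both; run_mutate, FIXED_ORDER, and the (source, index) form of add_field are hypothetical simplifications, and an unresolvable reference is modeled as a missing field.

```python
# Toy model only: assumes split runs before copy within one mutate block,
# and add_field runs after both, regardless of the order in the config.
FIXED_ORDER = ["split", "copy"]


def run_mutate(event, options):
    """Apply one mutate block to a dict-based event, ignoring config order."""
    for name in FIXED_ORDER:
        for key, value in options.get(name, {}).items():
            if name == "split" and isinstance(event.get(key), str):
                event[key] = event[key].split(value)  # key=field, value=separator
            elif name == "copy" and key in event:
                event[value] = event[key]  # key=source, value=destination
    for field, (source, index) in options.get("add_field", {}).items():
        parts = event.get(source)
        if isinstance(parts, list) and len(parts) > index:
            event[field] = parts[index]
    return event


url = "https://cloude.company.com:443/instance/Web/Services/Service.html"

# One block: split finds no request_split yet, copy then creates it as a string.
one_block = run_mutate({"request": url}, {
    "copy": {"request": "request_split"},
    "split": {"request_split": "/"},
    "add_field": {"instance": ("request_split", 3)},
})

# Two blocks: the copy has finished before the split is attempted.
two_blocks = run_mutate({"request": url}, {"copy": {"request": "request_split"}})
two_blocks = run_mutate(two_blocks, {
    "split": {"request_split": "/"},
    "add_field": {"instance": ("request_split", 3)},
})

print("instance" in one_block)
print(two_blocks.get("instance"))
```

In the single-block run the copied field is still an unsplit string, so there is nothing at index 3 to pick up; in the two-block run the field is populated.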

I thought you wanted to take whatever value appears in the place of instance.

What exactly do you want to achieve? Please post all the variants of your log lines for more clarity.

The idea is to take the value from the URL https://cloude.company.com:443/**instance**/Web/Services/Service.html,
extract it into a field, and build all visualizations on that field. I will name that field instance, as an example.

This is my ELB log:
h2 2019-05-09T23:55:51.100101Z elb_name 100.100.100.10:59907 192.168.1.10:443 0.007 0.088 0.000 200 200 1474 36394 "POST https://cloude-company.com:443/instance/Web/HomePage/Widget/Data HTTP/2.0" "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:eu-west-1:100101001010:targetgroup/cloude/aa1001010010100a "Root=1-5cd4be07-1001010010100101001010010" "cloude.company.com" "session-reused" 8 2019-05-09T23:55:51.160000Z "forward" "-" "-"

I created this filter:

filter {
  if [type] == "elb" {
    grok {
      match => [ "message", "%{WORD:connection} %{TIMESTAMP_ISO8601:timestamp} %{NOTSPACE:elb} %{IP:clientip}:%{INT:clientport:int} (?:(%{IP:backendip}:?:%{INT:backendport:int})|-) %{NUMBER:request_processing_time:float} %{NUMBER:backend_processing_time:float} %{NUMBER:response_processing_time:float} (?:-|%{INT:elb_status_code:int}) (?:-|%{INT:backend_status_code:int}) %{INT:received_bytes:int} %{INT:sent_bytes:int} \"%{WORD:verb} %{NOTSPACE:request} HTTP/%{NUMBER:httpversion}\" \"(?:-|%{DATA:user_agent})\" (?:-|%{NOTSPACE:ssl_cipher}) (?:-|%{NOTSPACE:ssl_protocol})" ]
    }
    if [path] and !([path] == "/") {
      # copy in its own mutate block so that it runs before the split
      mutate {
        copy => { "path" => "path_split" }
      }
      mutate {
        split => { "path_split" => "/" }
        add_field => { "url" => "%{[path_split][1]}" }
      }
    }
    if [url] in [ "instance1", "instance2", "instance3", "instance4", "instance5" ] {
      mutate {
        add_field => { "instance" => "%{url}" }
      }
    }
    mutate {
      remove_field => [ "path_split" ]
    }
    date {
      match => [ "timestamp", "ISO8601" ]
    }
    geoip {
      source => "clientip"
    }
  }
}

That works for my needs. Then I found a simpler way; for the same log I used:

grok {
  match => ["message", "%{ELB_ACCESS_LOG} \"%{DATA:userAgent}\"( %{NOTSPACE:ssl_cipher} %{NOTSPACE:ssl_protocol})?"]
}
grok {
  match => ["path", "/%{DATA:url}/%{GREEDYDATA:action_path}"]
}
if [url] in ["instance1", "instance2", "instance3", "instance4", "instance5"] {
  mutate {
    add_field => { "instance" => "%{url}" }
  }
}
It has one drawback, though: when the URL ends with the instance name, for example
https://cloude.company.com:443/instance
it doesn't write a value into the url field.

You could use something like

grok { match => ["path", "/(?<url>[^/]+)($|/%{GREEDYDATA:action_path})"] }
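The difference between the two path patterns can be checked with an ordinary regex engine. The sketch below uses Python's re module rather than grok, so it is an approximation: it assumes %{DATA} expands to .*? and %{GREEDYDATA} to .*, it spells named groups the Python way, (?P<name>...), and first_segment is a hypothetical helper.

```python
import re

# Python equivalents of the two grok patterns (assumed expansions:
# DATA -> .*?, GREEDYDATA -> .*).
ORIGINAL = re.compile(r"/(?P<url>.*?)/(?P<action_path>.*)")
SUGGESTED = re.compile(r"/(?P<url>[^/]+)($|/(?P<action_path>.*))")


def first_segment(pattern, path):
    """Return the captured url group, or None when the pattern does not match."""
    match = pattern.search(path)
    return match.group("url") if match else None


for path in ("/instance", "/instance/Web/Services/Service.html"):
    print(path, first_segment(ORIGINAL, path), first_segment(SUGGESTED, path))
```

The original pattern requires a second slash, so it finds nothing in a path that is only /instance, while the suggested pattern accepts either the end of the string or a further slash after the first segment.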

It works! Thanks a lot.
