Hello,
I am modifying the Filebeat Apache access ingest pipeline to split the URL into its parts.
A URI carries a lot of information that is useful for getting correct SEO data.
Consider the following cases and the data I would like to extract from each:
https://www.domain.com/project1/
-> project1 ( directory )
https://www.domain.com/project1/page.html
https://www.domain.com/project1/page ( example of URL rewriting )
-> project1 ( directory )
-> page or page.html ( filename, i.e. the requested resource )
https://www.domain.com/project2/submodule/contact.php
-> project2 ( directory )
-> submodule ( subdirectory )
-> contact.php ( filename )
https://www.domain.com/project3/submodule2/demo.php?id=12&name=toto
-> project3 ( directory )
-> submodule2 ( subdirectory )
-> demo.php ( filename )
-> id=12&name=toto ( variables )
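To make the target concrete, here is a minimal sketch in Python of the decomposition I am after. It is not an ingest processor, only an illustration of the expected result. The function name decompose_url and the output keys are hypothetical, and I am assuming that a path ending in "/" names a directory while any other last segment is the filename.

```python
from urllib.parse import urlsplit


def decompose_url(url):
    """Split a URL into directories, filename and raw query string."""
    parts = urlsplit(url)
    # Drop the empty segments produced by a leading or trailing "/".
    segments = [s for s in parts.path.split("/") if s]
    # Assumption: a path ending in "/" (or an empty path) has no filename.
    if parts.path.endswith("/") or not segments:
        directories, filename = segments, None
    else:
        directories, filename = segments[:-1], segments[-1]
    return {
        "directories": directories,  # e.g. directory, then subdirectory
        "filename": filename,        # None when the URL points at a directory
        "variables": parts.query or None,
    }
```

For the last case above, this would give project3 and submodule2 as directories, demo.php as the filename, and id=12&name=toto as the variables.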
I ran some splitting tests and ran into several problems.
Here is an example.
Processor added to filebeat/module/apache/access/ingest/default.json:
{
  "split": {
    "field": "url.original",
    "target_field": "url.split",
    "separator": "/",
    "ignore_missing": true
  }
}
JSON result:
"url": {
  "original": "/project3/submodule2/demo.php?id=12&name=toto",
  "split": [
    "",
    "project3",
    "submodule2",
    "demo.php?id=12&name=toto"
  ]
},
The first visible problem is that the leading empty element is kept in the array.
How can I do the equivalent of PHP's "array_filter" with a processor?
{
  "foreach" : {
    "field" : "url.split",
    "processor" : {
      "drop" : {
        "if" : "field.x == null"
      }
    }
  }
}
This is the idea I had to work around the problem, but it does not do what I want. How should I modify the foreach and the split so that "project3/" is recognized as a directory and "/demo.php" as the resource?
I would also split on "?" to separate the path from the variables, but I do not know how to process the resulting array.
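The order of operations I have in mind can be sketched in Python as follows. Again, this only illustrates the logic and is not pipeline code: the function name split_original is hypothetical, and the input is assumed to look like the url.original value shown earlier.

```python
def split_original(original):
    """Mimic the two splits on a value shaped like url.original."""
    # Step 1: separate the path from the variables at the first "?".
    path, _, query = original.partition("?")
    # Step 2: split the path on "/" and drop empty elements,
    # similar in spirit to filtering the array after the split.
    segments = [s for s in path.split("/") if s]
    return segments, query
```

Splitting on "?" first keeps the variables out of the last path element, so "demo.php" no longer ends up as "demo.php?id=12&name=toto".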
Before working with the Elasticsearch ingest processors, I used the csv filter plugin from Logstash:
csv {
  source => "[apache2][access][url]"
  separator => "/"
  target => "[apache2][access][urlparsed2]"
  skip_empty_columns => true
}
It was not great, but it was good enough.
I hope this is clear.
If we arrive at something that works, I will open a PR on the Git repository.
Thank you in advance for your help.
William