Filebeat - Apache module - Access - URI detail / SEO

Hello,
I am modifying the Apache access ingest pipeline to split the URL into its parts.
A URI carries a lot of information that is useful for SEO analysis.
Consider the following cases and the data I would like to extract from each:

https://www.domain.com/project1/ 
-> project1 ( directory )

https://www.domain.com/project1/page.html
https://www.domain.com/project1/page ( example of URL rewriting )
-> project1 ( directory )
-> page or page.html ( filename -> requested resource )

https://www.domain.com/project2/submodule/contact.php
-> project2  ( directory )
-> submodule ( subdirectory ) 
-> contact.php ( filename )

https://www.domain.com/project3/submodule2/demo.php?id=12&name=toto
-> project3  ( directory )
-> submodule2 ( subdirectory ) 
-> demo.php ( filename )
-> id=12&name=toto ( variables )

I ran some splitting tests and ran into several problems.
Here is an example:

Processor filebeat/module/apache/access/ingest/default.json

{
 "split": {
 "field": "url.original",
 "target_field": "url.split",
 "separator": "/",
 "ignore_missing": true
 }
}

JSON result :

    "url": {
      "original": "/project3/submodule2/demo.php?id=12&name=toto",
      "split": [
        "",
        "project3",
        "submodule2",
        "demo.php?id=12&name=toto"
      ]
    },

The first visible problem is that the leading empty element is kept in the array.
How can I do the equivalent of PHP's "array_filter" with a processor?

  {
    "foreach" : {
      "field" : "url.split",
      "processor" : {
        "drop" : {
          "if" : "field.x == null"
        }
      }
    }
  }

Here is the idea I had to work around the problem. How can I modify the foreach and the split so that "project3/" is recognized as a directory and "/demo.php" as the resource?
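
A possible workaround, as an untested sketch: the `drop` processor discards the whole document rather than a single array element, so the `foreach` above would not behave like "array_filter". A `script` processor could remove the empty strings from `url.split` instead, assuming that field was produced by the split shown earlier:

```json
{
  "script": {
    "lang": "painless",
    "source": "if (ctx.url?.split != null) { ctx.url.split.removeIf(s -> s == null || s.isEmpty()); }"
  }
}
```

With the example document, `url.split` should then contain only "project3", "submodule2" and "demo.php?id=12&name=toto".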

I would split on "?" to separate the path from the variables, but I do not know how to process the resulting array.
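
One way this could be sketched without handling an array at all: a `script` processor that cuts `url.original` at the first "?". The target field names `url.path` and `url.query` are an assumption made for this example, not something the module is confirmed to use here:

```json
{
  "script": {
    "lang": "painless",
    "source": "String original = ctx.url?.original; if (original != null) { int q = original.indexOf('?'); ctx.url.path = q < 0 ? original : original.substring(0, q); if (q >= 0) { ctx.url.query = original.substring(q + 1); } }"
  }
}
```

For "/project3/submodule2/demo.php?id=12&name=toto" this would give the path "/project3/submodule2/demo.php" and the query "id=12&name=toto". Pointing the split processor at the path field instead of `url.original` would then keep the variables out of the last element.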

Before working with the Elasticsearch ingest processors, I used the csv filter plugin from Logstash:

    csv {
      source => "[apache2][access][url]"
      separator => "/"
      target => "[apache2][access][urlparsed2]"
      skip_empty_columns => true
    }

It was not great, but it worked well enough.

I hope this is clear.

If we get to something functional, I will open a PR on the Git repository.

Thank you in advance for your help
William

Up please :slight_smile:

@jsoriano, could you maybe help me?

Hey @willouuu,

Have you tried with the script processor? You should be able to do any filtering with it :slight_smile:

Thank you!
I had seen "painless" but I did not understand how powerful it is or how to use it.
I started writing my parsing code, but I still do not understand how to add it to the pipeline. Can I put it anywhere other than the source field? It is complicated to maintain this way, especially when using functions.

{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "xxxxxxxxxxxxxx"
      }
    },
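
One option that might ease maintenance, sketched under the assumption that the script is registered in Elasticsearch separately from the Filebeat pipeline: the `script` processor also accepts an `id` that refers to a stored script instead of an inline `source`. The id `apache-url-parts` is hypothetical:

```json
{
  "script": {
    "id": "apache-url-parts"
  }
}
```

The stored script would have to be created beforehand through the stored scripts API (`PUT _scripts/apache-url-parts`). As far as I know Filebeat only loads the pipeline itself, so this is not a drop-in solution for a module.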

Here is my code in case it helps as a starting point (it is not finished):

// def url =  doc['url'].value;
def url = "/project3/submodule2/demo.php?id=12&name=toto#titi";

// Extract the lowercased extension (text after the last dot) from a filename
def getExtensionFromFilename(filename) {
 def returned_value = "";
 def m = (filename =~ /(\.[^\.]*)$/);
 if (m.size() > 0) returned_value = ((m[0][0].size() > 0) ? m[0][0].substring(1).trim().toLowerCase() : "");
 return returned_value;
}

// All type items
String[] path = url.split(/\//);
String[] query = url.split(/\?/); 
String[] fragment = url.split(/\#/); 
String[] filetype = url.split(/\./); 

// Explode PATH - limited to 4 items ( prevents many errors )
List pathExplode = Arrays.asList(path);
String file = pathExplode.last();
String p1 = pathExplode[1];
String p2 = pathExplode[2];
String p3 = pathExplode[3];
String p4 = pathExplode[4];


// Explode FILE
String[] ftBundle = file.split(/\?/); // split after "?"
List ftExplode = Arrays.asList(ftBundle); 
String ftsplit = ftExplode[0]; 
String ftt = getExtensionFromFilename(ftsplit);  // extension of the filename without the query, e.g. "php"

// Explode QUERY
List queryExplode = Arrays.asList(query); 
String qr = queryExplode[1]; 

// Explode FRAGMENT
List fragmentExplode = Arrays.asList(fragment); 
String frag = fragmentExplode[1]; 



// Display all items
return "Path \n - "+p1+"\n - "+p2+"\n - "+p3+"\n - "+p4+"\nQuery \n - "+qr+"\nFragment \n - "+frag+"\nFilename \n - "+ftsplit+"\nFileType \n - "+ftt;

Result :
Path
- project3
- submodule2
- demo.php?id=12&name=toto#titi
- null
Query
- id=12&name=toto#titi
Fragment
- titi
Filename
- demo.php
FileType
- php

Sorry to bother you, @jsoriano.
How can I integrate this code into a pipeline, please?

Thanks :pray:

@willouuu indeed, including scripts in JSON pipelines can be tedious; this is why we added support for using YAML files for this. You can find some examples in the panw module.

Thank you so much, Jaime!
I will look at this tomorrow.