Filebeat - Apache module - Access - URI detail / SEO

willouuu · July 9, 2019, 9:01am

Hello,
I am modifying the Apache access ingest pipeline to split the URL into its parts.
A URI carries a lot of information that is useful for SEO analysis.
Consider the following cases and the data I would like to extract from each:

https://www.domain.com/project1/ 
-> project1 ( directory )

https://www.domain.com/project1/page.html
https://www.domain.com/project1/page ( example of URL rewriting )
-> project1 ( directory )
-> page or page.html ( filename -> requested resource )

https://www.domain.com/project2/submodule/contact.php
-> project2  ( directory )
-> submodule ( subdirectory ) 
-> contact.php ( filename )

https://www.domain.com/project3/submodule2/demo.php?id=12&name=toto
-> project3  ( directory )
-> submodule2 ( subdirectory ) 
-> demo.php ( filename )
-> id=12&name=toto ( variables )

I ran some splitting tests and hit several problems.
Here is an example:

Processor in filebeat/module/apache/access/ingest/default.json:

{
 "split": {
 "field": "url.original",
 "target_field": "url.split",
 "separator": "/",
 "ignore_missing": true
 }
}

JSON result :

    "url": {
      "original": "/project3/submodule2/demo.php?id=12&name=toto",
      "split": [
        "",
        "project3",
        "submodule2",
        "demo.php?id=12&name=toto"
      ]
    },

The first visible problem is that the leading empty element is kept in the array.
How can I do the equivalent of PHP's "array_filter" with a processor?

  {
    "foreach" : {
      "field" : "url.split",
      "processor" : {
        "drop" : {
          "if" : "field.x == null"
        }
      }
    }
  }

This is the idea I had to work around the problem. How can I modify the foreach and the split so that "project3/" is recognized as a directory and "/demo.php" as the resource?

I would split on "?" to separate the path from the variables, but I do not know how to process the resulting array.

Before working with the Elasticsearch ingest processors, I used the csv filter plugin in Logstash:

csv {
  source => "[apache2][access][url]"
  separator => "/"
  target => "[apache2][access][urlparsed2]"
  skip_empty_columns => true
}

It was not great, but it worked well enough.

I hope this is clear.

If we get to something functional, I will open a PR on the repository.

Thank you in advance for your help
William

willouuu · July 22, 2019, 10:30am

Up, please.

@jsoriano, could you maybe help me?

jsoriano · July 22, 2019, 10:51am

Hey @willouuu,

Have you tried the script processor? You should be able to do any filtering with it.
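
As a minimal sketch of that suggestion (not a tested part of the Apache module), a script processor placed after the split processor could drop the empty entries from the array, which is roughly what PHP's array_filter would do. It assumes the split processor has already written its result to url.split as shown earlier in the thread:

```json
{
  "script": {
    "lang": "painless",
    "source": "if (ctx.url?.split != null) { ctx.url.split.removeIf(s -> s == null || s.isEmpty()); }"
  }
}
```

With the sample document above, the leading "" would be removed and the other three elements kept. The last element would still contain the query string, so separating path and query remains a separate step.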

willouuu · July 23, 2019, 7:30am

Thank you!
I had seen "painless", but I did not understand how powerful it is or how to use it.
I started writing my parsing code, but I still do not understand how to add it to the pipeline. Can I put it somewhere other than the source field? It is hard to maintain like this, especially when using functions.

{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": "xxxxxxxxxxxxxx"
      }
    },

Here is my code in case it helps as a starting point (it is not finished):

// def url =  doc['url'].value;
def url = "/project3/submodule2/demo.php?id=12&name=toto#titi";

// Return the lowercase file extension without the dot, or "" if there is none
def getExtensionFromFilename(filename) {
 def returned_value = "";
 def m = (filename =~ /(\.[^\.]*)$/);
 if (m.size() > 0) returned_value = ((m[0][0].size() > 0) ? m[0][0].substring(1).trim().toLowerCase() : "");
 return returned_value;
}

// All type items
String[] path = url.split(/\//);
String[] query = url.split(/\?/); 
String[] fragment = url.split(/\#/); 
String[] filetype = url.split(/\./); 

// Explode PATH - limit to 4 items ( avoids many errors )
List pathExplode = Arrays.asList(path);
String file = pathExplode.last();
String p1 = pathExplode[1];
String p2 = pathExplode[2];
String p3 = pathExplode[3];
String p4 = pathExplode[4];


// Explode FILE
String[] ftBundle = file.split(/\?/); // split after "?"
List ftExplode = Arrays.asList(ftBundle); 
String ftsplit = ftExplode[0]; 
String ftt = getExtensionFromFilename(ftsplit);  // "demo.php" -> "php"

// Explode QUERY
List queryExplode = Arrays.asList(query); 
String qr = queryExplode[1]; 

// Explode FRAGMENT
List fragmentExplode = Arrays.asList(fragment); 
String frag = fragmentExplode[1]; 



// Display all items
return "Path \n - "+p1+"\n - "+p2+"\n - "+p3+"\n - "+p4+"\nQuery \n - "+qr+"\nFragment \n - "+frag+"\nFilename \n - "+ftsplit+"\nFileType \n - "+ftt;

Result :
Path
- project3
- submodule2
- demo.php?id=12&name=toto#titi
- null
Query
- id=12&name=toto#titi
Fragment
- titi
Filename
- demo.php
FileType
- php

willouuu · August 19, 2019, 6:55am

Sorry to disturb you, @jsoriano.
How can I integrate this code into a pipeline, please?

Thanks

jsoriano · August 19, 2019, 12:19pm

@willouuu, indeed, including scripts in JSON pipelines can be tedious; this is why we added support for YAML pipeline files. You can find some examples in the panw module.
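
To illustrate why YAML helps here, the following is a rough sketch of what such a pipeline file could look like, with the script written as a multi-line block instead of a single JSON string. The file name, the url.segments field, and the use of url.query and url.fragment as targets are assumptions made for this example and are not taken from the panw module; the fileset's manifest.yml would presumably also need to reference the .yml file instead of the .json one.

```yaml
# Hypothetical ingest/pipeline.yml for the apache access fileset (sketch only)
description: Sketch that splits url.original into path segments, query and fragment
processors:
  - script:
      lang: painless
      if: ctx.url?.original != null
      source: |
        String rest = ctx.url.original;
        // Cut off the fragment first, then the query string.
        int hash = rest.indexOf('#');
        if (hash >= 0) {
          ctx.url.fragment = rest.substring(hash + 1);
          rest = rest.substring(0, hash);
        }
        int q = rest.indexOf('?');
        if (q >= 0) {
          ctx.url.query = rest.substring(q + 1);
          rest = rest.substring(0, q);
        }
        // Walk the path and keep only the non-empty segments.
        List segments = new ArrayList();
        int start = 0;
        while (start <= rest.length()) {
          int end = rest.indexOf('/', start);
          if (end < 0) {
            end = rest.length();
          }
          if (end > start) {
            segments.add(rest.substring(start, end));
          }
          start = end + 1;
        }
        ctx.url.segments = segments;
```

The script avoids regular expressions on purpose, since regex support in Painless may be disabled on a cluster. For "/project3/submodule2/demo.php?id=12&name=toto#titi" the intent is three segments (project3, submodule2, demo.php), the query "id=12&name=toto" and the fragment "titi".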

willouuu · August 19, 2019, 2:29pm

Thank you so much, Jaime!
I will look at this tomorrow.

system · September 16, 2019, 4:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
