Dynamic path indexing

german23 · June 23, 2015, 3:22pm

Hi guys,

Windows event logs quite often contain paths like
C:\Program Files (x86)\java\java\jre1.8_45\bin\java.exe

I wonder if it is possible to index every part of the path in separate fields like:
path1:C:
path2:Program Files (x86)
path3:java
....

Using %{DATA:path1}\\%{DATA:path2}\\%{DATA:path3}.. in a grok filter would be quite an inefficient and inflexible way to do so, given that every path has an arbitrary number of parts and you can hardly cover every possible case.

So what we actually need is something more flexible, something that counts the parts of the path and generates a matching number of patterns for it.

Unfortunately I don't know where to start. Can someone point me in the right direction?

Thanks

magnusbaeck · June 23, 2015, 7:48pm

Let's start with the problem definition: what are you trying to accomplish, i.e. what is the end that you want to reach by splitting path components into fields of their own?

german23 · June 23, 2015, 8:18pm

Project\subproject\location\customer\document
That's an example of how a folder structure could look at a company.

With this information you can do aggregations such as which user has accessed which document how many times over a given period of time.

If you define fields for each part of the path, you could produce statistics like "give me the users that have accessed project x and location z in the last 2 days" or "which subprojects has user x accessed in the last 2 weeks", and you could verify whether the access was legitimate or not.

magnusbaeck · June 24, 2015, 5:53am

Okay. To enable those kinds of searches you don't have to split each path component into discrete fields. Just change how the original field is analyzed. I suggest you look into the path hierarchy tokenizer and the other tokenizers that can split input strings into terms.
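
As a rough illustration of the kind of terms a path hierarchy tokenizer produces when it is configured with a backslash delimiter, here is a plain-Python sketch that mimics its output. This is not Elasticsearch code: the function name path_hierarchy_terms is made up for the example, and it assumes the path does not start with a delimiter.

```python
def path_hierarchy_terms(path, delimiter="\\"):
    """Return the cumulative prefixes of `path`, one per path component.

    Hypothetical helper that only imitates the tokenizer's output;
    assumes `path` has no leading delimiter.
    """
    parts = path.split(delimiter)
    # Join the first 1, 2, ..., n components back together.
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


if __name__ == "__main__":
    for term in path_hierarchy_terms(r"Project\subproject\location\customer\document"):
        print(term)
```

Because every prefix becomes its own term, a search for Project\subproject would match every document stored below that folder.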

german23 · June 24, 2015, 8:15am

Thanks, I'm going to have a look at it.

german23 · June 25, 2015, 9:31am

I'll have to jump in again.

I tested different tokenizers for the given fields; the path hierarchy tokenizer seems to be the best one.
With it, a string like Google(project)\Glass(subproject)\USA(location)\Amazon(customer)\prizes.txt(document) gets analyzed to
Google
Google\Glass
Google\Glass\USA
Google\Glass\USA\Amazon
Google\Glass\USA\Amazon\prizes.txt

I guess this is an ES-related question, but after thinking, trying and googling for a few hours, I don't see how I can produce e.g. a top 10 list of the locations, at least not in Kibana 4, without splitting the string into fields and making a top 10 list of the required field.

Any tips regarding this (should I move the question to the ES section)?

magnusbaeck · June 25, 2015, 9:50am

Aha, okay. Yes, in that case it sounds like you should have separate fields for the path components and the grok filter would be the typical way of extracting those fields from the full path. If the number of path components varies between different paths you can specify multiple grok expressions and have Logstash try them one by one until it gets a match.
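
The "try several expressions until one matches" idea can be sketched outside Logstash in plain Python. This is only an illustration of the logic, not grok itself: the names build_pattern, extract_path_fields and MAX_COMPONENTS are made up for the example, and the upper limit of 8 path components is an arbitrary assumption.

```python
import re

# Assumption for this sketch: paths have at most this many components.
MAX_COMPONENTS = 8

# One path component: one or more characters that are not a backslash.
SEGMENT = r"[^\\]+"


def build_pattern(n):
    """Build an anchored regex for a path with exactly n components."""
    fields = [f"(?P<path{i}>{SEGMENT})" for i in range(1, n + 1)]
    return re.compile("^" + r"\\".join(fields) + "$")


# One expression per supported component count, longest first.
PATTERNS = [build_pattern(n) for n in range(MAX_COMPONENTS, 0, -1)]


def extract_path_fields(path, patterns=PATTERNS):
    """Try each pattern in turn and return the fields of the first match."""
    for pattern in patterns:
        match = pattern.match(path)
        if match:
            return match.groupdict()
    return {}  # no pattern matched, e.g. more components than supported


if __name__ == "__main__":
    fields = extract_path_fields(r"Google\Glass\USA\Amazon\prizes.txt")
    for name, value in sorted(fields.items()):
        print(name, value)
```

With fields such as path3 holding only the location, a top 10 list becomes an ordinary terms aggregation on that one field. The trade-off is that paths with more components than the longest expression yield no fields at all, so the limit has to be chosen with the real data in mind.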