Dynamic path indexing

Hi guys,

In Windows event logs there are quite often paths like
C:\Program Files (x86)\java\java\jre1.8_45\bin\java.exe

I wonder if it is possible to index every part of the path in a separate field, like:
path2:Program Files (x86)

Using %{DATA:path1}\\%{DATA:path2}\\%{DATA:path3}... in a grok filter would be a rather inefficient and inflexible way to do this, given that every path has an arbitrary number of parts and you can hardly cover every possible case.

So what we actually need is something more flexible: something that counts the parts of the path and creates a matching number of patterns for it.

Unfortunately I don't know where to start. Can someone point me in the right direction?
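A minimal sketch of the "count the parts and create one field per part" idea, written in Python rather than as a Logstash filter and assuming backslash-separated Windows paths. The helper name `split_path_fields` is hypothetical; the field names `path1`, `path2`, ... follow the naming used in the question.

```python
def split_path_fields(path: str) -> dict[str, str]:
    """Split a Windows path into numbered fields path1..pathN.

    The number of fields follows the number of components,
    so no fixed-length pattern is needed.
    """
    # Split on backslashes and drop empty parts (e.g. from a trailing "\").
    parts = [p for p in path.split("\\") if p]
    return {f"path{i}": part for i, part in enumerate(parts, start=1)}


if __name__ == "__main__":
    fields = split_path_fields(r"C:\Program Files (x86)\java\java\jre1.8_45\bin\java.exe")
    print(fields["path2"])  # Program Files (x86)
    print(len(fields))      # 7
```

Inside Logstash the same effect would need something like the `mutate` filter's `split` option (which turns the field into an array) or a `ruby` filter; the sketch only shows the logic.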


Let's start with the problem definition: what are you trying to accomplish, i.e. what end do you want to reach by splitting path components into fields of their own?

That's an example of how a folder structure could look at a company.

With this information you can build aggregations such as which user has accessed which document, and how many times, over a given period of time.

If you define a field for each part of the path, you can compute statistics like "give me the users that have accessed project x and location z in the last 2 days" or "which subprojects has user x accessed in the last 2 weeks", and you can verify whether each access was legitimate or not.
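To make the first of those questions concrete, here is a sketch that answers "which users accessed project x and location z in the last 2 days" over events whose path has already been split into fields. The event structure (`user`, `project`, `location`, `timestamp` keys), the helper `users_who_accessed`, and the sample data are all hypothetical stand-ins for indexed documents; in Elasticsearch this would roughly correspond to a terms aggregation on the user field under a filtered query.

```python
from datetime import datetime, timedelta


def users_who_accessed(events, project, location, now, days=2):
    """Return the users who accessed `project` at `location` within `days` before `now`."""
    cutoff = now - timedelta(days=days)
    return {
        e["user"]
        for e in events
        if e["project"] == project
        and e["location"] == location
        and e["timestamp"] >= cutoff
    }


if __name__ == "__main__":
    now = datetime(2015, 6, 10, 12, 0)
    # Illustrative sample events only.
    events = [
        {"user": "alice", "project": "projectX", "location": "locationZ", "timestamp": datetime(2015, 6, 9, 8, 0)},
        {"user": "bob", "project": "projectX", "location": "locationZ", "timestamp": datetime(2015, 6, 1, 8, 0)},
        {"user": "carol", "project": "projectX", "location": "locationY", "timestamp": datetime(2015, 6, 10, 9, 0)},
    ]
    print(users_who_accessed(events, "projectX", "locationZ", now))  # {'alice'}
```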

Okay. To enable that kind of search you don't have to split each path component into a discrete field. Just change how the original field is analyzed. I suggest you look into the path hierarchy tokenizer and the other tokenizers that can split input strings into terms.
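To see why the path hierarchy tokenizer helps, here is a rough Python emulation of the terms it emits when its `delimiter` is set to a backslash (the tokenizer's default delimiter is `/`): one term per cumulative path prefix, so a term query for a directory matches every file beneath it. The helper `path_hierarchy_tokens` is hypothetical, mimics only the default behaviour (no `reverse`, no `skip`), and is not the Elasticsearch implementation.

```python
def path_hierarchy_tokens(path: str, delimiter: str = "\\") -> list[str]:
    """Emulate path_hierarchy output: every cumulative prefix of the path."""
    parts = path.split(delimiter)
    prefixes = [delimiter.join(parts[: i + 1]) for i in range(len(parts))]
    # A leading delimiter would yield an empty first prefix; drop it.
    return [p for p in prefixes if p]


if __name__ == "__main__":
    for token in path_hierarchy_tokens(r"C:\Program Files (x86)\java"):
        print(token)
    # C:
    # C:\Program Files (x86)
    # C:\Program Files (x86)\java
```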

Thanks, I'm going to have a look at it.

I'll have to jump in again.

I tested different tokenizers for the given fields, and the path hierarchy tokenizer seems to be the best one.
Using it, a string like Google(project)\Glass(subproject)\USA(location)\Amazon(customer)\prizes.txt(document) gets analyzed into one term per path prefix (Google(project), Google(project)\Glass(subproject), and so on).

I guess this is an ES-related question, but after thinking, experimenting, and googling for a few hours, I don't see how I can build, e.g., a top-10 list of locations (at least not in Kibana 4) without splitting the string into fields and building the top-10 list on the relevant field.
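The point can be illustrated outside Kibana: a top-10 of locations is straightforward once the location is a value of its own, for example the component at a fixed position in the split path. This sketch assumes the layout from the example above (project\subproject\location\customer\document), so the location is the third component; the helper `top_locations` and the sample paths are hypothetical.

```python
from collections import Counter


def top_locations(paths, n=10, location_index=2):
    """Count the component at `location_index` across paths; return the n most common."""
    counts = Counter()
    for path in paths:
        parts = path.split("\\")
        if len(parts) > location_index:  # skip paths too short to contain a location
            counts[parts[location_index]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Illustrative sample paths only.
    paths = [
        r"Google\Glass\USA\Amazon\prizes.txt",
        r"Google\Glass\USA\Amazon\offer.txt",
        r"Google\Maps\Germany\Bosch\specs.txt",
    ]
    print(top_locations(paths))  # [('USA', 2), ('Germany', 1)]
```

A terms aggregation in Kibana works the same way: it needs one value per document to count, which a prefix-tokenized path field does not directly provide.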

Any tips on this? (Should I move the question to the Elasticsearch category?)

Aha, okay. Yes, in that case it sounds like you should have separate fields for the path components and the grok filter would be the typical way of extracting those fields from the full path. If the number of path components varies between different paths you can specify multiple grok expressions and have Logstash try them one by one until it gets a match.
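The "multiple grok expressions, tried one by one" approach can be sketched with Python regexes standing in for grok patterns. The named groups (`drive`, `dir1`, `dir2`, `file`), the three patterns, and the helper `match_first` are hypothetical; `[^\\]+` is used instead of `%{DATA}` so that each pattern matches exactly one component count. In Logstash, listing several patterns for the same field in a single `match` and keeping the default `break_on_match => true` gives the same first-match-wins behaviour.

```python
import re

# Hypothetical patterns, tried in order like a grok `match` list.
# [^\\]+ matches exactly one path component (no backslashes).
PATTERNS = [
    re.compile(r"^(?P<drive>[^\\]+)\\(?P<dir1>[^\\]+)\\(?P<dir2>[^\\]+)\\(?P<file>[^\\]+)$"),
    re.compile(r"^(?P<drive>[^\\]+)\\(?P<dir1>[^\\]+)\\(?P<file>[^\\]+)$"),
    re.compile(r"^(?P<drive>[^\\]+)\\(?P<file>[^\\]+)$"),
]


def match_first(path):
    """Return the named fields of the first pattern that matches, or None."""
    for pattern in PATTERNS:
        m = pattern.match(path)
        if m:
            return m.groupdict()
    return None  # grok would tag the event with _grokparsefailure here


if __name__ == "__main__":
    print(match_first(r"C:\Users\report.txt"))
    # {'drive': 'C:', 'dir1': 'Users', 'file': 'report.txt'}
```

Paths deeper than the longest pattern fall through to the failure case, which is the limitation the question raised.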