We are trying to find hostnames in the DHCP logs, that are roque in the sense that the hostname is not following current naming standards.
So I have been trying to figure out, how to do that with ML.
So I came up with an idea to collapse the hostname into patterns. So I preprocess all the names like this:
change all upper case chars to 'A',
change all lowercase to 'a'
change all digits to '0'
all special char to '.'.
This gives a uniform pattern of the names.
I then used the rare function in a ML job to run through all the DHCP leases and I actually got some pretty decent results with this approach.
But is the correct way of detecting stuff like this or is there a simpler way , where I dont have to preprocess all the names myself ?