ECS comment: make sure to parse out the extension, when there's one
Non-ECS comment: I once took down a cluster by parsing out all the query params of a busy web application. Make sure your custom field url.queryparams uses a datatype that won't cause a mapping explosion, such as flattened.
ECS Nitpick: The best match for IPORHOST in ECS is .address, which is specified as the address "when you're not sure yet if it's an IP, a domain or a unix socket". So for URLs, you'd fill .address, then copy out to .ip if it's an IP, and otherwise copy to .domain.
This way you end up with .address which is filled reliably, 100% of the time.
If you need to do IP-specific analysis, you have .ip which is the ip datatype, and lets you do CIDR lookups, for example. Or just looking for exists:url.ip may surface interesting weird stuff.
If you need to do analysis on domain names, you have .domain and all of the domain breakdown fields which won't contain IP addresses.
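To make the fill-then-copy logic concrete, here is a minimal sketch in Python rather than Logstash config. It is only an illustration of the decision, not the pipeline from this thread: the function name `split_address` and the dict layout are made up for the example, and anything that does not parse as an IP (including a unix socket path) is simply treated as a domain here.

```python
import ipaddress


def split_address(address):
    """Always fill 'address', then copy it to 'ip' or 'domain'."""
    fields = {"address": address}
    # IPv6 literals appear in URLs as [::1]; strip the brackets before parsing.
    if address.startswith("[") and address.endswith("]"):
        candidate = address[1:-1]
    else:
        candidate = address
    try:
        ipaddress.ip_address(candidate)
    except ValueError:
        # Assumption for this sketch: whatever is not an IP goes to 'domain'.
        fields["domain"] = address
    else:
        fields["ip"] = candidate
    return fields


print(split_address("www.example.co.uk"))
print(split_address("192.0.2.10"))
```

Either way, `address` is always present, and exactly one of `ip` or `domain` is set alongside it.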
This actually brings me to the domain breakdown fields. I don't recall if we have a solid way to break them down by effective TLD in Logstash. But if the data source analyses many domains (as opposed to incoming web traffic on your webserver), it would be interesting to fill the domain breakdown fields as well. So "www.example.co.uk" becomes:
.top_level_domain:co.uk => to analyze broad traffic destinations
Thank you very much @webmat
I followed the .address advice and did the TLD parsing (with the custom plugin) and updated the configuration up there accordingly.
I'm still unsure how to address your second point on query params. Once the kv filter is done, should I convert the result to JSON and then use an ES template to force the flattened type?
After the kv filter is done, you should have multiple keys nested under [url][queryparams].
The only thing you would need to do, in order to avoid the mapping explosion, is modify your index template so that the field url.queryparams itself is of type flattened.
So assuming you're using the sample ECS template we provide here, you could add your custom field right below the definition for the query field, like this:
    "query": {
      "ignore_above": 1024,
      "type": "keyword"
    },
    "queryparams": {
      "type": "flattened",
      // other params for the flattened type?
    },
Another option would indeed be to do as you describe: turn the resulting structure into one big string, where perhaps the text datatype could help dig in there.
But I would definitely give flattened a try first. It behaves somewhat like a bunch of keyword fields, but also avoids the mapping explosion.
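For a rough feel of why this matters, the sketch below counts the distinct parameter names found in a handful of query strings. It is plain Python, not part of the Logstash pipeline; the sample query strings and the helper name `queryparam_keys` are invented for the illustration.

```python
from urllib.parse import parse_qsl


def queryparam_keys(query_strings):
    """Collect every distinct parameter name seen across query strings."""
    keys = set()
    for qs in query_strings:
        for name, _value in parse_qsl(qs, keep_blank_values=True):
            keys.add(name)
    return keys


# Made-up sample data standing in for real traffic.
queries = [
    "q=logstash&page=2",
    "utm_source=newsletter&utm_campaign=spring",
    "session=abc123&debug=",
]
keys = queryparam_keys(queries)
# With dynamic mapping, each of these names would become its own field
# under url.queryparams. With flattened, the whole object is one field.
print(len(keys), sorted(keys))
```

On a busy application the set of names keeps growing with every new client, tracker or typo, which is how the field count in the mapping gets out of hand.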