ECS - Squid proxy log normalization

Hi there,

I would really like to use the Elastic Common Schema (ECS) for my data coming from a Squid proxy, but I cannot work out which ECS field sets I should use.

Observer seems like a good fit for describing the proxy itself. However, I cannot tell whether it is considered best practice to use the Source-Destination pair or the Client-Server pair for the endpoints of the communication. Should I also apply other field sets such as Log, Event, and HTTP?

Is there an established best practice specifically for proxy data?

Hi @herrBez,

Yes, you will typically combine several field sets when mapping your events to ECS. So using the source and destination pair is expected.

However, ECS doesn't really define anything for web or caching proxies per se at this time. So you will likely have to add some custom (aka non-ECS) fields to your events to track additional information.

Adding sensible support for proxies is something that's been gnawing at me for a little bit, but we haven't had time to really look into it yet. Let's take the bull by the horns and get the discussion started, though.

Feel free to check out these past discussions for some ideas

You'll see that my suggestion in 158 and Mike's suggestion in 300 are conflicting :joy: So there's no definitive answer yet, until we actually sit down and add this to ECS proper. For now, track it in a way that makes sense to you in a custom field (maybe: haproxy.upstream.ip etc), and feel free to open a GitHub issue or a PR if you think you've figured out a schema that works well for proxies in general.
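To make the "ECS fields plus a custom namespace" idea a bit more concrete, here is a minimal Python sketch. It assumes Squid's default native access.log format (timestamp, elapsed, client, result code/status, bytes, method, URL, user, hierarchy code/peer, content type). The function name `squid_to_ecs`, the `squid.*` field names, and the sample line are illustrative assumptions, not something ECS or Squid defines.

```python
def squid_to_ecs(line: str) -> dict:
    """Map one Squid native-format log line to an ECS-style document.

    Assumes the default native log format; a line with fewer than
    ten whitespace-separated fields raises ValueError.
    """
    (_ts, _elapsed_ms, client, code_status, size,
     method, url, user, hier_peer, _mime) = line.split()[:10]

    cache_result, status = code_status.split("/", 1)
    hierarchy_code, peer = hier_peer.split("/", 1)

    doc = {
        # The proxy itself is the observer.
        "observer": {"type": "proxy", "product": "squid"},
        # The client may be logged as an IP or a hostname, so keep it in .address.
        "source": {"address": client},
        "http": {
            "request": {"method": method},
            "response": {"status_code": int(status), "bytes": int(size)},
        },
        "url": {"original": url},
        # Custom (non-ECS) namespace for proxy-specific details (hypothetical names).
        "squid": {"cache_result": cache_result, "hierarchy_code": hierarchy_code},
    }
    if user != "-":
        doc["user"] = {"name": user}
    if peer != "-":
        doc["squid"]["peer"] = {"address": peer}
    return doc


sample = ("1066036250.603 11 10.0.0.5 TCP_MISS/200 1846 GET "
          "http://www.example.com/ - DIRECT/192.0.2.10 text/html")
event = squid_to_ecs(sample)
```

Keeping the upstream peer under a custom `squid.peer.*` key is just one option; whether it belongs in destination instead is exactly the open question above.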

Here's a pointer for something you'll encounter, when mapping your upstream destinations.

ECS has a pattern of cramming whatever address format we get into .address and then pulling out the pieces (e.g. .ip, .port, .domain) depending on what we have in the address. This allows us to track all kinds of addresses: nginx/HAProxy can contain unix sockets in the "ip" field; httpd can contain hostnames in the "ip" field. So .address is guaranteed to always be filled, no matter the format. And if you know you specifically need the IP (e.g. doing geoip, ASN lookups), then you use the .ip field, which is filled most of the time.

I see an equivalent use case where if the upstream address you see in your logs looks like https://10.10.10.10:9200/some/path, you can put all of this into .address as is, then extract out the various interesting bits in a separate step.
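As a rough illustration of that ".address first, then extract" step, here is a small Python sketch using only the standard library. The helper name `split_address`, the flat dotted keys, and the rule for spotting unix sockets are my own assumptions for the example; an ingest pipeline would do the same thing with its own processors.

```python
import ipaddress
from urllib.parse import urlsplit


def split_address(prefix: str, raw: str) -> dict:
    """Always fill <prefix>.address, then add .ip/.domain/.port when they can be extracted."""
    fields = {f"{prefix}.address": raw}

    # Unix sockets (as logged by nginx/HAProxy) have nothing more to extract.
    if raw.startswith(("unix:", "/")):
        return fields

    try:
        # Prefix bare "host:port" values with "//" so urlsplit sees a network location.
        parts = urlsplit(raw if "://" in raw else f"//{raw}")
        host = parts.hostname
        port = parts.port
    except ValueError:
        # Unparseable address: keep only .address.
        return fields

    if not host:
        return fields
    if port is not None:
        fields[f"{prefix}.port"] = port
    try:
        fields[f"{prefix}.ip"] = str(ipaddress.ip_address(host))
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        fields[f"{prefix}.domain"] = host
    return fields


upstream = split_address("destination", "https://10.10.10.10:9200/some/path")
```

Anything the helper cannot parse still ends up in .address, which matches the guarantee described above: .address is always filled, while .ip is only filled when there really is an IP.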

Hope this helps. Let me know if you have other questions.

Mat


Also make sure to check out the event.* fields to track some high-level information, for example event.duration for the overall duration of a transaction.
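One detail to watch for: ECS expects event.duration in nanoseconds, while Squid's native log format reports the elapsed time in milliseconds. A minimal Python sketch of the conversion, where the function name is just a placeholder:

```python
def squid_elapsed_to_event_duration(elapsed_ms: str) -> int:
    """Convert Squid's elapsed time (milliseconds) to ECS event.duration (nanoseconds)."""
    return int(elapsed_ms) * 1_000_000


# An elapsed value of "11" (ms) in the access log becomes 11,000,000 ns.
duration_ns = squid_elapsed_to_event_duration("11")
```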

Thank you for the very fast reply and the explanation. If I come up with a possible solution, I will follow up in the GitHub repository.

Thank you.
