ECS - Squid proxy log normalization

herrBez · July 16, 2019, 3:13pm

Hi there,

I would really like to use Elastic Common Schema for the data coming from my Squid proxy, but I cannot work out which ECS field sets I should use.

Observer seems like a good fit for describing the proxy itself. However, I cannot tell whether the best practice is to use the Source-Destination pair or the Client-Server pair for the endpoints of the communication. Should I also apply other field sets such as Log, Event, and HTTP?

Is there an established best practice specifically for proxy data?

webmat · July 16, 2019, 3:59pm

Hi @herrBez,

Yes, you will typically use a combination of field sets when mapping your events to ECS. So using the source and destination pair is expected.

ECS doesn't really define anything for web or caching proxies per se at this time, however. So you will likely have to add some custom (aka non-ECS) fields to your events to track additional information.

Adding sensible support for proxies is something that's been gnawing at me for a little bit, but we haven't had time to really look into it yet. Let's take the bull by the horns and get the discussion started, though.

Feel free to check out these past discussions for some ideas

You'll see that my suggestion in 158 and Mike's suggestion in 300 are conflicting. So there's no definitive answer yet, until we actually sit down and add this to ECS proper. For now, track it in a way that makes sense to you in a custom field (maybe: haproxy.upstream.ip etc), and feel free to open a GitHub issue or a PR if you think you've figured out a schema that works well for proxies in general.

Here's a pointer for something you'll encounter, when mapping your upstream destinations.

ECS has a pattern of cramming whatever address format we get into .address and then pulling out the pieces (e.g. .ip, .port, .domain) depending on what we have in the address. This allows us to track all kinds of addresses: nginx/HAProxy can contain unix sockets in the "ip" field; httpd can contain hostnames in the "ip" field. So .address is guaranteed to always be filled, no matter the format. And if you know you specifically need the IP (e.g. doing geoip, ASN lookups), then you use the .ip field, which is filled most of the time.

I see an equivalent use case where if the upstream address you see in your logs looks like https://10.10.10.10:9200/some/path, you can put all of this into .address as is, then extract out the various interesting bits in a separate step.
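As a rough illustration of that "keep everything in .address, then pull out the pieces" step, here is a minimal Python sketch. The function name `split_address` and the unix-socket check are my own assumptions, not anything ECS defines; the returned keys are bare (`address`, `ip`, `port`, `domain`) so you can nest them under whichever prefix you settle on.

```python
import ipaddress
from urllib.parse import urlsplit


def split_address(raw: str) -> dict:
    """Hypothetical helper: keep the raw value, then extract what we can."""
    # The raw value is always kept, whatever its format.
    fields = {"address": raw}

    # Assumption: unix sockets look like "unix:/path" or "/path";
    # there is nothing more to extract from them.
    if raw.startswith("unix:") or raw.startswith("/"):
        return fields

    # urlsplit only recognises the host after "//", so add it for bare "host:port".
    parts = urlsplit(raw if "://" in raw else "//" + raw)
    host = parts.hostname
    if host is None:
        return fields

    # A host that parses as an IP goes to "ip", anything else to "domain".
    try:
        fields["ip"] = str(ipaddress.ip_address(host))
    except ValueError:
        fields["domain"] = host

    # .port raises ValueError on a malformed or out-of-range port.
    try:
        port = parts.port
    except ValueError:
        port = None
    if port is not None:
        fields["port"] = port

    return fields


upstream = split_address("https://10.10.10.10:9200/some/path")
```

With the example address from above, `upstream` ends up with `address`, `ip` and `port` filled and no `domain`, so a later geoip or ASN step can rely on `ip` only when it is present.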

Hope this helps. Let me know if you have other questions.

Mat

webmat · July 16, 2019, 4:00pm

Also make sure to check out the event.* fields to track some high-level stuff, like event.duration for the overall duration of a transaction, for example.
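To make that a bit more concrete, here is a small Python sketch of how already-parsed Squid values could be combined into one ECS-style event. The function name `squid_to_ecs` and its parameters are hypothetical, and I'm assuming the elapsed time comes from Squid in milliseconds, while ECS defines event.duration in nanoseconds. Treat the choice of field sets as one possible mapping, not an official one.

```python
def squid_to_ecs(elapsed_ms: int, client_ip: str, method: str,
                 status: int, url: str) -> dict:
    """Hypothetical mapping of parsed Squid values onto ECS field sets."""
    return {
        # ECS event.duration is in nanoseconds; the input is assumed to be ms.
        "event": {"duration": elapsed_ms * 1_000_000},
        # The proxy itself is described by the observer field set.
        "observer": {"type": "proxy", "product": "squid"},
        # The requesting client is the source; .address always mirrors the raw value.
        "source": {"address": client_ip, "ip": client_ip},
        "http": {
            "request": {"method": method},
            "response": {"status_code": status},
        },
        "url": {"original": url},
    }


# Illustrative values only, not taken from a real log.
doc = squid_to_ecs(250, "192.0.2.10", "GET", 200, "http://example.com/")
```

Anything ECS has no home for yet (cache result codes, hierarchy or peer information) would go into a custom field set next to these.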

herrBez · July 17, 2019, 8:48am

Thank you for the very fast reply and the explanation. If I come up with a possible solution, I will follow up in the GitHub repository.

Thank you.
