Visualise '(www.|)site.com' terms agg in weblogs

tomr · April 13, 2016, 1:18pm

I have weblogs for a bunch of sites, and the logs go into ES with the hostname as either site.com or www.site.com (and sometimes, but not often, a.site.com, b.site.com and so on).

I'm thinking about splitting the hostname on the way in into the base domain (site.com) and a prefix, so I can easily visualise 'top 50 sites' in Kibana using a terms agg.

Currently I can't figure out how to merge the totals for www.site.com and site.com in a Kibana vis - is there a way I can do this without reindexing?

Bargs · April 13, 2016, 10:10pm

The cleanest way to do this would probably be to fix the data, but in Kibana what I would probably do is create two separate visualizations, one that filters for site.com and www.site.com, and one that filters for the inverse (NOT site.com AND NOT www.site.com). Then you could put them side by side in a dashboard to get a view across the entire data set.

tomr · April 13, 2016, 11:53pm

How would that solve for 'top 50 sites'? If I were going to make 50 visualisations, I'd be better off making 50 filters and using a single viz with a filter agg.

Bargs · April 14, 2016, 3:26pm

It sounded to me like you essentially wanted a top 50 terms agg with the counts for site.com and www.site.com combined. I'm not sure how you'd achieve that with a single visualization, so my thinking was that you could create one visualization with the count for site.com + www.site.com and a second visualization with a top 50 terms agg on just the subdomains.

But perhaps I've misinterpreted your question. Could you provide a little more detail on what the data looks like? I assumed the domain was static and only the subdomains change; maybe that's incorrect?

tomr · April 14, 2016, 3:54pm

site1.com
www.site1.com
site2.com

site3.com
www.site3.com
weird.site3.com
site4.com
www.site4.com
etc

Bargs · April 14, 2016, 4:32pm

Ah I see, that is more complicated. So for site3, www.site3.com, site3.com, and weird.site3.com should all count towards site3.com in the terms agg, is that right?

Since you already seem to be using Groovy scripting, I assume you've tried creating a scripted field that strips the subdomain? Does that not work for some reason?

tomr · July 20, 2016, 2:51am

[quote="Bargs, post:6, topic:47251"]
Ah I see, that is more complicated. So for site3, www.site3.com, site3.com, and weird.site3.com should all count towards site3.com in the terms agg, is that right?[/quote]

Exactly

Cool idea - would you be able to give an example of how this would be done?

Bargs · July 20, 2016, 1:36pm

I imagine you could split the string on dots, remove the first element if there are more than two elements (in other words, there's a subdomain), and then rejoin with dots. A regex might also work, but I imagine that would be slower.
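To make the split-and-rejoin idea concrete, here is a minimal sketch of the logic in Python. It is not a Kibana scripted field as written; it only illustrates the steps, which would need porting to whatever scripting language your cluster allows. The function name `strip_subdomain` is made up for illustration, and the hostnames are the sample values posted earlier in the thread. Note the limits of the heuristic: it drops only one leading label (a.b.site.com becomes b.site.com), and it would wrongly shorten domains with a two-part suffix such as site.co.uk.

```python
from collections import Counter


def strip_subdomain(host: str) -> str:
    """Drop the first dot-separated label when there are more than two labels.

    Hypothetical helper: implements only the heuristic described above,
    not a full public-suffix-aware domain parser.
    """
    parts = host.split(".")
    if len(parts) > 2:  # more than "name.tld", so assume a subdomain prefix
        parts = parts[1:]
    return ".".join(parts)


# Sample hostnames taken from the thread.
hosts = [
    "site1.com", "www.site1.com",
    "site2.com",
    "site3.com", "www.site3.com", "weird.site3.com",
    "site4.com", "www.site4.com",
]

# Rough equivalent of a 'top 50' terms agg over the normalised value.
top_sites = Counter(strip_subdomain(h) for h in hosts).most_common(50)
print(top_sites)
```

With this normalisation, site3.com, www.site3.com and weird.site3.com all land in the same site3.com bucket, which is the merged count asked for above.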

You might also be able to accomplish this with a value script in the advanced options of the terms agg itself instead of a scripted field: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_value_script_8
