Visualise '(www.|)' terms agg in weblogs

I have weblogs for a bunch of sites, and the logs go into ES with the hostname recorded either with or without the `www.` prefix (and sometimes, but not often, with other prefixes).

I'm thinking about splitting the hostname on the way in into the prefix and the rest of the name, so I can easily visualise it using a terms agg for 'top 50 sites' in Kibana.

Currently I can't figure out how to merge the totals for the `www.` and non-`www.` variants of a site in a Kibana vis. Is there a way I can do this without reindexing?

The cleanest way to do this would probably be to fix the data, but in Kibana what I would probably do is create two separate visualizations: one that filters for the two variants, and one that filters for the inverse (NOT the first AND NOT the second). Then you could put them side by side in a dashboard to get a view across the entire data set.

How would that solve for 'top 50 sites'? If I were going to make 50 visualisations, I'd be better off making 50 filters and using a single viz with a filter agg.

It sounded to me like you essentially wanted a top 50 terms agg with the counts for the two variants combined. I'm not sure how you'd achieve that with a single visualization, so my thinking was that you could create one visualization with the combined count for those two variants, and a second visualization with a top 50 terms agg on just the subdomains.

But perhaps I've misinterpreted your question. Could you provide a little more detail on what the data looks like? I assumed the domain was static and only the subdomains change; maybe that's incorrect?

Ah I see, that is more complicated. So for site3,,, and should all count towards in the terms agg, is that right?

Since you already seem to be using Groovy scripting, I assume you've tried creating a scripted field that strips the subdomain? Does that not work for some reason?

[quote="Bargs, post:6, topic:47251"]
Ah I see, that is more complicated. So for site3,,, and should all count towards in the terms agg, is that right?[/quote]


Cool idea - would you be able to give an example of how this would be done?

I imagine you could split the string on dots, remove the first element if there are more than two array elements (in other words, there's a subdomain), and then rejoin with dots? Or maybe a regex would work, but I imagine that would be slower.
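To make the split-and-rejoin idea concrete, here is a rough sketch of the logic. It is written in Python purely for illustration; it is not a Groovy scripted field and would need translating before use in Kibana. The function names and the `example.com` / `example.org` hostnames are made up for the example.

```python
import re
from collections import Counter


def strip_subdomain(host):
    """Drop the first label when the hostname has more than two labels."""
    parts = host.split(".")
    if len(parts) > 2:
        # More than two labels: assume the first one is a subdomain.
        parts = parts[1:]
    return ".".join(parts)


def strip_www(host):
    """Regex alternative: remove only a leading 'www.' prefix."""
    return re.sub(r"^www\.", "", host)


def top_sites(hosts, n=50):
    """Count hits per normalised hostname and return the n most common."""
    return Counter(strip_www(h) for h in hosts).most_common(n)


# Hypothetical sample hits, not taken from the real logs.
hits = ["www.example.com", "example.com", "example.com", "www.example.org"]
ranking = top_sites(hits)
```

Note that the "more than two labels" rule is naive: a bare hostname under a multi-part suffix, such as `example.co.uk`, has three labels and would be reduced to `co.uk`. If only the `www.` prefix needs merging, the regex variant is the safer of the two.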

You might also be able to accomplish this with a value script in the advanced options of the terms agg itself instead of a scripted field: