Aggregating Data for Visualization

talqadi7 · July 19, 2019, 4:12pm

Hi all, I'm trying to visualize data based on an aggregation of a certain column, in this case the company name. Throughout my data, company names are not standardized, and I would like to find a way to either pre-aggregate the data or aggregate it based on common terms in the company name field. For example, in the following image, there are many different ways Microsoft can be written.
[Image: ES company sample]

This is just a sample that I made; in general the names can differ more, but they will always contain the company name, e.g. Microsoft.
I did a bit of research, but couldn't find a similar application in the forums. Right now I have broken it down to a couple of options:

Pre-aggregate the data with a Python script
Pre-aggregate the data with Logstash; however, there is a lot of data with many variations, so I don't know how applicable this is
If possible, a direct visualization of the most common terms across all the fields, filtering out or ignoring the unneeded terms (like LLC, INC.)
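
To make the first and third options concrete, here is a minimal Python sketch of pre-aggregation by normalizing names and dropping noise terms. The noise-term list and the sample names are made up for illustration, and the approach assumes the legal suffixes are the only thing that differs between variants, which real data may not satisfy.

```python
import re
from collections import Counter

# Hypothetical list of terms to ignore; extend it for the real dataset.
NOISE_TERMS = {"llc", "inc", "corp", "corporation", "co", "ltd"}


def normalize_company(name: str) -> str:
    """Lowercase the name, split on non-alphanumerics, and drop noise terms."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in NOISE_TERMS]
    return " ".join(kept)


# Made-up sample values standing in for the raw company name column.
raw_names = [
    "Microsoft",
    "Microsoft Inc.",
    "MICROSOFT LLC",
    "microsoft corp",
    "Acme Co.",
]

# Count entries per normalized name; this value could be written back
# as an extra field before indexing.
counts = Counter(normalize_company(n) for n in raw_names)
```

The normalized value could then be indexed as its own keyword field, so that an ordinary terms aggregation groups the variants together.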

Thank you so much for your help! I'm sorry if this is a little vague or an open-ended question. I was just wondering whether this has come up before in the forums, and whether there is a solution going forward.

Bargs · July 19, 2019, 8:49pm

Is there a fixed number of company names you need to aggregate on? If so, you could just use a filters agg for your buckets and craft a query that matches all of the names a particular company goes under. For your example, if company name is a text field using the standard analyzer, the KQL query would literally just be company name:microsoft.

talqadi7 · July 20, 2019, 4:51pm

Thank you Bargs, this is good for a single use case, or for the bigger companies, so thank you for that.

For example, if I would like to create a visualization of how many company entries there were, I would need an automated way to catch all the company names. Is it possible to query out, for example, all names with 'microsoft', and then maybe add another column to all the caught entries with a simplified company name? In my actual dataset, there are hundreds of different company names.

Bargs · July 22, 2019, 10:47pm

Hmmm, I'm not sure I understand. Are you wanting to pick out common parts of the company names in an automated fashion without having a predefined list of canonical names?

talqadi7 · July 23, 2019, 4:48pm

Yes, that would be the best-case scenario. However, working from a predefined list of canonical names would also be completely applicable. The most important part of this is just being able to visualize all variants of a company as one, for instance to allow us to see the most entries per company.
The KQL query earlier is a perfect example of what I'm looking for, but it only applies to the one company queried. Do you know of a way to add the term used in the query as a column for each data entry (if this makes sense)?
Again, I know this is a less technical question, so thank you so much Bargs.
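
One way the "add the query term as a column" idea might be sketched is with Elasticsearch's Update By Query API, which applies a script to every document matching a query. The snippet below only builds the request body as a Python dict and does not send it; the field names `company_name` and `company_canonical` are hypothetical, and the match query assumes a text field analyzed as described earlier in the thread.

```python
import json


def build_tagging_request(field: str, term: str, canonical: str, target_field: str) -> dict:
    """Build an _update_by_query body that tags matching docs with a canonical name."""
    return {
        # Select every document whose analyzed field contains the term.
        "query": {"match": {field: term}},
        # Painless script that writes the canonical name into a new field.
        "script": {
            "lang": "painless",
            "source": f"ctx._source['{target_field}'] = params.value",
            "params": {"value": canonical},
        },
    }


# Hypothetical field names; substitute the ones from the real index mapping.
body = build_tagging_request("company_name", "microsoft", "Microsoft", "company_canonical")
payload = json.dumps(body, indent=2)
```

The resulting JSON would be sent as the body of a `POST <index>/_update_by_query` request, once per canonical name. Trying it on a copy of the index first seems prudent, since it rewrites documents in place.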

Bargs · July 24, 2019, 8:37pm

Sorry if I'm still misunderstanding, but if I had a predefined list of company names I would create a filters agg with a query matching all the variants for each of those company names. One filter per canonical company name. Something like this:
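
As a rough sketch (an illustration under assumptions, not necessarily the exact query meant here), the request body for such a filters aggregation can be built in Python as follows. The field name `company_name`, the aggregation name `companies`, and the list of canonical names are hypothetical; each filter is a simple match query, which assumes a text field using the standard analyzer.

```python
import json

# Hypothetical mapping: canonical company name -> term that matches its variants.
CANONICAL_TERMS = {
    "Microsoft": "microsoft",
    "Google": "google",
}


def build_filters_agg(field: str, canonical_terms: dict) -> dict:
    """Build a search body with one named filter bucket per canonical company."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "aggs": {
            "companies": {
                "filters": {
                    # Documents matching none of the filters land in this bucket.
                    "other_bucket_key": "other",
                    "filters": {
                        name: {"match": {field: term}}
                        for name, term in canonical_terms.items()
                    },
                }
            }
        },
    }


body = build_filters_agg("company_name", CANONICAL_TERMS)
payload = json.dumps(body, indent=2)
```

In Kibana, the equivalent would be a Filters bucket aggregation with one KQL query per company, each labeled with the canonical name.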

system · August 21, 2019, 8:37pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
