Hi all, I'm trying to visualize data based on an aggregation of a certain column. In this case, it's the company name. Company names are not standardized throughout my data, so I would like to find a way to either pre-aggregate the data or aggregate it based on common terms in the company name field. For example, in the following image, there are many different ways Microsoft can be written.
This is just a sample that I made. In general the names can differ more, but they will always contain the company name, e.g. Microsoft.
I did a bit of research, but couldn't find a similar use case in the forums. So far I have narrowed it down to a few options:
- Pre-aggregate the data with a Python script
- Pre-aggregate the data with Logstash; however, there is a lot of data with many variations, so I don't know how applicable this is
- If possible, a direct visualization of the most common terms across all the fields, filtering out or ignoring the unneeded terms (like LLC, INC.)
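To make the first option concrete, here is a minimal sketch of what the Python pre-aggregation might look like. It assumes the variations are mostly casing, punctuation, and legal suffixes; the `NOISE_TERMS` list, the function names, and the sample names are all hypothetical placeholders, not taken from my real data.

```python
import re
from collections import Counter

# Hypothetical list of legal/noise terms to ignore; extend it to match the data.
NOISE_TERMS = {"llc", "inc", "corp", "corporation", "co", "ltd", "company"}


def normalize_company(name: str) -> str:
    """Lowercase the name, drop punctuation, and remove noise terms."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    kept = [t for t in tokens if t not in NOISE_TERMS]
    # Fall back to the cleaned-up original if every token was a noise term.
    return " ".join(kept) if kept else name.strip().lower()


def aggregate(names):
    """Count how many raw names map to each normalized company name."""
    return Counter(normalize_company(n) for n in names)


if __name__ == "__main__":
    # Made-up sample values, only for illustration.
    sample = ["Microsoft, Inc.", "MICROSOFT LLC", "Microsoft Corporation", "Apple Inc."]
    for company, count in aggregate(sample).most_common():
        print(company, count)
```

The normalized value could then be written to a separate field and used for the aggregation instead of the raw name. This would not catch abbreviations or names with extra words (for example a product name after the company name), so it only covers the simpler variations.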
Thank you so much for your help! I'm sorry if this is a little vague or open-ended. I was just wondering whether this has come up before in the forums, and whether there is a known solution.