Top terms in comma separated list

snicoll · July 20, 2017, 1:44pm

Hey,

My index defines an attribute "foo" that contains a list of well-defined terms (for instance foo, bar, biz, etc.). Each document has zero, one, or more of those terms.

I'd like to better understand how users are using my service, and one way to achieve that would be a graph in Kibana showing the top 10 term combinations for that field.

What is the easiest way to achieve this? I already built a graph on "foo", but that only shows me how frequently each term is used, not how frequently combinations of terms occur.

Thanks!

cjcenizal · July 21, 2017, 5:39am

Hey Stéphane, can you help me understand how your data is structured a bit better? Are you saying that each document has a property called "foo", and the value of that property is an array of values, which could consist of "foo", "bar", "biz", etc?

"foo": ["foo", "bar", "biz"]

And are you saying you want to see how many users have "foo": ["foo", "bar"] vs "foo": ["bar", "biz"]?

Thanks,
CJ

snicoll · July 21, 2017, 3:15pm

Hey,

Thanks a lot for the reply!

So in my document, the attribute foo is mapped as a string, and when we ingest data we pass a JSON array of terms. There is a fixed set of terms (roughly 100 different values). What I am interested in is the patterns in which those terms are used together. So if the user has selected foo, bar or bar, foo, that's essentially the same thing (selecting those two values).

What I'd like, indeed, is a report/visualization of the top "combinations" of those terms (your example above is spot on).

Extra request: if I type dependencies: foo in the search bar, then I'd like those top combinations to be restricted to the ones where foo is present.
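The requested behaviour amounts to "filter first, then aggregate": the search query narrows the set of documents, and the top-combinations count runs only over the documents that matched. A minimal Python sketch of those semantics, outside Elasticsearch, assuming each document is a dict whose `dependencies` field holds a list of terms (the helper name `top_combinations_containing` and the sample data are hypothetical):

```python
from collections import Counter

def top_combinations_containing(docs, term, field="dependencies", n=10):
    """Count the most common term combinations among documents whose
    `field` contains `term` (a stand-in for a `dependencies: foo` query)."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if term in values:  # the query step: keep only matching documents
            # frozenset makes ["foo", "bar"] and ["bar", "foo"] the same combination
            counts[frozenset(values)] += 1
    return counts.most_common(n)

docs = [
    {"dependencies": ["foo", "bar"]},
    {"dependencies": ["bar", "foo"]},
    {"dependencies": ["foo", "biz"]},
    {"dependencies": ["bar", "biz"]},
]
print(top_combinations_containing(docs, "foo"))
```

Here the `["bar", "biz"]` document is excluded by the filter, and the two orderings of foo and bar are counted as one combination seen twice.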

Thanks!

cjcenizal · July 21, 2017, 5:37pm

Hi Stéphane, thanks for the explanation! I tried to recreate your use case and here's what I came up with. First, I created a bunch of documents with the links.raw property, which is an array of various links (e.g. "twitter.com", "joe@gmail.com"). Then I created a bar chart visualization with this kind of configuration:

As you can see, this gets some good-looking results BUT they're incorrect! Our query is looking at the occurrence of each individual item within the links.raw array, instead of treating the array as a single unit.
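To make the difference concrete, here is a small Python sketch of the two ways of counting (an illustration only, not how Elasticsearch is implemented). A terms aggregation on an array field buckets each individual value, so a document contributes once to every value it contains; the question asks for one bucket per whole array. The sample `links` data is made up.

```python
from collections import Counter

docs = [
    {"links": ["twitter.com", "joe@gmail.com"]},
    {"links": ["twitter.com"]},
    {"links": ["twitter.com", "joe@gmail.com"]},
]

# Per-value counting: roughly what a terms aggregation on an array field does,
# where each document is counted once for every distinct value it contains.
per_value = Counter(v for doc in docs for v in set(doc["links"]))

# Per-array counting: each document's array is treated as one unit.
per_array = Counter(tuple(doc["links"]) for doc in docs)

print(per_value.most_common())  # [('twitter.com', 3), ('joe@gmail.com', 2)]
print(per_array.most_common())  # [(('twitter.com', 'joe@gmail.com'), 2), (('twitter.com',), 1)]
```

Note that the tuple key is order-sensitive, just like concatenating the array values as-is.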

In order to get the kind of visualization you want, I had to create a scripted field (Management > Index Patterns > Scripted Fields). I named it formattedLinks and configured it like this:

This concatenates the values in the array. Then I changed my visualization to aggregate on this scripted field instead of links.raw, and got a visualization more in line with what you're looking for. The only challenge is if the order of your values isn't deterministic (e.g. you see arrays of both ["foo", "bar"] and ["bar", "foo"]): then you will need to sort them, too. If you need to do that, I recommend looking at something like Logstash to format this data once at ingest time instead of on every query, which will be much more efficient.
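A minimal sketch of the ingest-time idea, written in Python rather than as an actual Logstash pipeline: before indexing, each document gets an extra field holding a canonical form of its array (duplicates removed, values sorted, then joined), so ["foo", "bar"] and ["bar", "foo"] produce the same value. A terms aggregation on that field (assumed here to be mapped as a keyword) would then count whole combinations. The field name `foo_combination`, the `" | "` separator, and the helper names are all hypothetical.

```python
def combination_key(values, sep=" | "):
    """Return an order-independent string for a list of terms."""
    # Deduplicate and sort so that equal sets of terms yield equal keys.
    return sep.join(sorted(set(values)))

def enrich(doc, source_field="foo", target_field="foo_combination"):
    """Return a copy of `doc` with the canonical combination added."""
    enriched = dict(doc)
    enriched[target_field] = combination_key(doc.get(source_field, []))
    return enriched

print(enrich({"foo": ["foo", "bar"]}))  # {'foo': ['foo', 'bar'], 'foo_combination': 'bar | foo'}
print(enrich({"foo": ["bar", "foo"]}))  # {'foo': ['bar', 'foo'], 'foo_combination': 'bar | foo'}
```

Documents with zero terms get an empty string here; depending on how you want them reported, you might prefer to omit the field for those documents instead.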

Does this help?

CJ

snicoll · July 23, 2017, 12:57pm

Thanks a lot, that answers my question!

Cheers

system · August 20, 2017, 12:57pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
