Top terms in comma separated list


(Stéphane Nicoll) #1

Hey,

My index defines an attribute "foo" that contains a list of well-defined terms (for instance foo, bar, biz, etc.). Each document has zero, one or more of those terms.

I'd like to better understand how users are using my service, and one way to achieve that would be a graph in Kibana that shows the top 10 term combinations for that field.

What is the easiest way to achieve this? I already built a graph on "foo", but that only shows me how frequently each term is used, not how frequently each combination of terms occurs.

Thanks!


(CJ Cenizal) #2

Hey Stéphane, can you help me understand how your data is structured a bit better? Are you saying that each document has a property called "foo", and the value of that property is an array of values, which could consist of "foo", "bar", "biz", etc?

"foo": ["foo", "bar", "biz"]

And are you saying you want to see how many users have "foo": ["foo", "bar"] vs "foo": ["bar", "biz"]?

Thanks,
CJ


(Stéphane Nicoll) #3

Hey,

Thanks a lot for the reply!

So in my document, the attribute foo is mapped as a string, and when we ingest data we pass a JSON array of terms. There is a fixed number of terms (roughly 100 different values). What I am interested in is the patterns in which those terms are used together. So if the user has selected foo, bar or bar, foo, that's essentially the same thing (selecting those two values).

What I'd like to do indeed is to get a report/visualization of the top "combinations" of those terms (your example above is spot on).

Extra request: if I type dependencies: foo in the search bar, I'd like those top combinations to be restricted to the ones where foo is present.

Thanks!


(CJ Cenizal) #4

Hi Stéphane, thanks for the explanation! I tried to recreate your use case and here's what I came up with. First, I created a bunch of documents with the links.raw property, which is an array of various links (e.g. "twitter.com", "joe@gmail.com"). Then I created a bar chart visualization with this kind of configuration:

As you can see, this gets some good-looking results BUT they're incorrect! Our query is looking at the occurrence of each individual item within the links.raw array, instead of treating the array as a single unit.
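To make the difference concrete, here is a small Python sketch outside of Kibana. The sample documents and the field name foo are made up for illustration. It contrasts counting each term on its own (roughly what aggregating on the array field does) with counting each whole array as one unit (what the question asks for).

```python
from collections import Counter

# Hypothetical sample documents; each carries an array of terms in "foo".
docs = [
    {"foo": ["foo", "bar"]},
    {"foo": ["foo", "bar"]},
    {"foo": ["bar", "biz"]},
    {"foo": ["foo"]},
]

# Per-term counting: every element of every array is counted individually.
per_term = Counter(term for doc in docs for term in doc["foo"])

# Per-combination counting: each whole array is treated as a single unit.
per_combination = Counter(tuple(doc["foo"]) for doc in docs)

print(per_term.most_common())
print(per_combination.most_common())
```

The first counter says nothing about which terms appear together; only the second one exposes the combinations.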

In order to get the kind of visualization you want, I had to create a scripted field (Management > Index Patterns > Scripted Fields). I named it formattedLinks and configured it like this:

This concatenates the values in the array. Then I changed my visualization to aggregate on this scripted field instead of links.raw. Now I got a visualization more in line with what you're looking for. The only challenge is if the order of your values isn't deterministic (e.g. you see arrays of both ["foo", "bar"] and ["bar", "foo"]); then you will need to sort them, too. If you're doing that, I recommend looking at something like Logstash to format this data at ingest time (instead of at query time), which will be much more efficient.
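As a rough illustration of the sort-then-concatenate idea, here is a Python sketch of what an ingest-time step could compute. It is not Logstash or scripted-field code. The helper names combination_key and top_combinations, the comma separator, and the choice to collapse duplicate terms within one document are all assumptions made for this example.

```python
from collections import Counter

def combination_key(terms, separator=","):
    """Build an order-independent key: dedupe, sort, then join the terms."""
    return separator.join(sorted(set(terms)))

def top_combinations(docs, field="foo", must_include=None, n=10):
    """Count combination keys across docs and return the n most common.

    If must_include is given, only documents containing that term are counted.
    Documents with a missing or empty field are skipped.
    """
    counts = Counter()
    for doc in docs:
        terms = doc.get(field) or []
        if not terms:
            continue
        if must_include is not None and must_include not in terms:
            continue
        counts[combination_key(terms)] += 1
    return counts.most_common(n)

# Hypothetical sample documents.
docs = [
    {"foo": ["foo", "bar"]},
    {"foo": ["bar", "foo"]},   # same combination, different order
    {"foo": ["bar", "biz"]},
    {"foo": []},
    {},
]

print(top_combinations(docs))
print(top_combinations(docs, must_include="biz"))
```

Because the key is sorted, ["foo", "bar"] and ["bar", "foo"] land in the same bucket. Storing such a key in its own field at ingest time would let a plain terms aggregation on that field return the top combinations, and the must_include filter mirrors the "only combinations where foo is present" request.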

Does this help?

CJ


(Stéphane Nicoll) #5

Thanks a lot, that answers my question!

Cheers

