How to find relations between independent documents within an index

So, I've been struggling for a while with a problem that looked simple but for which I just cannot find a solution. At this point I'm not sure if it's not possible to do this, or if I'm just unable to see it. Let's see if anyone can think of a way to solving this!

The thing is, I have an index with one entry (one document) per item sold in a particular store. So, for a given store, I have many independent entries. For instance:

Store: "John's veggies", Item: "carrots"
Store: "John's veggies", Item: "tomatoes"
Store: "John's veggies", Item: "eggplant"
Store: "Vegan World", Item: "carrots"
Store: "Vegan World", Item: "oranges"
Store: "Vegan World", Item: "squash"
... and so on.

The thing is, I need to answer questions such as:

How many stores do NOT sell eggplant? (in this case, a simple query such as "NOT Item: eggplant" would not work, because it would return for instance 'Store: "John's veggies", Item: "carrots"', which would add to the count and would not be correct.

How many stores sell carrots and oranges? (in this case, a simple query such as "Item: carrots AND Item: oranges" would not work, because "Item" cannot have two values and the result is zero documents. Another option such as "Item: carrots OR Item: oranges" would not work, because it would return stores that have either one or the other but would not match both conditions, being independent documents).

So, at the beginning this looked super simple but I just can't see a way to do it. What am I missing?

Thanks a lot for your help!

The short answer is that your data needs to be optimized for the queries you want to run against it, as Elasticsearch is not a standard relational database. However, you can store multiple values in a single field as an array, so if you stored data like { store: "John's veggies", items: ["carrots", "tomatoes"] } you could answer all the questions you mentioned:

  • NOT items: eggplant
  • items: carrots AND items: oranges

It does get a little tricky to differentiate between the count of documents and the number of items in this model, and what I usually do is created a runtime field which is emit(doc['items'].size()).

Since you've tagged this as a "Kibana" question, I will point out two workarounds: You can sample 10k documents in Vega or Canvas, and then apply filters that will be run in your web browser.

Thanks a lot for your quick reply!

So, the short answer is that I just cannot do this with the current index + Kibana alone, right?
I thought about the index modification, but unfortunately that's beyond my control at this point :frowning:

And any way to have something similar in a visualization, if not in a simple query?
So far I've managed to make a Data table, splitting by store name first, and then by item, but no way to group the items or count the stores. Someting like:

Store Item Count
============= =============== ==============
John's veggies Carrots 1
John's veggies Tomatoes 1
John's veggies Eggplant 1
Vegan World Carrots 1
......

But again, no relation between the different items, apart from being in the table one after the other as being in the same Store :frowning:

But I guess that if I cannot think of a query to do this, a visualization won't help either, right?

My last suggestion was for two possible ways to do this in Kibana, using either Vega OR Canvas. Both of these tools are limited to 10000 documents at a time due to Elasticsearch limits, but allow you to write calculations in your browser.

Hi Wylie!

Unfortunately I'm dealing with millions of registers... so I will have to got thought the new-index-creation path.

Thanks again for your support! Much appreciated!

I would for example use transform and pivot that index per shop and item... So to have aggregated data to give you answers per shop.

Totally agree, Yassine!
Requested to my devops just like that.

Thanks!