Full outer join type queries - comparing two data sets

JamesCrowley · March 25, 2016, 1:35pm

hey folks,

I'm trying to figure the best way to structure/query our data in ES, and would really appreciate some pearls of wisdom! We have documents at the moment that look like this:

Inventory { InventorySnapshotId: "2010-02-01", InventoryId: "ProductA", Quantity: 100, ManufacturerName: "John" }
Inventory { InventorySnapshotId: "2010-02-02", InventoryId: "ProductA", Quantity: 90, ManufacturerName: "Johnny" }
Inventory { InventorySnapshotId: "2010-02-01", InventoryId: "ProductB", Quantity: 50, ManufacturerName: "Sarah" }
Inventory { InventorySnapshotId: "2010-02-02", InventoryId: "ProductC", Quantity: 40, ManufacturerName: "Sally" }

Whenever we query this data, it's always relative to another baseline InventorySnapshotId. For instance, querying "2010-02-02" against a baseline of "2010-02-01"

 InventoryId | Quantity | BaselineQuantity | ManufacturerName | BaselineManufacturerName
 ProductA    | 90       | 100                | Johnny           | John
 ProductB    | 0        | 50                 |                  | Sarah   <- sold out inventory
 ProductC    | 40       |                    | Sally            |         <- new inventory

What's the best way to express/store this? In SQL semantics we'd use a full outer join between the current and the baseline rows... I'm thinking we might have to group in elastic search and then do some magic in the consumer, but.... any thoughts?

In case it's helpful,

Data summary:

There could be a few million inventory items per InventorySnapshotId, and several thousand InventorySnapshots a day
Once a snapshot is stored, it never changes
There are around 80 different properties for each inventory item, defined by an external system - we have meta data about the fields, but they can be added/removed over time, some of which we'd search on, some of which we'd run aggregate queries over... hence why I'm thinking ElasticSearch is a good fit.

Query summary:

page though the data (on a specific date, against a baseline), and some other criteria (such as ManufacturerName matching either today or the comparison)
There would also be aggregate queries and faceted searches over these data sets (ie group on current manufacturer name, and give me totals for the current quantity and baseline quantity)
There's also a hierarchy element to this data, but I think that's one for another day!

Any suggestions or approaches I should be looking at? Thanks!

Topic		Replies	Views
Strategy for "JOINING" data Elasticsearch	2	353	May 31, 2019
How to model data for time-based joins? Elasticsearch	2	471	February 28, 2018
Produce results basend on a comparison between buckets of an aggregation Elasticsearch	1	406	August 1, 2018
Matching two fields with the same value across two indices Elasticsearch	8	4329	December 29, 2017
Looking Up Data in Another Table by Key Elasticsearch	1	363	July 3, 2018

Full outer join type queries - comparing two data sets

Related topics