How best to Denormalize a SQL schema

Hi all,

I'm building a new Index for a use case and I'm wondering what would be the best mapping to structure this index.

I have no problem building this with SQL tables, links and joins; but I struggle finding the good way to denormalize the SQL schema into a good Elasticsearch mapping.

Here is a simplified example. Let's say I have three SQL tables:

Experiment Table:
- experiment_ID (PK)
- experiement_name
- start_date
- lead_scientist

Plant Table:
- plant_id (PK)
- species

Follow Up Table:
- follow_up_id (PK)
- experiment_id (FK)
- plant_id (FK)
- date
- biomass
- temperature

The idea is to use the follow up table to track daily observation of the experiment

The denormalized document for ES could be:

{
  "follow_up_id": "f1",
  "date": "2023-10-11T04:06:07",
  "bioMass": 4.6,
  "temperature": 23.6,
  "experiment": {
    "experiment_id": "e1",
    "experiment_name": "...",
    "start_date": "2023-10-01T00:00:00",
    "lead_scientist": "John Doe"
  },
  "plant": {
    "plant_id": "...",
    "species": "..."
  }
}

This works fine for most use cases (List experiment, average data over an experiment, list all follow up measures for an experiment) but there is a few issue with some use case:

1. Create a new experiment:
When scientist want to start a new experiment they would like to create the experiment before there is any follow up data (basically you open the web interface, click on "create new experiment", fulfil experiment details and only the day after they do the first follow up)
I can easily do that in SQL, create a new line in the "Experiment Table" that will be referenced later in the Follow Up Table.
How can I do that with Elasticsearch ? With the json document I propose, I need to create a "fake" follow up entry to create the experiment which will then impair the result if I do averages of data over the experiment as the count of "follow up" will yield +1

2. Update Experiment Data at the end of the experiment (not required but I'm anticipating costumer request here ^^)
For instance if the scientist wants to add overall observation or experiment outcomes, I will need to update all "follow up" document of this experiment which doesn't seems very good

Is there a better way to denormalized my data to work with those 2 use cases ?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.