Insert and update relational data with denormalization

Hello,

As a new ES user, I have some questions about data structure, denormalization, and what comes with it:

Consider a simple 1:n data structure, Device -> Record, where each Device produces a large number of Records (1M).

To denormalize the data (and keeping in mind the removal of mapping types), I came up with the following two index mappings:

Device index mapping:

{
  "mappings": {
    "_doc": {
      "properties": {
        "deviceId": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "location": {
          "type": "geo_point"
        },
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "upStatus": {
          "type": "boolean"
        }
      }
    }
  }
}

Record index mapping:

{
  "mappings": {
    "record": {
      "properties": {
        "timestamp": {
          "type": "date"
        },
        "type": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "device": {
          "properties": {
            "deviceId": {
              "type": "text"
            },
            "location": {
              "type": "geo_point"
            },
            "name": {
              "type": "text"
            },
            "upStatus": {
              "type": "boolean"
            }
          }
        }
      }
    }
  }
}

I'm facing two problems with this design:

  1. A client that wants to index Records has to send all the Device properties along with each Record.
    -> I expected to solve this with an ingest node, but I could not find a processor that can fetch the relevant Device properties and add them to the Record.
  2. Data integrity: each time a property of a Device changes, all of its Records (> 1M) need to be updated.
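To make problem 1 concrete, here is a minimal Python sketch of the lookup the client currently has to do itself before indexing. It assumes a hypothetical in-memory cache of Device documents keyed by deviceId, nests the Device properties under `device` as in the Record mapping above, and builds an NDJSON body for the `_bulk` API. The cache contents, the sample values, and the helper names `enrich` and `bulk_body` are illustrative, not part of any Elasticsearch API. The `_type` entry matches the `record` mapping type above; pass `doc_type=None` to omit it on clusters without mapping types.

```python
import json

# Hypothetical in-memory cache of Device documents, keyed by deviceId.
# In a real client it would be loaded from the Device index and refreshed.
DEVICES = {
    "dev-1": {"deviceId": "dev-1", "name": "Pump A",
              "location": {"lat": 48.85, "lon": 2.35}, "upStatus": True},
}

def enrich(record, devices):
    """Return a copy of `record` with its Device properties nested under "device"."""
    device = devices.get(record["deviceId"])
    if device is None:
        raise KeyError(f"unknown deviceId: {record['deviceId']}")
    # Drop the top-level deviceId: it is carried as device.deviceId instead.
    enriched = {k: v for k, v in record.items() if k != "deviceId"}
    enriched["device"] = dict(device)
    return enriched

def bulk_body(records, devices, index="record", doc_type="record"):
    """Build an NDJSON _bulk body: one action line plus one source line per Record."""
    lines = []
    for record in records:
        meta = {"_index": index}
        if doc_type is not None:
            meta["_type"] = doc_type
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(enrich(record, devices)))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, `bulk_body([{"deviceId": "dev-1", "timestamp": "2018-06-01T12:00:00Z", "type": "temperature"}], DEVICES)` yields one action line and one Record document whose `device` object holds all of Pump A's properties, which is exactly the payload every indexing client has to assemble today.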

Do you have any pointers on these two issues, or perhaps a better data structure?

Thank you very much for your input!

  1. You're right: there's no such processor yet. Work is underway on a decorate processor that will let you do this with an ingest pipeline. You can follow that work on this GitHub issue. For now, Logstash is probably the better tool for this.

  2. Yes, updating all the records can be done with an update by query, but it's very expensive. Another solution is to index parent-child documents using the join datatype, which lets you change a device without updating its records (devices and records then have to live in the same index, with each record routed to its parent device's shard). However, be aware that joining devices and their records at query time with has_parent and has_child queries will be much slower than querying your current flat denormalized documents. Everything comes at a price.
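To compare the two options side by side, here is a Python sketch that only builds the request bodies as plain dicts (no client library or cluster involved). Several details are assumptions, not from the thread: `device.deviceId` and `deviceId` are taken to be mapped as `keyword` so term queries match exact ids, each Device document's `_id` is assumed to be its deviceId, the join field name `device_relation` and all helper names are hypothetical, and the combined mapping uses the 6.x single-type `_doc` syntax.

```python
import json

# Option A: keep flat denormalized Records and rewrite them when a Device changes.
# Body for POST record/_update_by_query. Assumes device.deviceId is mapped as
# "keyword" (it is "text" in the Record mapping above).
def update_device_in_records(device_id, changes):
    return {
        "query": {"term": {"device.deviceId": device_id}},
        "script": {
            "lang": "painless",
            # Merge the changed Device properties into each Record's "device" object.
            "source": "ctx._source.device.putAll(params.changes)",
            "params": {"changes": changes},
        },
    }

# Option B: Devices and Records in ONE index, linked by a join field.
# "device_relation" is an illustrative field name; 6.x single-type syntax.
JOIN_MAPPING = {
    "mappings": {
        "_doc": {
            "properties": {
                "device_relation": {"type": "join", "relations": {"device": "record"}},
                "deviceId": {"type": "keyword"},
                "name": {"type": "text"},
                "location": {"type": "geo_point"},
                "upStatus": {"type": "boolean"},
                "timestamp": {"type": "date"},
                "type": {"type": "keyword"},
            }
        }
    }
}

def device_doc(device_id, name, up_status):
    # Parent document: the join field only needs the relation name.
    return {"deviceId": device_id, "name": name, "upStatus": up_status,
            "device_relation": "device"}

def record_doc(parent_id, timestamp, record_type):
    # Child document: parent_id is the parent Device's _id (assumed to be its
    # deviceId). It must be indexed with ?routing=<parent_id> so it lands on
    # the same shard as its parent.
    return {"timestamp": timestamp, "type": record_type,
            "device_relation": {"name": "record", "parent": parent_id}}

def records_of_up_devices():
    # Query-time join: Records whose parent Device has upStatus == true.
    return {"query": {"has_parent": {"parent_type": "device",
                                     "query": {"term": {"upStatus": True}}}}}
```

With option B, changing a Device is a single index request on its parent document, whereas with option A the same change touches every Record of that Device; in exchange, every option B query that filters Records by Device properties pays for the has_parent join.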
