How to model a document's state history?

Let's assume I have a document that can have a state, together with the timestamp at which it was set to that state. I want to know the document's current state as well as its state history.

Basically, I see three options for modeling this:

option 1:

I have a nested state object with its name and timestamp.

PUT doc_idx
{
  "mappings":
  {
    "state_doc": {
      "properties": {
        "state": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

This is probably the cleanest way to do it, as there is no redundant information. But I haven't yet found out how to filter documents by their current state: it would take a nested query in which only the nested object with the maximum timestamp is allowed to match the state.
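To make the difficulty concrete: with this mapping the current state is only implicit, namely the entry with the latest date. A minimal client-side sketch in Python, assuming the document source has already been fetched and that dates are ISO 8601 strings (the helper name current_state and the sample values are made up for illustration):

```python
from datetime import datetime


def current_state(source):
    """Return the name of the state entry with the latest date, or None."""
    entries = source.get("state", [])
    if not entries:
        return None
    # The "current" state is whichever nested entry has the maximum timestamp.
    latest = max(entries, key=lambda e: datetime.fromisoformat(e["date"]))
    return latest["name"]


# Hypothetical document source; entries are deliberately not in date order.
doc = {
    "state": [
        {"name": "draft", "date": "2019-01-01T09:00:00"},
        {"name": "review", "date": "2019-01-03T14:30:00"},
        {"name": "published", "date": "2019-01-02T08:00:00"},
    ]
}
print(current_state(doc))
```

This resolves the current state after fetching a document, but it does not help with filtering on the server side, which is the open question for this option.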

option 2:

I have a nested state object with its name and timestamp and a current flag.

PUT doc_idx
{
  "mappings":
  {
    "state_doc": {
      "properties": {
        "state": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "date": {
              "type": "date"
            },
            "current": {
              "type": "boolean"
            }
          }
        }
      }
    }
  }
}

This would make it easier to filter documents by their current state. But it is error-prone, as fewer or more than one state could carry the current flag.
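As a sketch of what this option buys and what it costs: the filter becomes a nested query that requires the name and the current flag to match on the same nested object, while the "exactly one current entry" rule has to be checked by the application. The search body is written as a Python dict, and both function names are hypothetical:

```python
def current_state_query(state_name):
    """Build a search body matching documents whose current state is state_name.

    Inside a nested query, both term clauses must hold for the same nested object.
    """
    return {
        "query": {
            "nested": {
                "path": "state",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"state.name": state_name}},
                            {"term": {"state.current": True}},
                        ]
                    }
                },
            }
        }
    }


def has_exactly_one_current(source):
    """Check the invariant that the mapping itself cannot enforce."""
    flags = [bool(entry.get("current")) for entry in source.get("state", [])]
    return sum(flags) == 1
```

Running has_exactly_one_current on a document before indexing it is one way to catch the zero-or-many case early; it is an application-side guard, not something Elasticsearch enforces.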

option 3:

I have a nested state_history object with its name and timestamp. Additionally, there is a current_state and a current_state_since field.

PUT doc_idx
{
  "mappings": {
    "state_doc": {
      "properties": {
        "current_state": {
          "type": "keyword"
        },
        "current_state_since": {
          "type": "date"
        },
        "state_history": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

This version makes it very easy to filter documents by their current state. But setting a document's new state would require updating both the current state and the state history every time.
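A minimal sketch of both sides of this trade-off, again with the search body as a Python dict. The filter needs no nested query at all, and the double update can be kept in one place so the redundant fields do not drift apart. The function names are hypothetical, and how the updated source is written back (full reindex of the document or an update request) is left open:

```python
import copy


def current_state_filter(state_name):
    """Search body: a plain term filter on the top-level keyword field."""
    return {
        "query": {
            "bool": {"filter": [{"term": {"current_state": state_name}}]}
        }
    }


def apply_state_change(source, name, date):
    """Return a new source with the current fields and the history updated together."""
    updated = copy.deepcopy(source)  # leave the caller's copy untouched
    updated["current_state"] = name
    updated["current_state_since"] = date
    updated.setdefault("state_history", []).append({"name": name, "date": date})
    return updated
```

Routing every state change through a single function like apply_state_change means the current fields and the last history entry are always produced from the same input.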

Is there another, better option? If option 1 is the way to go: how would I filter documents by their current state?

Why not just make it an event stream, where each change to the state is recorded (with a timestamp) as a new event? Then you have the history and can get the latest state easily.

I am new to Elasticsearch and haven't heard of event streams yet. Could you add some code to your response so I can see what the data structure and the query would look like with your solution?

A relatively easy way to do this is to create two separate indices. The first one uses the document name/ID as its identifier and always contains the latest version. Whenever a new version arrives, this document is updated. The other index (or set of time-based indices) holds a copy of all versions of the document and can be used to analyse changes over time. New versions are indexed directly into it, each with a separate ID.

You can then select the appropriate index for querying, depending on whether you are looking for the current version or the history.
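A rough sketch of this two-index layout, with the write path expressed as a bulk request body built in Python. The index names doc_current and doc_history and the field names doc_id, state, since and date are assumptions for illustration only. The action lines omit a mapping type, which older Elasticsearch versions that still use types may require:

```python
import json


def bulk_lines_for_state_change(doc_id, state, date):
    """Build the NDJSON body of a bulk request that writes one state change
    to both indices."""
    actions = [
        # Fixed _id: overwrites the previous version, so this index
        # only ever holds the latest state of each document.
        {"index": {"_index": "doc_current", "_id": doc_id}},
        {"state": state, "since": date},
        # No _id: one is generated, so every change becomes a new document.
        {"index": {"_index": "doc_history"}},
        {"doc_id": doc_id, "state": state, "date": date},
    ]
    # The bulk format is one JSON object per line, ending with a newline.
    return "\n".join(json.dumps(a) for a in actions) + "\n"


def history_query(doc_id):
    """Search body for doc_history: all states of one document, newest first.

    Assumes doc_id is mapped as keyword and date as date.
    """
    return {
        "query": {"term": {"doc_id": doc_id}},
        "sort": [{"date": {"order": "desc"}}],
    }
```

Filtering by current state is then a plain term query on state against doc_current, while history_query run against doc_history returns the full sequence of changes for one document.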
