How to model a document's state history?

jamminbean · August 16, 2018, 10:22am

Let's assume I have a document that has a state, together with the timestamp at which the document was set to that state. I want to know the document's current state as well as its state history.

Basically, I see three options for modeling this:

option 1:

I have a nested state object with its name and timestamp.

PUT doc_idx
{
  "mappings": {
    "state_doc": {
      "properties": {
        "state": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

This is probably the cleanest way to do it, as there is no redundant information. But I haven't yet found out how to filter documents by their current state, as it would need a nested query in which only the nested object with the maximum timestamp is matched against the state.
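For illustration, under option 1 the current state is simply the nested entry with the greatest date. The following is a minimal client-side sketch in Python, assuming the document source has already been fetched, that `state` is a list of entries, and that dates are ISO 8601 strings. The function name `current_state` and the sample values are hypothetical. It does not solve the server-side filtering problem described above.

```python
from datetime import datetime


def current_state(source):
    """Return the state entry with the latest date, or None if there is none.

    `source` is assumed to be the _source of one document from option 1,
    with `state` holding a list of {"name": ..., "date": ...} entries.
    """
    states = source.get("state", [])
    if not states:
        return None
    # Parse the ISO 8601 strings so the comparison is on real timestamps.
    return max(states, key=lambda s: datetime.fromisoformat(s["date"]))


# Hypothetical sample document, entries deliberately out of order.
doc = {
    "state": [
        {"name": "review", "date": "2018-08-10T14:30:00"},
        {"name": "published", "date": "2018-08-16T10:22:00"},
        {"name": "draft", "date": "2018-08-01T09:00:00"},
    ]
}

latest = current_state(doc)
print(latest["name"], latest["date"])
```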

option 2:

I have a nested state object with its name and timestamp and a current flag.

PUT doc_idx
{
  "mappings": {
    "state_doc": {
      "properties": {
        "state": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "date": {
              "type": "date"
            },
            "current": {
              "type": "boolean"
            }
          }
        }
      }
    }
  }
}

This would make it easier to filter documents by their current state. But it is error-prone, as there could be fewer or more than one state with the current flag set.
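As a sketch of what the filter for option 2 could look like, the Python below builds a search body that uses a `nested` query on the `state` path and requires both the state name and `current: true` to match on the same nested object. It also includes a small check for the error case mentioned above. The function names and the sample state value are hypothetical, and the body is only constructed here, not sent to a cluster.

```python
def current_state_query(state_name):
    """Build a search body matching documents whose current state is `state_name`.

    Both term filters sit inside one nested query, so they must match
    on the same nested `state` object.
    """
    return {
        "query": {
            "nested": {
                "path": "state",
                "query": {
                    "bool": {
                        "filter": [
                            {"term": {"state.name": state_name}},
                            {"term": {"state.current": True}},
                        ]
                    }
                },
            }
        }
    }


def has_single_current(states):
    """Return True only if exactly one state entry carries the current flag."""
    return sum(1 for s in states if s.get("current")) == 1


body = current_state_query("published")  # "published" is a made-up state name
print(body)
```

Running `has_single_current` before each write is one way to catch the inconsistency early, though it cannot prevent concurrent writers from breaking the invariant.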

option 3:

I have a nested state_history object with its name and timestamp. Additionally, there is a current_state and a current_state_since field.

PUT doc_idx
{
  "mappings": {
    "state_doc": {
      "properties": {
        "current_state": {
          "type": "keyword"
        },
        "current_state_since": {
          "type": "date"
        },
        "state_history": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "date": {
              "type": "date"
            }
          }
        }
      }
    }
  }
}

This version makes it very easy to filter documents by their current state. But setting a document's new state would require updating both the current state and the state history every time.
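To make the trade-off of option 3 concrete, here is a minimal Python sketch, assuming the document source is handled as a plain dict. The helper `apply_state_change` (a hypothetical name) writes the two top-level fields and appends to `state_history` in one step so they cannot drift apart, and `current_state_filter` builds the simple `term` query that this mapping allows.

```python
import copy


def apply_state_change(source, name, date):
    """Return a copy of `source` with the new state applied in both places."""
    updated = copy.deepcopy(source)  # leave the caller's dict untouched
    updated["current_state"] = name
    updated["current_state_since"] = date
    updated.setdefault("state_history", []).append({"name": name, "date": date})
    return updated


def current_state_filter(name):
    """Build a search body filtering on the top-level current_state keyword."""
    return {"query": {"term": {"current_state": name}}}


# Hypothetical sample document and state names.
doc = {
    "current_state": "draft",
    "current_state_since": "2018-08-01T09:00:00",
    "state_history": [{"name": "draft", "date": "2018-08-01T09:00:00"}],
}
doc = apply_state_change(doc, "published", "2018-08-16T10:22:00")
print(doc["current_state"], len(doc["state_history"]))
```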

Is there another, better option? If option 1 is the way to go: how would I filter documents by their current state?

warkolm · August 16, 2018, 11:30pm

Why not just make it an event stream, where each change to the state is recorded (with a timestamp) as a new event? Then you have the history and can get the latest state easily.
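One possible reading of the event-stream idea, sketched in Python: every state change becomes its own small document, and the latest state of one document is the newest event for its ID. The field names `doc_id`, `state`, and `date` are assumptions (with `doc_id` assumed to be mapped as `keyword` and `date` as `date`), as are the function names. The search body is only built here, not executed.

```python
def state_event(doc_id, state, date):
    """Build one event document recording a single state change."""
    return {"doc_id": doc_id, "state": state, "date": date}


def latest_event_search(doc_id):
    """Build a search body returning only the newest event for `doc_id`."""
    return {
        "size": 1,
        "query": {"term": {"doc_id": doc_id}},
        "sort": [{"date": {"order": "desc"}}],
    }


def history_search(doc_id):
    """Build a search body returning the events for `doc_id`, oldest first."""
    return {
        "query": {"term": {"doc_id": doc_id}},
        "sort": [{"date": {"order": "asc"}}],
    }


# Hypothetical document ID and state name.
event = state_event("doc-1", "published", "2018-08-16T10:22:00")
print(event)
print(latest_event_search("doc-1"))
```

This covers the history and the latest state of a single document. Filtering all documents by their current state is not addressed by this sketch.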

jamminbean · August 17, 2018, 7:19am

I am new to Elasticsearch and I haven't heard of event streams yet. Could I ask you to put some code into your response so I can see what the data structure and the query should look like when applying your solution?

Christian_Dahlqvist · August 17, 2018, 7:28am

A relatively easy way to do this is to create two separate indices. The first one uses the document name/ID as its identifier and always contains the latest version. Whenever a new version arrives, this document is updated. The other index (or set of time-based indices) holds a copy of every version of the document and can be used to analyse changes over time. New versions are indexed directly into it, each with its own ID.

You can then select the appropriate index for querying depending on whether you are looking for current version or history.
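The two-index approach can be sketched as a pair of writes per new version. The Python below only describes those writes as plain dicts; it is not a client API call. The index names `docs_latest` and `docs_history`, the function names, and the sample fields are all hypothetical. An `id` of `None` stands for "let Elasticsearch generate an ID".

```python
def write_operations(doc_id, version_body,
                     latest_index="docs_latest",
                     history_index="docs_history"):
    """Describe the two writes needed when a new version of a document arrives."""
    return [
        # Fixed ID: overwrites the previous version, so this index
        # always holds exactly one (the latest) copy per document.
        {"index": latest_index, "id": doc_id, "body": version_body},
        # No ID: each version is stored as a separate history entry.
        {"index": history_index, "id": None, "body": version_body},
    ]


def index_for(purpose, latest_index="docs_latest", history_index="docs_history"):
    """Pick the index to query: 'current' for latest versions, 'history' for all."""
    if purpose == "current":
        return latest_index
    if purpose == "history":
        return history_index
    raise ValueError(f"unknown purpose: {purpose!r}")


version = {"doc_id": "doc-1", "state": "published", "date": "2018-08-16T10:22:00"}
for op in write_operations("doc-1", version):
    print(op["index"], op["id"])
```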

