Hello,
I'd like to import raw logs from an application
that emits time-based events reporting changes in the state of items.
I'd like to transform the data so that each point in time reports the latest known state of every item.
For example, I have this data set:
POST test/_doc
{
"item": 1,
"date" : "2020-03-10T14:12:12",
"status" : "A"
}
POST test/_doc
{
"item": 2,
"date" : "2020-03-10T14:12:12",
"status" : "A"
}
POST test/_doc
{
"item": 3,
"date" : "2020-03-10T14:12:12",
"status" : "A"
}
POST test/_doc
{
"item": 1,
"date" : "2020-03-11T14:12:12",
"status" : "B"
}
POST test/_doc
{
"item": 2,
"date" : "2020-03-11T14:12:12",
"status" : "B"
}
POST test/_doc
{
"item": 1,
"date" : "2020-03-12T14:12:12",
"status" : "C"
}
POST test/_doc
{
"item": 2,
"date" : "2020-03-12T14:12:12",
"status" : "C"
}
What I want to get is:
{
"item": 1,
"date" : "2020-03-10T14:12:12",
"status" : "A"
}
{
"item": 2,
"date" : "2020-03-10T14:12:12",
"status" : "A"
}
{
"item": 3,
"date" : "2020-03-10T14:12:12",
"status" : "A"
}
{
"item": 1,
"date" : "2020-03-11T14:12:12",
"status" : "B"
}
{
"item": 2,
"date" : "2020-03-11T14:12:12",
"status" : "B"
}
{
"item": 3,
"date" : "2020-03-11T14:12:12",
"status" : "A"
}
{
"item": 1,
"date" : "2020-03-12T14:12:12",
"status" : "C"
}
{
"item": 2,
"date" : "2020-03-12T14:12:12",
"status" : "C"
}
{
"item": 3,
"date" : "2020-03-12T14:12:12",
"status" : "A"
}
So for the dates 2020-03-11 and 2020-03-12, where item 3 has no event, its status is taken from the latest known event, which happened on 2020-03-10.
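To make the rule concrete, here is a minimal Python sketch of the transformation I am after. It is plain in-memory logic, not Elasticsearch code; the function name forward_fill and the variable names are only illustrative, and it assumes all timestamps use the same ISO 8601 format so that they sort correctly as strings.

```python
def forward_fill(events):
    """Return one record per (date, item), carrying forward the latest known status.

    Illustrative helper only: it assumes every "date" value uses the same
    ISO 8601 layout, so plain string comparison matches chronological order.
    """
    ordered = sorted(events, key=lambda e: e["date"])
    dates = sorted({e["date"] for e in ordered})

    latest = {}   # item -> latest known status so far
    output = []
    i = 0
    for date in dates:
        # Apply every event up to and including this point in time.
        while i < len(ordered) and ordered[i]["date"] <= date:
            latest[ordered[i]["item"]] = ordered[i]["status"]
            i += 1
        # Emit the current state of every item seen so far.
        for item in sorted(latest):
            output.append({"item": item, "date": date, "status": latest[item]})
    return output


# The same events as in the data set above.
events = [
    {"item": 1, "date": "2020-03-10T14:12:12", "status": "A"},
    {"item": 2, "date": "2020-03-10T14:12:12", "status": "A"},
    {"item": 3, "date": "2020-03-10T14:12:12", "status": "A"},
    {"item": 1, "date": "2020-03-11T14:12:12", "status": "B"},
    {"item": 2, "date": "2020-03-11T14:12:12", "status": "B"},
    {"item": 1, "date": "2020-03-12T14:12:12", "status": "C"},
    {"item": 2, "date": "2020-03-12T14:12:12", "status": "C"},
]

result = forward_fill(events)
for record in result:
    print(record)
```

With this sketch, an item only starts appearing from the date of its first event onward, which matches what I need.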
What is the best way to implement this?
An entity-centric index? Is there a simpler option?
Thanks