I'm working on a use case that I think Elasticsearch 5 would be great for but I'm still a little unsure on a couple of points.
I want to start storing telemetry records in Elasticsearch for search and aggregation operations.
Assumptions:
1) No record ever becomes "less valuable" than another. We won't be retiring any of this data, just adding to it.
2) The data is time-series data. I have been thinking of it like data frames: each frame is a collection of values associated with a point in time. (A sample frame appears under the first question below.) An average set easily contains 50,000 frames. It's not uncommon in my use case to need all 50,000 frames at once, or at least a few values from each of them. (For example, if I want the path of the vehicle, I want all 50,000 lats and longs.)
Questions:
Do I want to store each "frame" as its own document? For example:
"body": {
  "telemetry_set_id": 200,
  "frame": {
    "datetime": "2017-04-13 00:26:35",
    "lat": "-37.0000",
    "long": "127.000000",
    "speed": "29.3",
    "battery": "97%"
  }
}
This would mean that to get the whole set of data, I'd probably have to run a scroll operation over all documents that have telemetry_set_id:200.
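To make that concrete, here is a rough sketch of the search body I imagine sending to start the scroll. It assumes each document has `telemetry_set_id` and `frame.*` at the top level (the "body" wrapper above being just the client parameter), and that I only want the coordinates back. The function name and the page size are placeholders of mine, not anything from the docs.

```python
import json


def scroll_request(telemetry_set_id, page_size=1000):
    """Build the body for the first request of a scroll over one telemetry set."""
    return {
        "size": page_size,                        # hits returned per scroll page
        "sort": ["_doc"],                         # index order, no scoring needed
        "_source": ["frame.lat", "frame.long"],   # only fetch the fields I need
        "query": {"term": {"telemetry_set_id": telemetry_set_id}},
    }


body = scroll_request(200)
print(json.dumps(body, indent=2))
```

As I understand it, this body would go to the search endpoint with a `scroll` keep-alive parameter (for example `scroll=1m`), and I would then keep requesting pages with the returned scroll ID until no hits come back.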
It's an important use case for me to be able to make queries like "Show me the highest speed for telemetry_set 200" or "Show me all sets of telemetry data that include a coordinate within 5 miles of this X,Y coordinate."
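Here is roughly how I picture those two requests in the one-frame-per-document layout. This is only a sketch under assumptions: `frame.speed` would have to be mapped as a numeric type rather than the string in my sample, and `frame.location` is a hypothetical `geo_point` field that I would build from lat and long at index time. The function names are mine.

```python
import json


def max_speed_request(telemetry_set_id):
    """Highest speed within one telemetry set, using a max aggregation."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": {"term": {"telemetry_set_id": telemetry_set_id}},
        "aggs": {"max_speed": {"max": {"field": "frame.speed"}}},
    }


def sets_near_request(lat, lon, distance="5mi"):
    """Telemetry sets that have at least one frame within `distance` of a point."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # hypothetical geo_point field, not in my sample above
                        "frame.location": {"lat": lat, "lon": lon},
                    }
                }
            }
        },
        # group the matching frames back into their telemetry sets
        "aggs": {"matching_sets": {"terms": {"field": "telemetry_set_id"}}},
    }


speed_body = max_speed_request(200)
near_body = sets_near_request(-37.0, 127.0)
print(json.dumps(speed_body, indent=2))
print(json.dumps(near_body, indent=2))
```

One thing I'm unsure about is the terms aggregation: by default it only returns the top buckets, so I would presumably need to raise its `size` if many sets match.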
The other option is to make monstrous documents like this, each containing an array of frame objects (frames would be mapped as nested in this case):
"id": "telemetry_set_200",
"body": {
  "frames": [
    {
      "datetime": "2017-04-13 00:26:35",
      "lat": "-37.0000",
      "long": "127.000000",
      "speed": "29.3",
      "battery": "97%"
    },
    {
      "datetime": "2017-04-13 00:26:36",
      "lat": "-37.0030",
      "long": "127.000010",
      "speed": "30.1",
      "battery": "96.5%"
    },
    {
      "datetime": "2017-04-13 00:26:37",
      "lat": "-37.0040",
      "long": "127.000020",
      "speed": "30.3",
      "battery": "96%"
    }
  ]
}
The advantage here is that I can get the entire set of telemetry all at once, which I feel is helpful. Will this still permit me to say, "Show me sets of telemetry data that include a lat/long within 5 miles of this specific lat and long"?
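For the nested layout, my guess is that the query would look something like the sketch below. It assumes `frames` is mapped as `nested` and that each frame carries a hypothetical `location` field of type `geo_point` (my sample only has separate lat and long strings). The function name is a placeholder.

```python
import json


def sets_with_frame_near(lat, lon, distance="5mi"):
    """Match whole telemetry-set documents that contain a nearby nested frame."""
    return {
        "_source": False,  # only the matching set IDs, not 50,000 frames each
        "query": {
            "nested": {
                "path": "frames",
                "query": {
                    "geo_distance": {
                        "distance": distance,
                        # hypothetical geo_point field inside each frame
                        "frames.location": {"lat": lat, "lon": lon},
                    }
                },
            }
        },
    }


nested_body = sets_with_frame_near(-37.0, 127.0)
print(json.dumps(nested_body, indent=2))
```

If that is right, each hit would be one telemetry set, which is the grouping I want, but I'd appreciate confirmation that this performs acceptably with tens of thousands of nested frames per document.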
I could set up has_parent as well. Is this more of a has_parent problem or a "nested" data type problem? I think has_parent is probably the answer here, but I wouldn't mind hearing from some more seasoned professionals.
Thank you in advance, I'm excited to begin working with this new tool.
Josh