Hello,
I want to do faceted search on events pushed by a device, faceted by device configuration:
I have device objects with several configuration fields: these are the fields I need to query to filter devices.
Each device pushes metrics fields with a timestamp and the device ID.
What I ultimately need to do is query and aggregate the event metrics by device config: that is, to get all device IDs that match a filter query and aggregate metrics over time for all devices matching those IDs.
This seems like a pretty standard thing to do.
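To make the goal concrete, this is roughly the two-step query I have in mind today (the names `devices`, `events`, `device_id`, `datacenter`, and `cpu_load` are just placeholders for my actual fields):

```json
# Step 1: find the IDs of devices matching a configuration filter
POST /devices/_search
{
  "query": { "term": { "datacenter": "eu-west" } },
  "_source": ["device_id"]
}

# Step 2: aggregate event metrics over time for those device IDs
POST /events/_search
{
  "size": 0,
  "query": { "terms": { "device_id": ["dev-1", "dev-2"] } },
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "timestamp", "interval": "1h" },
      "aggs": { "avg_load": { "avg": { "field": "cpu_load" } } }
    }
  }
}
```

Doing this in two round-trips (and shuttling potentially large ID lists between them) is what I'd like to avoid, hence the questions below.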
Looking at the docs, I see 2 ways:
- `nested` fields: put the configuration fields in each event. Possible but expensive. Would ensure an event matches a config even if it changes.
- use the `_parent` type: where each event is a child of the config document
Option 2 seems more efficient, but I have questions and I see several potential problems:
The `_parent` field documentation does not make much sense to me. It says:

> The `_parent.type` setting can only point to a type that doesn't exist yet. This means that a type cannot become a parent type after it has been created.
I would think you need to create the parent type, then create children referencing the parent.
A) How can a field point to a type that doesn't exist yet?
Maybe I'm confused about the meaning of creating a type.
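For what it's worth, here is how I currently read it: the relationship is declared on the *child* type's mapping when the index is created, so both types can be defined in the same request (index name `telemetry` and types `device`/`event` are hypothetical):

```json
PUT /telemetry
{
  "mappings": {
    "device": {},
    "event": {
      "_parent": { "type": "device" },
      "properties": {
        "timestamp": { "type": "date" },
        "cpu_load":  { "type": "double" }
      }
    }
  }
}
```

If that reading is right, the restriction just means `device` must not already exist as a type *before* the `_parent` declaration is made. Is that correct?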
This leads to another question:
B) If I were to use a `_parent` field, can the mapping of the parent object change (i.e. add fields)?
C) Also, what happens if the parent object changes (i.e. is updated)?
D) When querying with `has_parent`, does it refer to the parent's latest state, or does Elasticsearch internally denormalize the parent object into the child on creation?
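Concretely, the single query I'm hoping for would look something like this sketch (again with hypothetical names: index `telemetry`, parent type `device`, child type `event`):

```json
POST /telemetry/event/_search
{
  "size": 0,
  "query": {
    "has_parent": {
      "parent_type": "device",
      "query": { "term": { "datacenter": "eu-west" } }
    }
  },
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "timestamp", "interval": "1h" },
      "aggs": { "avg_load": { "avg": { "field": "cpu_load" } } }
    }
  }
}
```

i.e. filter children by their parent's configuration and aggregate the children in one shot.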
I am also concerned about the requirements on `_parent`. The doc says:

> Parent and child documents must be indexed on the same shard. The parent ID is used as the routing value for the child, to ensure that the child is indexed on the same shard as the parent. This means that the same parent value needs to be provided when getting, deleting, or updating a child document.
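As I read it, every child write would then need the parent ID as a routing parameter, something like this (hypothetical index/type names):

```json
PUT /telemetry/event/evt-1?parent=dev-1
{
  "device_id": "dev-1",
  "timestamp": "2016-01-01T00:00:00Z",
  "cpu_load": 0.42
}
```

So every ingestion path (Logstash included) would have to carry the parent ID along with each event.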
That seems pretty restrictive, first for ingestion (how would I do this with Logstash, for example?), but more problematic is the fact that there will be billions of children per parent, so how could they all fit on the same shard?
Is this use case just not really suited for Elasticsearch, or is there a better way to achieve it?
Thanks for your input.