Design question: index and mapping types for facetted search on billions of events

streamn · June 23, 2016, 9:00pm

Hello,

I want to do facetted search on events pushed by a device, facetted by device configuration:

I have device objects with several configuration fields; these are the fields I need to query in order to filter devices.

Each device pushes metrics fields with a timestamp and the device ID.

What I ultimately need to do is query and aggregate the event metrics by device config: that is, to get all device IDs that match a filter query and aggregate metrics over time for all devices matching those IDs.

This seems like a pretty standard thing to do.

Looking at the docs, I see 2 ways:

1. Nested fields: put the configuration fields in each event. Possible but expensive. It would ensure an event matches the config that was in effect when it was recorded, even if the config changes later.
2. The _parent field: each event is a child of the config document.
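For reference, option 2 might look roughly like the request bodies below, written as Python dicts. This is only a sketch in the 2.x-era syntax. The type names (`device`, `event`) and field names (`firmware`, `timestamp`, `temperature`) are made up for illustration. Note that `_parent` is declared on the child type and only names the parent type.

```python
import json

# Hypothetical type and field names, used only for illustration.
index_body = {
    "mappings": {
        # Parent type: one document per device, holding its configuration.
        "device": {
            "properties": {
                "firmware": {"type": "string", "index": "not_analyzed"},
            }
        },
        # Child type: the _parent setting lives here and points at "device".
        "event": {
            "_parent": {"type": "device"},
            "properties": {
                "timestamp": {"type": "date"},
                "temperature": {"type": "double"},
            },
        },
    }
}

# Search over child events whose parent device matches a config filter,
# then bucket the matching events per hour and average a metric.
search_body = {
    "size": 0,
    "query": {
        "has_parent": {
            "parent_type": "device",
            "query": {"term": {"firmware": "1.2.3"}},
        }
    },
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "interval": "hour"},
            "aggs": {"avg_temperature": {"avg": {"field": "temperature"}}},
        }
    },
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```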

Option 2 seems more efficient, but I have questions and I see several potential problems:

The _parent field documentation does not make much sense to me:

it says:

The _parent.type setting can only point to a type that doesn’t exist yet. This means that a type cannot become a parent type after it has been created.

I would think you need to create the parent type, then create children referencing the parent.
A) How can a field point to a type that doesn't exist yet?
Maybe I'm confused about the meaning of creating a type.

This leads to another question:
B) If I were to use a _parent field, can the mapping of the parent type change (e.g. by adding fields)?

C) Also, what happens if the parent document changes (i.e. is updated)?

D) When querying with has_parent, does it refer to the parent's latest state, or does Elasticsearch internally denormalize the parent document into the child on creation?

I am also concerned about the requirements on _parent.
The docs say:

Parent and child documents must be indexed on the same shard. The parent ID is used as the routing value for the child, to ensure that the child is indexed on the same shard as the parent. This means that the same parent value needs to be provided when getting, deleting, or updating a child document.
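The routing rule quoted above amounts to choosing the shard as hash(routing) modulo the number of primary shards. The sketch below illustrates this with a stand-in hash (`zlib.crc32`); Elasticsearch uses a different hash internally, and the IDs and shard count are made up. The point is only that every child routed by the same parent ID ends up on one shard.

```python
import zlib


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Pick a shard from a routing value.

    crc32 is only a stand-in here; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # hypothetical number of primary shards

# 1000 hypothetical child events, all belonging to the same parent device.
events = [{"_id": f"evt-{i}", "_parent": "device-42"} for i in range(1000)]

# Routed by parent ID: every child lands on the same shard as its parent.
shards_by_parent = {shard_for(e["_parent"], NUM_SHARDS) for e in events}

# Routed by their own ID (the default without _parent): spread over shards.
shards_by_id = {shard_for(e["_id"], NUM_SHARDS) for e in events}

print("distinct shards when routed by parent:", len(shards_by_parent))
print("distinct shards when routed by own id:", len(shards_by_id))
```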

That seems pretty restrictive. First, for ingestion: how would I do this with Logstash, for example? More problematic is the fact that there will be billions of children per parent, so how could they all be indexed on the same shard?

Is this use case just not really suited for Elasticsearch? Or is there a better way to achieve this?

Thanks for your input.

warkolm · June 25, 2016, 8:50pm

I'd just flatten everything rather than trying to add relationships, which just makes things complex.
Have each metric document carry all the applicable info you would need to filter on.
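As a rough sketch of what that could look like: one flattened event document and a filter-plus-aggregation request body, written as Python dicts. All field names (`device_id`, `config_firmware`, `config_region`, `temperature`) are made up for illustration, and the request uses 2.x-era syntax.

```python
import json

# One flattened event: metric fields plus a copy of the device config fields.
# All field names are hypothetical.
flattened_event = {
    "device_id": "device-42",
    "timestamp": "2016-06-23T21:00:00Z",
    "temperature": 21.5,
    "config_firmware": "1.2.3",
    "config_region": "eu-west",
}

# Filter events directly on the config fields, then aggregate the metric
# over time. No join or parent lookup is needed at query time.
search_body = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"config_firmware": "1.2.3"}},
                {"term": {"config_region": "eu-west"}},
            ]
        }
    },
    "aggs": {
        "per_hour": {
            "date_histogram": {"field": "timestamp", "interval": "hour"},
            "aggs": {"avg_temperature": {"avg": {"field": "temperature"}}},
        }
    },
}

print(json.dumps(flattened_event, indent=2))
print(json.dumps(search_body, indent=2))
```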

streamn · June 29, 2016, 12:33am

Seems like a lot of overhead (i.e. inserting hundreds of extra keys into billions of docs), but I can see it makes life easier on the search side...
Disk is cheap, but I wonder if there is another way.

Anyway... now how would I 'merge' my docs?
I basically have a stream that saves (updates) a doc in a config index (one doc per device, holding its config).
Now I need to merge this existing doc into every incoming event...
Is this feasible directly in ES? I was thinking of pushing the latest doc to Redis as it comes in and pulling from it on every event, but if ES has a way to do this, that may be better.

Any input on this?

Thanks

warkolm · June 29, 2016, 12:35am

You could keep the device data/config in an index, then merge it into the events as they pass through Logstash using the elasticsearch filter (https://www.elastic.co/guide/en/logstash/current/plugins-filters-elasticsearch.html), or potentially with a few translate filters (https://www.elastic.co/guide/en/logstash/current/plugins-filters-translate.html).
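If the merge were done in application code instead of Logstash, the lookup-and-merge step might look something like this minimal sketch. The `fetch_config` callable stands in for whatever backs the lookup (a GET against the config index, Redis, ...); it and every field name here are hypothetical.

```python
from typing import Callable, Dict, Optional


class ConfigCache:
    """Keeps the latest known config per device, fetching on a cache miss."""

    def __init__(self, fetch_config: Callable[[str], Optional[dict]]):
        self._fetch_config = fetch_config  # hypothetical backend lookup
        self._configs: Dict[str, dict] = {}

    def update(self, device_id: str, config: dict) -> None:
        # Called whenever the config stream delivers a new version.
        self._configs[device_id] = config

    def get(self, device_id: str) -> Optional[dict]:
        if device_id not in self._configs:
            config = self._fetch_config(device_id)
            if config is None:
                return None
            self._configs[device_id] = config
        return self._configs[device_id]


def enrich(event: dict, cache: ConfigCache) -> dict:
    """Return a copy of the event with the device config fields merged in."""
    enriched = dict(event)
    config = cache.get(event["device_id"])
    if config is not None:
        # Prefix config keys so they cannot clash with metric fields.
        for key, value in config.items():
            enriched[f"config_{key}"] = value
    return enriched


# Usage with an in-memory stand-in for the config index.
backend = {"device-42": {"firmware": "1.2.3", "region": "eu-west"}}
cache = ConfigCache(fetch_config=backend.get)

event = {"device_id": "device-42", "timestamp": "2016-06-29T00:35:00Z", "temperature": 21.5}
doc = enrich(event, cache)
print(doc)
```

Events for a device with no known config pass through unchanged in this sketch; whether to drop, queue, or index them as-is is a separate design decision.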
