Design question - relationships across indices


(Ludwig Magnusson) #1

In my application I want to index user profiles and events. Each event is
performed by a specific user at a specific time. I would like to be able to
look at a specific event and see statistics on the users that have
performed it. To make this possible I have done some initial experiments
with parent/child relationships. I index user documents and give each
document searchable attributes such as age. I then map the events to have
the user as parent, referencing the user's id. This has proved to work very
well for my querying requirements: I can extract the statistics I want in
very flexible ways, and I can also update information about users if I need
to. However, the problem is that in a parent/child relationship the parent
and child documents need to be in the same shard, which in this case seems
a bit problematic when it comes to scaling.
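For reference, the parent/child setup described above can be sketched roughly like this (the index, type, and field names are my own illustrative choices, not from the original post):

```
PUT /app
{
  "mappings": {
    "user": {
      "properties": {
        "age": { "type": "integer" }
      }
    },
    "event": {
      "_parent": { "type": "user" },
      "properties": {
        "timestamp": { "type": "date" }
      }
    }
  }
}
```

Child documents are then indexed with `?parent=<user_id>` on the URL, which is also what routes each event to the same shard as its parent user.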

I expect to receive many events, so I would like to place them in
different indices based on time, let's say one index per day, week or
month, to be able to scale out to more servers when the need arises and
to be able to archive old data by removing old indices. This, however,
seems to rule out the parent/child relationship: since the user data is
not time based, and since a user would be referenced by events in
different time-based indices, the user data would need to be stored in
its own index.
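Concretely, the time-based layout I have in mind would look something like this (index names are illustrative):

```
# one index per day; events carry the user id as a plain field
PUT /events-2013.08.01/event/1
{ "user_id": "u42", "timestamp": "2013-08-01T12:00:00Z" }

# queries can span all the daily indices with a wildcard
GET /events-*/_search

# archiving old data is then just a matter of dropping whole indices
DELETE /events-2013.07.01
```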

To sum up, the basic requirements are:

  • Being able to query the data with great flexibility
  • Scalability
  • Being able to archive old data

The solutions that come to mind are:

  1. Keep the model explained above but drop the parent/child mapping and
    put users and events in different indices: do one query to fetch all the
    matching users, then a separate query to get all events that have the
    fetched user ids. However, this does not seem very efficient (my guess),
    and it could send a lot of data across the nodes in the network, since
    one query could match perhaps 50 000 different users.
  2. Wait for this pull request
    https://github.com/elasticsearch/elasticsearch/pull/3278 to be merged
    and use that feature. But would that not be the same thing as solution 1
    in practice?
  3. Model the data in a different way. If there is a better way to do it,
    how would it be modeled?

Thanks in advance for any advice and/or feedback
/Ludwig


