Index design for user activities

I am maintaining a year's worth of user activity data, including browse and
purchase events. Each browse/purchase entry is a JSON object: {item_id: id1,
item_name: name1, category: c1, brand: b1, event_time: t1}.

I would like to compose different queries, such as getting all customers
who browsed item A and/or purchased item B within the time range t1 to t2.
There are tens of millions of customers.

My current design uses nested objects for each customer:
customer1:
customer_id: id1,
name: name1,
country: US,
browse: [{browseentry1_json},{browseentry2_json},...],
purchase: [{purchase entry1_json},{purchase entry2_json},...]

With this design, I can easily compose all kinds of queries using nested
queries. The only problem is that it is hard to expire older browse/purchase
data: I only want to keep, for example, a year's worth of browse/purchase
data. With this design, I will at some point have to read the entire index
out, delete the expired browse/purchase entries, and write the documents back.
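
A minimal sketch of the kind of query and cleanup described above, written as plain Python dicts that mirror the Elasticsearch query DSL. It assumes that browse and purchase are mapped as nested fields and that event_time holds ISO-8601 strings; the function names are hypothetical and only for illustration.

```python
import copy


def nested_event_clause(path, item_id, start, end):
    """Match customers with one nested entry under `path` for this item and time range."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"term": {f"{path}.item_id": item_id}},
                        {"range": {f"{path}.event_time": {"gte": start, "lte": end}}},
                    ]
                }
            },
        }
    }


def browsed_and_or_purchased(browse_item, purchase_item, start, end, require_both=True):
    """Build 'browsed A AND purchased B' (or OR, if require_both is False)."""
    clauses = [
        nested_event_clause("browse", browse_item, start, end),
        nested_event_clause("purchase", purchase_item, start, end),
    ]
    if require_both:
        return {"query": {"bool": {"must": clauses}}}
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def prune_expired(customer, cutoff):
    """Return a copy of a customer document without entries older than `cutoff`.

    Assumes ISO-8601 timestamps, which compare correctly as strings.
    This is the per-document rewrite that expiry requires in the nested design.
    """
    pruned = copy.deepcopy(customer)
    for field in ("browse", "purchase"):
        pruned[field] = [e for e in pruned.get(field, []) if e["event_time"] >= cutoff]
    return pruned
```

The pruning step has to be applied to every customer document and each result reindexed, which is why expiry is expensive with this layout.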

Another design is to use a parent/child structure:
type user is the parent of types browse and purchase, and
type browse contains one document per browse entry.
Although deleting old data seems easier with delete by query, for the
above query I would have to combine multiple has_child queries with and/or,
which would be much less performant. In fact, I initially used the
parent/child structure, but the query time seemed really long, so I gave it
up and tried switching to nested objects.
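
For comparison, a rough sketch of the same query in the parent/child layout, again as Python dicts mirroring the query DSL. It assumes child types named browse and purchase with item_id and event_time fields directly on each child document; the helper names are hypothetical. How the expiry body is submitted (delete by query) depends on the Elasticsearch version in use.

```python
def has_child_clause(child_type, item_id, start, end):
    """Match parent users that have a child of `child_type` for this item and time range."""
    return {
        "has_child": {
            "type": child_type,
            "query": {
                "bool": {
                    "must": [
                        {"term": {"item_id": item_id}},
                        {"range": {"event_time": {"gte": start, "lte": end}}},
                    ]
                }
            },
        }
    }


def browsed_and_or_purchased(browse_item, purchase_item, start, end, require_both=True):
    """Combine one has_child clause per event type with AND (must) or OR (should)."""
    clauses = [
        has_child_clause("browse", browse_item, start, end),
        has_child_clause("purchase", purchase_item, start, end),
    ]
    if require_both:
        return {"query": {"bool": {"must": clauses}}}
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def expire_children_body(cutoff):
    """Query body selecting child documents older than `cutoff`, for use with delete by query."""
    return {"query": {"range": {"event_time": {"lt": cutoff}}}}
```

Expiry here is a single range query against the child documents, but every search condition on browse or purchase data adds another has_child clause that must be joined back to the parent.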

I am also thinking about using nested objects but breaking the data into
multiple indices (for example, one index per month) so that I can easily
expire old data. The problem with this approach is that I have to query
across those indices and aggregate the results to get the distinct users,
which I assume will be much slower (I haven't tried it yet). One requirement
of this project is to return the count for a query within an acceptable
time frame (on the order of seconds), and I am afraid this approach may not
meet it.
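
A small sketch of how the monthly-index variant might be wired up, assuming a hypothetical naming scheme of activity-YYYY.MM and a customer_id field on every document. It shows which indices a time range touches, which indices fall outside a 12-month retention window and could be dropped whole, and a search body that counts distinct customers with a cardinality aggregation (which returns an approximate count).

```python
from datetime import date


def month_index_names(start, end, prefix="activity-"):
    """List the monthly index names covering the range start..end (inclusive)."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def expired_indices(existing, today, keep_months=12, prefix="activity-"):
    """Return the indices older than the retention window; these can be deleted whole."""
    # Months since year 0 for the oldest month that is still kept.
    oldest_kept = today.year * 12 + (today.month - 1) - (keep_months - 1)
    cutoff = f"{prefix}{oldest_kept // 12:04d}.{oldest_kept % 12 + 1:02d}"
    # Zero-padded names sort chronologically, so string comparison is enough.
    return sorted(n for n in existing if n.startswith(prefix) and n < cutoff)


def distinct_customer_count_body(query):
    """Wrap a query so the response carries only a distinct count of customer_id."""
    return {
        "size": 0,
        "query": query,
        "aggs": {"distinct_customers": {"cardinality": {"field": "customer_id"}}},
    }
```

For example, month_index_names(date(2014, 11, 15), date(2015, 2, 1)) yields the four indices from activity-2014.11 through activity-2015.02, and the search would be sent to those indices only.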

The ES cluster has 7 machines, each with 8 cores and 32 GB of memory.
Any suggestions?

Thanks in advance!
Chen
