Hey guys,
I'd like your suggestions on index design for web activity data.
Let's say I have browse data, online purchase data, and store purchase
data, and I need to keep a year of each.
A year of browse data is around 80G, online purchase data is around 50G,
and store (offline) purchase data is around 1T.
I need to run queries such as: find all the customers who browsed item A
in the past X months and also purchased item B online in the past Y months.
Originally I used a complicated parent/child structure, which sometimes
results in very bad performance. All browse, online purchase, and store
purchase data is stored in one index spread across 7 shards.
I have 7 machines, each with 128G of RAM and a 1T hard disk.
Now I am trying to store each type of data in its own index, say
browse_v1, onlinepurchase_v1, storepurchase_v1. Since it's time-based data,
should I break the indexes up monthly, or simply yearly? For
browse (70G) and online purchase (50G), I think I can just use one index
with one shard each. Or should I break them into monthly indexes instead?
Monthly indexes give me the flexibility to add and remove data, but they
will also decrease query performance, right? (A search against 1 index
becomes a search against 12 indexes.)
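To make the monthly option concrete, here is a rough sketch of how the application could pick only the monthly indexes that a "past X months" query needs. The naming scheme (prefix_YYYY_MM) and the helper name are just assumptions for illustration, not something I have in place.

```python
from datetime import date


def monthly_indices(prefix, today, months_back):
    """List the monthly index names that overlap the past `months_back` months.

    A window of N months reaching back from `today` can touch N + 1 calendar
    months (the current one plus N earlier ones), so N + 1 names are returned.
    The prefix_YYYY_MM naming is a hypothetical convention.
    """
    names = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month -= 1
        if month == 0:
            # Roll over from January to December of the previous year.
            year, month = year - 1, 12
    return names


# Example: a 3-month window looked up on 10 Feb 2015.
indices = monthly_indices("storepurchase_v1", date(2015, 2, 10), 3)
search_target = ",".join(indices)
```

With something like this, a query over the past 3 months would be sent to 4 indexes (as a comma-separated list) rather than to all 12.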
For store data (1T) I clearly have to break it into at least monthly
indexes, but each monthly index would still contain around 100G of data.
With my current cluster, how many shards should I allocate to each monthly
index? I am also concerned about query performance.
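Here is the back-of-the-envelope arithmetic I have been using for the shard question. The 30G target shard size is purely a placeholder I picked for the example, not a recommendation; the right value is exactly what I am unsure about.

```python
import math


def shards_for_index(index_size_gb, target_shard_gb):
    """Number of primary shards needed to keep each shard near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


MONTHLY_INDEX_GB = 100   # approximate size of one monthly store purchase index
TARGET_SHARD_GB = 30     # assumed placeholder, to be tuned by testing
NODES = 7
MONTHS = 12

shards_per_index = shards_for_index(MONTHLY_INDEX_GB, TARGET_SHARD_GB)
total_primary_shards = shards_per_index * MONTHS
primary_shards_per_node = total_primary_shards / NODES
```

Under that assumption each monthly index gets 4 primary shards, which is 48 primary shards for the year, or roughly 7 per machine before replicas.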
Since I am now storing the data in separate indexes, I will need to do an
application-level join to answer the query I described. Is this the common
way to handle such a use case?
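By application-level join I mean roughly the following: run one query per index, collect the customer IDs from each result, and intersect them in the application. This is only a sketch; the customer_id field name and the shape of the hits (plain dicts) are assumptions, and fetching the hits from the cluster is left out.

```python
def customers_matching_both(browse_hits, purchase_hits, id_field="customer_id"):
    """Return the IDs of customers that appear in both result sets.

    Each argument is an iterable of dict-like hits, e.g. the documents
    returned by the browse query and by the online purchase query.
    """
    browsed = {hit[id_field] for hit in browse_hits}
    purchased = {hit[id_field] for hit in purchase_hits}
    return browsed & purchased


# Made-up sample hits: who browsed item A, and who bought item B online.
browse_hits = [
    {"customer_id": "c1", "item": "A"},
    {"customer_id": "c2", "item": "A"},
]
purchase_hits = [
    {"customer_id": "c2", "item": "B"},
    {"customer_id": "c3", "item": "B"},
]
matched = customers_matching_both(browse_hits, purchase_hits)
```

My worry with this approach is that both ID sets have to be pulled back to the application, which could be large for popular items.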
I know I should run some tests first, but I hope someone has similar
experience with this and could provide some guidance.
Thanks in advance,
Chen