I'd like your suggestions on index design for web activity data.
Let's say I have browse data, online purchase data, and store purchase
data, and I need to keep a year of each.
For browse data, a year is around 80G; online purchase data is around 50G,
and offline (store purchase) data is around 1T.
I have to run queries such as: find all customers who browsed item A in
the past X months and also purchased item B online in the past Y months.
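As a sketch of one half of that query, the request body below selects events for one item within a rolling window and collects the distinct customers. It assumes a recent Elasticsearch query DSL (bool filter, composite aggregation) and hypothetical field names item_id, timestamp and customer_id; adjust them to your mapping.

```python
def customers_who_did(item_id: str, months: int) -> dict:
    """Query body: events on item_id within the last `months` months.

    Field names (item_id, timestamp, customer_id) are hypothetical.
    """
    return {
        "size": 0,  # we only want the aggregated customers, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"item_id": item_id}},
                    # Elasticsearch date math: "M" is months ("m" would be minutes);
                    # "/d" rounds the lower bound down to the start of the day.
                    {"range": {"timestamp": {"gte": f"now-{months}M/d"}}},
                ]
            }
        },
        # A composite aggregation lets the client page through all distinct customers.
        "aggs": {
            "customers": {
                "composite": {
                    "size": 1000,
                    "sources": [{"customer": {"terms": {"field": "customer_id"}}}],
                }
            }
        },
    }
```

The same body could be sent to the browse index with item A and X months, and to the online purchase index with item B and Y months; the two customer sets still have to be combined afterwards.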
Originally I used a complicated parent/child structure, which sometimes
resulted in very bad performance. I stored all browse, online purchase, and
store purchase data in one index distributed across 7 shards.
I have 7 machines, each with 128G of RAM and a 1T hard disk.
Now I am trying to store each type of data in its own index, say
browse_v1, onlinepurchase_v1, and storepurchase_v1. Since this is time-based
data, how should I decide whether to split it into monthly indexes or simply
keep yearly ones? For browse (70G) and online purchase (50G), I think I can
just use one index with one shard for each. Or should I split them into
monthly indexes instead? Monthly indexes give me the flexibility of adding
and removing data, but they would also decrease query performance, right?
(A search against 1 index becomes a search against 12 indexes.)
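With monthly indexes, a query for "the past X months" only needs the indexes that overlap that window rather than all 12. A minimal sketch, assuming a hypothetical naming scheme of `<prefix>-YYYY.MM` (e.g. browse_v1-2014.03):

```python
from datetime import date


def monthly_indexes(prefix: str, months_back: int, today: date) -> list[str]:
    """Monthly index names overlapping [today - months_back months, today].

    A window of N months usually touches N + 1 calendar months, because it
    starts partway through a month. The naming scheme is hypothetical.
    """
    names = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        # Step back one calendar month, wrapping from January to December.
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(names))  # oldest first
```

For example, `monthly_indexes("browse_v1", 3, date(2014, 3, 15))` gives the four indexes from browse_v1-2013.12 to browse_v1-2014.03, which can be joined with commas into a multi-index search target. Dropping the oldest month then means deleting one index rather than deleting documents from a large one.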
For store data (1T), I clearly have to split it into at least monthly
indexes, but each monthly index would still contain around 100G of data.
With my current cluster, how many shards should I allocate to each monthly
index? I am also concerned about query performance.
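Some back-of-the-envelope arithmetic for the store data, using the figures from this post (about 100G per monthly index, 7 machines) plus one assumption that is not from the post: a target maximum primary shard size, set to 30 GB here purely as a placeholder to be replaced by testing.

```python
import math


def shards_per_index(index_size_gb: float, target_shard_gb: float) -> int:
    """Smallest primary shard count that keeps each shard at or below target_shard_gb."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


NODES = 7               # machines in the cluster (from the post)
MONTHLY_STORE_GB = 100  # ~100G per monthly store-purchase index (from the post)
TARGET_SHARD_GB = 30    # ASSUMPTION: placeholder target, validate by testing

store_shards = shards_per_index(MONTHLY_STORE_GB, TARGET_SHARD_GB)  # ceil(100 / 30) = 4
store_primaries_per_year = 12 * store_shards                       # 48 primary shards
primaries_per_node = store_primaries_per_year / NODES              # ~6.9 per node

# For comparison: the original single index (80G + 50G + 1T) on 7 shards.
old_shard_gb = (80 + 50 + 1000) / 7                                # ~161 GB per shard
```

Replicas, if configured, multiply both the shard count and the disk usage, so they belong in the same calculation.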
Since I would then be storing the data in separate indexes, I will need an
application-level join to run the query I want. Is this the common way to
handle such a use case?
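An application-level join here can be as simple as running one query per index, collecting the matching customer IDs, and intersecting the two sets in the application. A minimal sketch, assuming the IDs have already been fetched from each index (for example by paging through a composite aggregation) as plain strings:

```python
from typing import Iterable


def app_side_join(browsed: Iterable[str], purchased_online: Iterable[str]) -> set[str]:
    """Customers present in both result sets (an application-level 'join').

    `browsed` would come from the browse index ("browsed item A in the past
    X months"), `purchased_online` from the online purchase index ("bought
    item B online in the past Y months").
    """
    browsed_set = set(browsed)
    # Keep only online buyers who also browsed; the set comprehension dedupes.
    return {c for c in purchased_online if c in browsed_set}
```

When one side is much smaller, another option is to pass its IDs into the second query as a `terms` filter on the customer field so Elasticsearch does the filtering; very large ID lists make that less practical.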
I know I should do some testing first, but I hope someone has similar
experience with this and can provide some guidance.
Thanks in advance,