Sharding question

samdash · May 30, 2017, 6:47pm

I have one index which has around 10K documents which may grow
max 20K documents in few years,And i have another index which has more than 20Million documents and likely to grow more than double or triple in a year.

how to pick my shards per index , this is for 3 node cluster all being master eligible [ node.master: true and node.data:true]

for 10K document shall i have 3 shards and 1 replica and for 20M documents index should i have 5 shards and 2 replica's ? If so , when i wanted to scale horizontally and add more nodes what will be the effect? and what should be the configuration if i wanted to add let's say 3 or 4 more nodes?

Sam.

warkolm · May 31, 2017, 12:49am

What sort of data is it?

samdash · May 31, 2017, 1:52am

20 Million Documents data is mostly user views/ratings and 10K document is just the tags for documents.

warkolm · May 31, 2017, 2:46am

You should use time based indices for reviews, given they happen at a specific time. That means you can start with a daily/weekly/monthly index with a small shard count, and easily scale.

samdash · May 31, 2017, 3:43am

If using Time based index , i end up having 365 or 52 or 12 indexes , having 365 indexes is it a good thing?
and if i have to search/aggregate using Transport Client using java for last 30 or 90 day's do i have to specific all the indexes while building the query? little bit newbie here , and how to create time based indexes?

warkolm · May 31, 2017, 5:24am

Depends on your data volumes. I'd imagine for that 50 million size you can go with monthly with 1-3 primary shards and one replica.

I'd use the java rest client if you are just starting. The transport one will be deprecated eventually.

samdash · May 31, 2017, 4:02pm

Does Rest Client has Aggregate Builders and all other features like pagination, Bulk Operations...etc that Transport client provides?

dadoonet · May 31, 2017, 4:23pm

Not yet. But this is coming...

samdash · May 31, 2017, 4:54pm

Thanks , we are going production soon and will be using Transport Client, In Which version are you going to deprecate Transport client?

dadoonet · May 31, 2017, 7:33pm

I don't know. @javanna probably knows.

javanna · May 31, 2017, 8:23pm

Hi,
we will most likely deprecate Transport Client in 6.0 , and remove it in 7.0 .

Miguel1 · June 5, 2017, 6:58am

Shards shouldn't exceed 50 GB (undersharding) and shouldn't be less than 5 GB (oversharding). In your case (considering that my average size for that amount of logs never exceeds 50 GB) I would go for 1 shards and 2 more replicas for taking advantage of the other two nodes.
Remember that replicas are not backups, but they can provide you HA.

system · July 3, 2017, 6:58am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.