Advice about project architecture

Hello guys, just wondering about some best practices.

I have a news website in 6 languages, with 7 daily indices for each, 10 to 20 GB per index, and about 1 million searches per day across all of them.
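For a rough sense of scale, the figures above can be turned into a quick back-of-the-envelope calculation. This is only a sketch: it counts primary data (replica count is not stated, so it is left out), and it assumes the searches are spread evenly over the day, which real traffic will not be.

```python
# Back-of-the-envelope sizing from the figures in the post.
# Assumptions: primary shards only (replicas not counted), and
# searches spread evenly over 24 hours (peak load will be higher).
LANGUAGES = 6
INDICES_PER_LANGUAGE = 7        # "7 daily indices for each"
INDEX_SIZE_GB = (10, 20)        # low and high estimate per index
SEARCHES_PER_DAY = 1_000_000


def sizing():
    total_indices = LANGUAGES * INDICES_PER_LANGUAGE
    return {
        "total_indices": total_indices,
        "primary_gb_low": total_indices * INDEX_SIZE_GB[0],
        "primary_gb_high": total_indices * INDEX_SIZE_GB[1],
        "avg_searches_per_second": SEARCHES_PER_DAY / (24 * 60 * 60),
    }


if __name__ == "__main__":
    for name, value in sizing().items():
        print(f"{name}: {value}")
```

That works out to 42 indices, somewhere between 420 and 840 GB of primary data, and an average of roughly 12 searches per second.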

I would like to separate each language. Which is the better approach:

  1. Create 6 clusters, each with 3 master nodes and 3 data nodes, and have X tribe nodes acting as client nodes to query all of them.
  2. Create 1 big cluster with 3 master nodes, 18 data nodes with shard allocation per node per language, and X client nodes.

?

I am a bit confused about how tribe nodes are meant to be used.

Thank you,

7 daily indices isn't too many. I'd go with option 2. Are you indexing the news itself, or logs or something? If it's logs, then I don't think you need an index per language; you might do better putting them all in one. If you are indexing the news, then you might want an index per language. That is up to you, but it makes more sense for news than for logs.

You should also have a look at the rollover and shrink APIs. They might make more sense than using daily indices, especially if you want an index per language and the languages vary in ingest rate.
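As a rough illustration of the rollover idea, here is a sketch that builds one rollover request per language. The language codes, the `news-<lang>-write` alias naming, and the threshold values are all made up for the example; the only parts taken from Elasticsearch are the `POST /<alias>/_rollover` endpoint and its `conditions` body with `max_age` and `max_docs`. It only builds the requests, it does not send them.

```python
import json

# Hypothetical language codes; the thread only says there are 6 languages.
LANGUAGES = ["en", "fr", "de", "es", "it", "pt"]


def rollover_request(lang, max_age="1d", max_docs=5_000_000):
    """Describe a rollover call for one language's write alias.

    The alias name and the default thresholds are illustrative
    assumptions, not values from the thread.
    """
    alias = f"news-{lang}-write"  # hypothetical alias naming scheme
    return {
        "method": "POST",
        "path": f"/{alias}/_rollover",
        "body": {"conditions": {"max_age": max_age, "max_docs": max_docs}},
    }


if __name__ == "__main__":
    for lang in LANGUAGES:
        req = rollover_request(lang)
        print(req["method"], req["path"])
        print(json.dumps(req["body"], indent=2))
```

The point of the design is that a busy language rolls over when it hits the document threshold while a quiet one only rolls over on age, so index sizes stay more even than with fixed daily indices.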

Thank you, nik.

Any experience with tribe nodes? I don't know what they are or aren't good for.

BTW, very interesting post about rollover (https://www.elastic.co/blog/managing-time-based-indices-efficiently). Thank you for introducing me to this new ES feature, nik!

So it seems to be good practice to have a single big cluster with different "node groups" (via shard allocation routing)?
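For what it's worth, the "node groups" idea could look something like the sketch below, using index-level shard allocation filtering. It assumes each data node is started with a custom attribute, here called `lang` (for example `node.attr.lang: fr` in `elasticsearch.yml`); the attribute name is an arbitrary choice for the example. The function only builds the settings body that would be sent with `PUT /<index>/_settings`.

```python
def allocation_settings(lang, attr="lang"):
    """Build index settings that pin an index to nodes tagged for one language.

    `attr` is a custom node attribute name chosen for this example;
    `index.routing.allocation.require.<attr>` is the Elasticsearch
    setting that restricts shards to nodes carrying that attribute value.
    """
    return {f"index.routing.allocation.require.{attr}": lang}


if __name__ == "__main__":
    # Settings body for a French index, to be sent with PUT /<index>/_settings.
    print(allocation_settings("fr"))
```

With 18 data nodes and 6 languages, that would mean tagging 3 nodes per language, which mirrors the 3 data nodes per cluster in option 1.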

Personally I don't have any experience with tribe nodes, but I know they need to stay on the same version as all the other nodes, which I'm not a fan of.

Up to a point, yes. Once you start getting to many dozens of nodes you start to wish you had multiple clusters. It is a thing we've been thinking about lately: ways to make that nicer.

But if you are running fewer than 50 nodes you aren't likely to notice anything. Eventually (at how many nodes, I don't know; it depends on lots of stuff, I guess) you'll start to see things like adding fields and moving shards from node to node take longer than they should.

Thank you again. I prefer to go for a single cluster with routing, but my colleagues prefer a cluster per language to limit the "blast radius" of a crash (I find that stupid, but ...).

So I am looking for good reasons not to use such an architecture.
