Elasticsearch cluster setup

Hello

I'm new to search and Elasticsearch. I have gone through some online docs and developed an app using an Elasticsearch setup in our test environment. So far, development and testing have gone smoothly. Now that I need to set up the production cluster, I need some expert advice on:

  1. Number of shards
  2. Number of replicas
  3. Should I separate master and data nodes?
  4. Can all the nodes be data nodes?
  5. I don't have any advanced search use case, but I at least need plural matching: "phone" should match all docs containing "phones" and vice versa. Is any special stemming needed in this case?

My use case and traffic patterns are:

  1. Up to 100M reads per day
  2. Up to 1M writes/updates per day
  3. Initial data size 10 GB, growth rate 1 GB every 6 months

Cluster info

  1. Initial cluster size: 14 machines, each with 28 GB RAM / 120 GB spinning hard disk / 12 cores
  2. A load balancer with DNS will distribute the traffic across all 14 machines.
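
For a rough sense of scale, the daily figures above can be turned into average per-second rates. This is only a back-of-the-envelope sketch: it assumes traffic is spread evenly over the day and evenly across the 14 machines, which real traffic rarely is, so peak load may be several times higher. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_per_second(events_per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return events_per_day / SECONDS_PER_DAY


def projected_size_gb(initial_gb: float, growth_gb_per_half_year: float, years: float) -> float:
    """Linear growth estimate: two half-year growth steps per year."""
    return initial_gb + growth_gb_per_half_year * 2 * years


# Figures taken from the post; the even spread is an assumption.
reads_per_sec = average_rate_per_second(100_000_000)
writes_per_sec = average_rate_per_second(1_000_000)
node_count = 14
reads_per_node_per_sec = reads_per_sec / node_count

print(f"average reads/s (whole cluster): {reads_per_sec:.0f}")
print(f"average writes/s (whole cluster): {writes_per_sec:.1f}")
print(f"average reads/s per node: {reads_per_node_per_sec:.0f}")
print(f"estimated data size after 5 years: {projected_size_gb(10, 1, 5)} GB")
```

Averages like these are a starting point for load testing, not a substitute for it.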

I have used unicast discovery, and I have set bootstrap.mlockall: true and index.routing.allocation.disable_allocation: false.
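
As an illustration of the settings just described, here is a minimal sketch that collects them into one dictionary, the way they might be written into each node's elasticsearch.yml. The host names are hypothetical. The `discovery.zen.minimum_master_nodes` entry is not mentioned in the post; it is added here as an assumption, using the commonly recommended quorum of (master-eligible nodes / 2) + 1 and assuming all 14 nodes are master-eligible.

```python
def minimum_master_nodes(master_eligible_count: int) -> int:
    """Quorum of master-eligible nodes: floor(n / 2) + 1."""
    return master_eligible_count // 2 + 1


def node_settings(unicast_hosts, master_eligible_count: int) -> dict:
    """Settings from the post, plus an assumed quorum setting."""
    return {
        # Unicast discovery: disable multicast and list the hosts explicitly.
        "discovery.zen.ping.multicast.enabled": False,
        "discovery.zen.ping.unicast.hosts": list(unicast_hosts),
        # Lock the process memory so the heap is not swapped out.
        "bootstrap.mlockall": True,
        # Assumption, not from the post: guard against split brain.
        "discovery.zen.minimum_master_nodes": minimum_master_nodes(master_eligible_count),
    }


# Hypothetical host names for the 14 machines.
hosts = [f"es-node-{i:02d}:9300" for i in range(1, 15)]
settings = node_settings(hosts, master_eligible_count=len(hosts))

for key, value in settings.items():
    print(f"{key}: {value}")
```

If dedicated master nodes were used instead, the quorum would be computed from the number of master-eligible nodes only, not from the whole cluster.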

Please advise.

  1. Depends
  2. Depends
  3. Depends
  4. Depends
  5. I'll leave this for someone else

So that aside, it depends! What are you expecting your load to be like,
your retention, your node count, your node specs, your queries? You know
some of this based on your dev setup, and you can extrapolate from there.

To be a bit more specific about question 1: ideally you want 1 shard per node,
and to get that you can over-allocate, e.g. create an index with 10 shards if
you expect to grow to 10 nodes.
For question 2, that's up to you. Having at least one replica gives you a
single level of redundancy and better search performance. The more replica
shards you add, the more those factors go up, but you also use more and
more resources.
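
To make the shard and replica arithmetic concrete, here is a minimal sketch. It builds the body of an index-creation request with the `number_of_shards` and `number_of_replicas` settings and counts how many shard copies each node would hold. The helper names are made up, and the even distribution across nodes is an assumption (Elasticsearch balances shards, but not necessarily perfectly).

```python
import json
import math


def index_settings(number_of_shards: int, number_of_replicas: int) -> dict:
    """Request body for creating an index with explicit shard settings."""
    return {
        "settings": {
            "index": {
                "number_of_shards": number_of_shards,
                "number_of_replicas": number_of_replicas,
            }
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary shard has `replicas` additional copies."""
    return primaries * (1 + replicas)


def max_copies_per_node(primaries: int, replicas: int, nodes: int) -> int:
    """Upper bound per node, assuming an even spread across nodes."""
    return math.ceil(total_shard_copies(primaries, replicas) / nodes)


# The over-allocation example from the reply: 10 shards for 10 nodes.
body = index_settings(number_of_shards=10, number_of_replicas=1)
print(json.dumps(body, indent=2))
print("total shard copies:", total_shard_copies(10, 1))
print("copies per node on 10 nodes:", max_copies_per_node(10, 1, 10))
```

In the Elasticsearch versions current at the time of this thread, the primary shard count is fixed when the index is created, while the replica count can be changed later. That is why over-allocating primaries up front is suggested.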

Your nodes seem to be pretty reasonable. Was your dev cluster running on
the same specs?
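
On question 5 (plural matching), which is left open above, one possible starting point is an analyzer with a stemmer token filter, so that "phone" and "phones" are indexed as the same term. The sketch below only builds the analysis settings as a Python dictionary. The names `plural_stemmer` and `plural_text` are made up, and the choice of the `minimal_english` stemmer is an assumption (it targets plurals and is less aggressive than the full `english` stemmer). The field to be searched would still need to be mapped to use this analyzer.

```python
import json


def plural_aware_analysis() -> dict:
    """Analysis settings for a custom analyzer that stems plurals."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Hypothetical filter name; "stemmer" is the filter type.
                    "plural_stemmer": {
                        "type": "stemmer",
                        "language": "minimal_english",
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name.
                    "plural_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "plural_stemmer"],
                    }
                },
            }
        }
    }


analysis_body = plural_aware_analysis()
print(json.dumps(analysis_body, indent=2))
```

Because the same analyzer is applied at index time and at search time, a query for either form should match documents containing the other. It is worth checking the actual tokens with the analyze API on a test index before relying on this.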

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

In reply to shakul's original post of 28 September 2014, 10:14.

Hi Mark,

Thanks for the response.

I'm expecting a load of 100M reads per day (50% searches by keyword, 50% lookups by ID).
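
The two kinds of read mentioned here look quite different on the wire, which matters for capacity planning. A minimal sketch of both request shapes follows. The index, type, and field names are hypothetical, and no request is actually sent.

```python
import json


def get_by_id_request(index: str, doc_type: str, doc_id: str):
    """A document lookup by ID: a plain GET with no body."""
    return ("GET", f"/{index}/{doc_type}/{doc_id}", None)


def keyword_search_request(index: str, field: str, keyword: str):
    """A keyword search: a match query sent to the _search endpoint."""
    body = {"query": {"match": {field: keyword}}}
    return ("POST", f"/{index}/_search", json.dumps(body))


# Hypothetical index, type, and field names.
lookup = get_by_id_request("products", "product", "42")
search = keyword_search_request("products", "title", "phone")

for method, path, body in (lookup, search):
    print(method, path, body or "")
```

A lookup by ID is routed to a single shard, whereas a search fans out to one copy of every shard in the index, so the 50/50 split between the two affects how much work each read causes.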

Retention policy: docs exist until the user manually deletes them. Currently the index is small, around 10 GB, but in future it will grow very large. I'm planning to purge the least-used docs in 3 or 6 months.

I have 10 nodes in the cluster currently, and I can add more if needed. Node specs are 8 cores / 24 GB RAM / 120 GB spinning hard disk.

My dev cluster was also 10 machines, but with lower specs. I used the default settings of 5 shards and 1 replica.

I don't want to use trial and error in production, so I need expert advice on:

  1. Number of shards
  2. Number of replicas
  3. Should I separate master and data nodes?
  4. Can all the nodes be data nodes?
  5. I don't have any advanced search use case, but I at least need plural matching:
    "phone" should match all docs containing "phones" and vice versa. Is any special
    stemming needed in this case?

If you need any more info, I'm glad to provide it. Please advise.

Thanks.

If you had a dev cluster then you should have a decent idea of what those
nodes could handle based on your use case. So it should be easy to
extrapolate the load and capacity for your prod nodes.
Other than that, my original answers are still relevant.

If you want definitive, expert answers, then you may have to pay someone ;)

Regards,
Mark Walkom


In reply to shakul's post of 30 September 2014, 05:00.