I'm new to search and Elasticsearch. I have read some online docs and developed an app against an Elasticsearch setup in our test environment. So far, development and testing have gone smoothly. Now, to set up the production cluster, I need some expert advice on:
1. Number of shards
2. Number of replicas
3. Should I separate master and data nodes?
4. Can all the nodes be data nodes?
5. I don't have any advanced search use cases, but I at least need plural matching: a search for "phone" should match all docs containing "phones" and vice versa. Is any special stemming needed in this case?
My use case and traffic patterns are:
Up to 100M reads per day
Up to 1M writes/updates per day
Initial data size 10 GB, growth rate 1 GB every 6 months
Cluster info:
Initial cluster size: 14 machines, each with 28 GB RAM / 120 GB spinning hard disk / 12 cores
A load balancer with DNS would distribute the traffic across all 14 machines.
I have used unicast discovery, and I have set bootstrap.mlockall: true and index.routing.allocation.disable_allocation: false.
So that aside, it depends! What are you expecting your load to be like,
your retention, your node count, your node specs, your queries? You know
some of this from your dev setup, and you can extrapolate from there.
To be a bit more specific about 1: ideally you want one shard per node, and
to get that you can over-allocate, e.g. create an index with 10 shards if
you expect to grow to 10 nodes.
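
As a rough sketch of the over-allocation idea above, the snippet below builds the JSON body you would send when creating an index, with one primary shard per node you expect to end up with. The helper name `index_settings` is made up for illustration; `number_of_shards` and `number_of_replicas` are the standard index settings. It only prints the body and does not talk to a cluster.

```python
import json


def index_settings(expected_nodes: int, replicas: int = 1) -> dict:
    """Build an index-creation body with one primary shard per expected node.

    Hypothetical helper for illustration only; it just assembles a dict.
    """
    if expected_nodes < 1:
        raise ValueError("expected_nodes must be at least 1")
    return {
        "settings": {
            # Size the primary shard count for the cluster you expect
            # to grow to, not only for the cluster you have today.
            "number_of_shards": expected_nodes,
            "number_of_replicas": replicas,
        }
    }


body = index_settings(expected_nodes=10)
print(json.dumps(body, indent=2))
```

The reason to decide this up front is that the primary shard count is set when the index is created, whereas the replica count can be changed later on a live index.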
For 2, that's up to you. Having at least one replica gives you a single
level of redundancy and better search performance. The more replicas you
add, the more those factors go up; however, you also use more and more
resources.
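
To make the resource trade-off concrete, here is a small back-of-the-envelope sketch. It assumes the 10-shard, 10-node example above and the 10 GB initial data size from the question; the function names are invented for this illustration, and it ignores index overhead and uneven shard sizes.

```python
def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies in the cluster: each primary plus its replicas."""
    return primaries * (1 + replicas)


def copies_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies each node holds if spread evenly."""
    return shard_copies(primaries, replicas) / nodes


def total_disk_gb(data_gb: float, replicas: int) -> float:
    """Approximate disk used across the cluster, counting replica copies."""
    return data_gb * (1 + replicas)


for replicas in (0, 1, 2):
    print(
        f"replicas={replicas}: "
        f"{shard_copies(10, replicas)} shard copies, "
        f"{copies_per_node(10, replicas, nodes=10):.1f} per node, "
        f"~{total_disk_gb(10, replicas):.0f} GB on disk in total"
    )
```

Each extra replica adds another full copy of the data that can serve searches, at the cost of the same amount of disk again plus the extra indexing work needed to keep it up to date.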
Your nodes seem pretty reasonable. Was your dev cluster running on
the same specs?
I'm expecting a load of 100M reads per day (50% searches by keyword, 50% lookups by ID).
Retention policy: docs exist until a user manually deletes them. The index is currently small, around 10 GB, but it will grow very large in the future, so I'm planning to purge the least-used docs in 3 or 6 months.
I currently have 10 nodes in the cluster and can add more if needed. Node specs are 8 cores / 24 GB RAM / 120 GB spinning hard disk.
My dev cluster was also 10 machines, but with lower specs. I used the default settings of 5 shards and 1 replica.
I don't want to experiment in production, which is why I'm asking for expert advice on:
1. Number of shards
2. Number of replicas
3. Should I separate master and data nodes?
4. Can all the nodes be data nodes?
5. I don't have any advanced search use cases, but I at least need plural matching: a search for "phone" should match all docs containing "phones" and vice versa. Is any special stemming needed in this case?
If you need any more info, I'm glad to provide it. Please advise.
If you had a dev cluster then you should have a decent idea of what those
nodes could handle based on your use case. So it should be easy to
extrapolate the load and capacity for your prod nodes.
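
A minimal sketch of that kind of extrapolation, using the daily volumes quoted in this thread. The per-node throughput (500 requests/s), the peak factor, and the headroom are placeholder assumptions, not measurements; substitute whatever your dev cluster actually sustained.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(per_day: float) -> float:
    """Convert a daily request count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def nodes_needed(
    target_rate: float,
    measured_rate_per_node: float,
    peak_factor: float = 3.0,  # assumed ratio of peak to average traffic
    headroom: float = 0.5,     # assumed fraction of capacity to actually use
) -> int:
    """Estimate node count from a measured per-node rate (hypothetical model)."""
    peak_rate = target_rate * peak_factor
    usable_per_node = measured_rate_per_node * headroom
    return math.ceil(peak_rate / usable_per_node)


reads_per_second = average_rate(100_000_000)  # about 1157 reads/s on average
writes_per_second = average_rate(1_000_000)   # about 11.6 writes/s on average

print(f"average reads/s:  {reads_per_second:.0f}")
print(f"average writes/s: {writes_per_second:.1f}")

# 500 requests/s per node is a made-up figure standing in for a dev-cluster
# measurement; with the defaults above it yields 14 nodes.
print("nodes needed:", nodes_needed(reads_per_second, measured_rate_per_node=500))
```

The averages hide daily peaks, which is why the sketch multiplies by a peak factor before dividing by what a node can comfortably handle.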
Other than that, my original answers are still relevant.
If you want definitive, expert answers, then you may have to pay someone.
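
On the plural-matching question, which the replies above do not address directly: one common approach is a custom analyzer with a stemming token filter, so that "phone" and "phones" reduce to the same indexed term. The sketch below only builds the analysis settings as a Python dict. The names `plural_stemmer` and `plural_friendly` are invented for this example; `stemmer`, `standard`, `lowercase`, and the `minimal_english` language option are built into Elasticsearch. Treat it as a starting point to test against your own data rather than a recommended configuration.

```python
import json


def plural_analysis_settings(language: str = "minimal_english") -> dict:
    """Build index settings defining a stemming analyzer (illustrative only)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # "minimal_english" mainly strips plural endings;
                    # "english" would stem more aggressively.
                    "plural_stemmer": {"type": "stemmer", "language": language}
                },
                "analyzer": {
                    "plural_friendly": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "plural_stemmer"],
                    }
                },
            }
        }
    }


analysis_body = plural_analysis_settings()
print(json.dumps(analysis_body, indent=2))
```

The analyzer only takes effect on fields whose mapping references it, and the same analyzer needs to be applied at both index time and search time for the stemmed terms to line up.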