ElasticSearch setup


(Luiz Guilherme Santos) #1

Hi guys,

We are working on a project to evaluate a possible migration from a search
solution based on FAST to one based on ElasticSearch. We
are analyzing several possible configurations and we have some questions about
how to proceed with our setup.

The first question is how to calculate the number of shards and
replicas. Is there any rule to follow, considering the size of the index and
the available memory and disk space?

Another point we are studying is whether there is any way to separate the
indexing servers from the search servers. It seems that we could achieve
this by putting the primary shards on one group of servers, while the search
servers would hold only the replicas. Does that make sense? If so, how can we
configure it?

We will have a set of indexes, separated by product. Do we have
to guarantee that the data is allocated in the same shard using cluster routing
allocation, or is there a better way to do that?

Our cluster will consist of 6 servers with 6-core Intel L5640 CPUs, 64 GB of RAM
and six 300 GB SAS-2 disks in RAID 5. Can we achieve good performance with
this configuration? What is the best way to use these servers with
Elasticsearch?

--
Luiz Guilherme P. Santos


(Karussell) #2

The first question is how to calculate the number of shards
and replicas. Is there any rule to follow, considering the size of the index
and the available memory and disk space?

How many machines/indices are you using for FAST? What is your index size,
what are your query requirements, etc.?

Another point we are studying is whether there is any way to separate the
indexing servers from the search servers.

Why do you want to do this?

Peter.

On Tuesday, March 6, 2012 3:27:43 AM UTC+1, Luiz Guilherme wrote:


(haarts) #3

(Disclaimer: I consider myself only slightly above an ES n00b.)
What performance are you looking for? To me, the servers you have
should handle almost anything you throw at them.

There is no way to calculate how many documents a shard can handle.
The general advice is to load a bunch of data into a one-shard setup and
monitor disk and memory usage (BigDesk https://github.com/lukas-vlcek/bigdesk is
an easy way to monitor this; so is Elasticsearch-head
http://mobz.github.com/elasticsearch-head/).

I don't see a point in splitting the search and index servers, unless you
really like doing ops.

I'm not sure about the multi index data allocation question.
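On the per-product question, it may help to separate two mechanisms: "cluster routing allocation" decides which node a shard lives on, while document routing decides which shard a document goes to. As far as I know, Elasticsearch picks the shard roughly as hash(routing) % number_of_primary_shards, where the routing value defaults to the document id and can be overridden per request. The toy sketch below only illustrates that modulo idea: the shard count and product names are hypothetical, and zlib.crc32 is a stand-in, NOT the hash Elasticsearch actually uses.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # hypothetical index setting


def shard_for(routing: str, num_primary_shards: int = NUM_PRIMARY_SHARDS) -> int:
    # Stand-in hash (crc32), not Elasticsearch's real one; only the
    # "hash modulo number of primaries" shape is the point here.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# Hypothetical (document id, product) pairs.
docs = [
    ("doc-1", "product-a"),
    ("doc-2", "product-a"),
    ("doc-3", "product-b"),
    ("doc-4", "product-a"),
]

# Default: the id is the routing value, so one product's documents
# are usually spread over several shards.
by_id = {doc_id: shard_for(doc_id) for doc_id, _ in docs}

# Custom routing: use the product as the routing value, so every
# document of the same product maps to the same shard.
by_product = {doc_id: shard_for(product) for doc_id, product in docs}

print("routed by id:     ", by_id)
print("routed by product:", by_product)
```

Because the result depends on the number of primaries, changing that number would remap documents, which is why it is fixed when the index is created in the versions discussed here.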

Hope this helps.



(Shay Banon) #4

You can't really split search and index servers in elasticsearch. Since you want (near) real-time search, indexing happens on the replica shards as well as on the primary shards (and which copy is the primary can change as machines come and go). See more info here: http://www.elasticsearch.org/videos/2011/08/09/road-to-a-distributed-searchengine-berlinbuzzwords.html.
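A toy model of why "replica-only search servers" still index: each write is applied on the primary and then on every replica, so any node holding a replica does the same indexing work. This is a simplified simulation under assumed names (the node names and the fixed placement are hypothetical, and in a real cluster primaries can move), not Elasticsearch code.

```python
from collections import Counter

# Hypothetical placement: shard number -> (primary node, replica nodes).
placement = {
    0: ("index-node-1", ["search-node-1"]),
    1: ("index-node-2", ["search-node-2"]),
}


def index_document(shard: int, work: Counter) -> None:
    primary, replicas = placement[shard]
    # The primary indexes the document first...
    work[primary] += 1
    # ...then the same operation is applied on every replica, so the
    # "search" nodes perform the same indexing work as the "index" nodes.
    for node in replicas:
        work[node] += 1


work = Counter()
for doc_number in range(100):
    index_document(doc_number % 2, work)

print(dict(work))
```

Every node ends up with the same number of indexing operations, which matches the point above: putting replicas on separate machines does not take indexing load off them.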

Regarding the number of shards, that really depends on how your data flows and how much data you index, and because of the very broad range of document types, index mappings, and the like, it's kind of hard to answer. The suggested method of using a single shard and hammering it is a good one.
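Once a single-shard test shows roughly where one shard stops performing acceptably, the extrapolation to a shard count is simple arithmetic. The sketch below assumes you measured that limit yourself; the function name, the 70% headroom default and the example numbers are hypothetical, not an Elasticsearch rule.

```python
def estimate_primary_shards(expected_docs: int,
                            docs_per_shard_at_limit: int,
                            headroom_percent: int = 70) -> int:
    """Extrapolate a primary shard count from a single-shard load test.

    docs_per_shard_at_limit: document count at which your own one-shard
        test became too slow or ran short of memory (measured, not guessed).
    headroom_percent: hypothetical safety margin; plan to fill each shard
        only up to this percentage of the measured limit.
    """
    usable = docs_per_shard_at_limit * headroom_percent // 100
    if usable <= 0:
        raise ValueError("limit and headroom must both be positive")
    # Ceiling division: enough shards that none exceeds the usable size.
    return max(1, -(-expected_docs // usable))


# Hypothetical numbers: 200M documents expected, one shard coped with 25M.
print(estimate_primary_shards(200_000_000, 25_000_000))
print(estimate_primary_shards(200_000_000, 25_000_000, headroom_percent=80))
```

The headroom leaves room for growth, since the number of primaries is fixed once the index is created.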

Your setup sounds good, but without knowing a bit more about the data and the type of searches executed, it's hard to give recommendations.
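For a rough sense of scale of the proposed hardware, here is the disk arithmetic, assuming each of the 6 servers has its six 300 GB disks in a single RAID 5 array and ignoring filesystem and index overhead. The 12-primary index is a hypothetical example, not a recommendation.

```python
def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    # RAID 5 spends one disk's worth of capacity on parity,
    # so N disks give N - 1 disks of usable space.
    return (disks - 1) * disk_gb


def shard_copies(primaries: int, replicas: int) -> int:
    # Every primary shard has `replicas` additional copies.
    return primaries * (1 + replicas)


NODES = 6
per_node_gb = raid5_usable_gb(disks=6, disk_gb=300)  # assumed per-server array
cluster_gb = NODES * per_node_gb

# Each replica level stores the whole index once more, so the amount of
# unique index data the cluster can hold shrinks accordingly.
for replicas in (0, 1, 2):
    print(f"replicas={replicas}: ~{cluster_gb // (1 + replicas)} GB of unique data")

# Hypothetical index with 12 primaries and 1 replica: 24 shard copies,
# roughly 4 per node if the allocator balances them evenly.
copies = shard_copies(primaries=12, replicas=1)
print(f"{copies} shard copies, about {copies // NODES} per node")
```

Under these assumptions each server offers 1500 GB, and one replica halves the cluster's 9000 GB to about 4500 GB of unique data.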

On Tuesday, March 6, 2012 at 12:25 PM, haarts wrote:

