Configuration advice

Dear elastic users,

We use elastic for a document store containing around 100M documents in
200 GB. We used to run a single elastic node on the same server that
functioned as the web server, but this caused a lot of performance issues
(and the need to re-index all data in case of failure). So, we just
acquired two new servers that will function almost exclusively as elastic
nodes; both have a single SSD, 48 GB RAM and a decent modern processor.
The web server has 64 GB RAM and RAID 10 spinning disks. The web site is not
particularly high-traffic, but the elastic queries can be pretty heavy, with
lots of terms, wildcards, and aggregations (the site is used for scientific
text analysis - http://amcat.vu.nl).

We can imagine three possible configurations:

  1. Separate elastic cluster

Use the new "elastic" servers only for elastic with data=true,master=true,
and use the web server only for other tasks.
Advantage: simple.
Disadvantage: capacity of the existing ("web") server is underutilized.

  2. Use all servers

Use all servers for elastic, all with data=true,master=true.
Advantage: all resources utilized.
Disadvantage: web server has no SSD and also has other tasks, so maybe
lower overall performance because it becomes the weakest link?

  3. Use web server as 'coordinator'

Use the elastic servers as data nodes (with data=true,master=false) and the
web server as master node (data=false,master=true).

Advantage: this seems to use the web server capacity (esp. CPU?) while
still having all the data on the SSDs in the new elastic servers.
Disadvantage: more complicated to set up, only a single master node.
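
For reference, the data=... / master=... shorthand above corresponds to the node.data and node.master settings in elasticsearch.yml. A minimal sketch of option 3 as per-node settings follows; the host names web, elastic1 and elastic2 are made up for illustration and do not come from the thread.

```python
# Hypothetical host names; node.data / node.master are the elasticsearch.yml
# settings that the data=... / master=... shorthand refers to.
OPTION_3 = {
    "web":      {"node.data": False, "node.master": True},
    "elastic1": {"node.data": True,  "node.master": False},
    "elastic2": {"node.data": True,  "node.master": False},
}


def render_yaml(settings):
    """Render flat settings as elasticsearch.yml lines (booleans in lowercase)."""
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in sorted(settings.items())
    )


def master_eligible(layout):
    """Names of the nodes that may be elected master in the given layout."""
    return sorted(name for name, s in layout.items() if s["node.master"])


for name, settings in OPTION_3.items():
    print(f"# {name}")
    print(render_yaml(settings))

# Only one master-eligible node: this is the "single master" disadvantage.
print("master-eligible:", master_eligible(OPTION_3))
```

Written out this way, the listed disadvantage is easy to see: only one node in the layout has node.master set to true.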

In case (3) is the best option, two more questions:

  • How do you change a node from data=true to data=false? Just change the
    config and restart the node? Will it automatically relocate the shards?
  • What happens if the only master=true node disappears? Will the rest just
    wait for it to come online again? Is the data preserved?

What would you recommend?

Thanks,

-- Wouter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/c991cff8-52af-49e7-96df-0f7dd6675a4c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

I'd go with option 3, but make all 3 nodes master eligible. That way you
prevent complete loss of the cluster and protect against split brain.
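
The split-brain protection depends on requiring a majority of master-eligible nodes before a master can be elected. In Elasticsearch releases of this era that majority is normally configured through discovery.zen.minimum_master_nodes; the thread does not name the setting, so treat that as an assumption. A small sketch of the arithmetic:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Majority quorum: strictly more than half of the master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3):
    quorum = minimum_master_nodes(n)
    # Nodes that can be lost while a master can still be elected.
    tolerated = n - quorum
    print(f"{n} master-eligible: quorum {quorum}, can lose {tolerated}")
```

With three master-eligible nodes the quorum is 2, so one node can drop out and the remaining two can still elect a master, while a lone isolated node cannot form its own cluster.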

To change a node role, just update the config and restart the service.
Shards will be auto-promoted (i.e. replicas to primaries) and the cluster
will start re-allocating the missing copies, but you can disable allocation
to stop that, so when your other node comes up it will just initialise the
shards it has locally, which speeds up recovery.
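
As a rough illustration of "disable allocation": on a 1.x release this is usually done with the cluster.routing.allocation.enable cluster setting, sent to PUT /_cluster/settings. The sketch below only builds the request body; the choice of a transient (rather than persistent) setting is an assumption, and nothing is sent to a cluster.

```python
import json

# Values accepted by cluster.routing.allocation.enable.
ALLOWED = {"all", "primaries", "new_primaries", "none"}


def allocation_settings(enable: str) -> str:
    """JSON body for PUT /_cluster/settings that switches shard allocation."""
    if enable not in ALLOWED:
        raise ValueError(f"enable must be one of {sorted(ALLOWED)}")
    # "transient" settings are forgotten after a full cluster restart.
    return json.dumps({"transient": {"cluster.routing.allocation.enable": enable}})


# Before restarting the node: stop shard copies from being re-allocated.
print(allocation_settings("none"))
# After the node has rejoined: allow allocation again.
print(allocation_settings("all"))
```

The first body would be sent before the restart and the second once the node is back, so the cluster does not start copying shards around during the short outage.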

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 25 July 2014 21:14, Wouter van Atteveldt vanatteveldt@gmail.com wrote:


Dear Mark, others,

Thanks for the advice! I do have some more questions, I'm afraid...

On Friday, July 25, 2014 1:30:07 PM UTC+2, Mark Walkom wrote:

I'd go with option 3, but make all 3 nodes master eligible. That way you
prevent complete loss of the cluster and protect against split brain.

Right. Is there any advantage to having the 'coordinator' be the master of
the cluster? I.e., suppose I restart the coordinator and one of the data
nodes becomes master, should I then somehow make the coordinator the master
again?

To change a node role just update the config and restart the service.
Shards will be auto promoted (ie replicas to primaries), but you can
disable allocation to stop this so when your other node comes up it will
just initialise the shards it has locally, this speeds up recovery.

Well, the current situation is that the shards are divided over the three
nodes, each with 1 replica. So, if I change the coordinator to data=false,
the cluster will need to redistribute the shards it held over the other two
nodes so that each node has a copy of each shard, which means I need to
leave allocation enabled, right? Or should I first set the number of
replicas to 2, so the data is copied to all nodes, before setting the
coordinator to data=false?
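
The counting behind this question can be sketched as follows. It assumes the standard behaviour that Elasticsearch never places two copies of the same shard on one node; the helper names are made up for illustration, and the request body is for the index settings API (PUT /_settings), which is only built here, not sent.

```python
import json


def copies_per_shard(replicas: int) -> int:
    """Each shard exists as one primary plus its replicas."""
    return 1 + replicas


def every_node_has_full_copy(data_nodes: int, replicas: int) -> bool:
    """True when each data node ends up holding one copy of every shard.

    Two copies of the same shard are never placed on one node, so this
    holds exactly when the number of copies equals the number of data nodes.
    """
    return copies_per_shard(replicas) == data_nodes


def replica_settings(replicas: int) -> str:
    """JSON body for PUT /_settings that changes the replica count."""
    return json.dumps({"index": {"number_of_replicas": replicas}})


print(every_node_has_full_copy(data_nodes=3, replicas=1))  # current layout
print(every_node_has_full_copy(data_nodes=2, replicas=1))  # after data=false
print(every_node_has_full_copy(data_nodes=3, replicas=2))  # replicas raised first
print(replica_settings(2))
```

By this count, two data nodes with 1 replica already give each node a copy of every shard, while raising the replica count to 2 first would also place a full copy on the web server before it stops holding data.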

Sorry for all the questions, but this part of elastic is new to me and I'm
afraid to mess up the index...

Thanks,

Wouter
