ElasticSearch Setup to store data from multiple datacenters


(dkjhanitt) #1

Hi,
I need to stream data from several datacenters and store it centrally
in an Elasticsearch cluster, taking replication cost and latency into
consideration. In this case, does it make sense to have an
Elasticsearch data node in each datacenter and a client sitting
outside them for queries? If not, could I get suggestions for a good design?

Thanks,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mark Walkom) #2

Running a cross-DC Elasticsearch cluster is not a good idea because of the
latency between DCs.

You're better off shipping everything to a cluster in one of the DCs. We
do this and it works well.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

In reply to Deepak Jha (dkjhanitt@gmail.com), 11 November 2013 21:26.


(dkjhanitt) #3

Hi Mark,
Thank you for the reply.
I am using Elasticsearch with Logstash for centralized logging.
Having the cluster in one DC will incur significant network overhead. Did you
do any optimization for this? If so, could you please shed some light on it?

In reply to Mark Walkom, Monday, November 11, 2013 4:02:07 PM UTC+5:30.


(Mark Walkom) #4

Unfortunately, no.

It'd be nice if Logstash had a compression method during transport, but I'm
not aware of one.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com
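
To get a feel for how much cross-DC traffic compression might save, here is a minimal sketch that batches log events as newline-delimited JSON and gzips the batch before it leaves the DC. This is not a Logstash feature; `pack_batch`, `unpack_batch`, and the sample events are hypothetical names and data made up for illustration, and real logs will compress differently.

```python
import gzip
import json


def pack_batch(events):
    """Serialize events as newline-delimited JSON and gzip the result."""
    raw = "\n".join(json.dumps(e, sort_keys=True) for e in events).encode("utf-8")
    return raw, gzip.compress(raw)


def unpack_batch(payload):
    """Reverse pack_batch on the receiving side."""
    text = gzip.decompress(payload).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


# Hypothetical sample events; real log lines will compress differently.
events = [
    {"dc": "dc1", "host": "web-%d" % (i % 4), "message": "GET /index.html 200"}
    for i in range(500)
]
raw, packed = pack_batch(events)
print(len(raw), "bytes raw ->", len(packed), "bytes gzipped")
```

Larger batches generally compress better but add delay before events reach the central cluster, so the batch size is a trade-off between bandwidth and freshness.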

In reply to Deepak Jha (dkjhanitt@gmail.com), 11 November 2013 23:54.


(dkjhanitt) #5

Just curious: what if I have a data node in each DC and a client
sitting outside to query them? My queries will be mostly DC-specific.
Any suggestions?

Thanks,
Deepak

In reply to Mark Walkom, Tuesday, November 12, 2013 2:32:11 AM UTC+5:30.


(Mark Walkom) #6

You want to avoid having a cross-DC cluster, but if you mean two clusters,
one in each DC, with a client that reads from both, then yes. That would work
as long as you coded the client to take this into account.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com
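
A client that "takes this into account" could look roughly like the sketch below: it keeps one endpoint per DC, sends DC-specific queries only to that DC's cluster, and fans out and merges only when no DC is given. The endpoint URLs, the `CLUSTERS` map, and all function names are assumptions for illustration; only the `_search` endpoint and the `hits.hits` response layout come from Elasticsearch itself.

```python
import json
import urllib.request

# Hypothetical per-DC cluster endpoints; replace with real addresses.
CLUSTERS = {
    "dc1": "http://es.dc1.example.com:9200",
    "dc2": "http://es.dc2.example.com:9200",
}


def targets_for(dc=None):
    """Return the endpoints to query: one DC if given, otherwise all."""
    if dc is None:
        return dict(CLUSTERS)
    return {dc: CLUSTERS[dc]}


def build_search_request(base_url, index, query):
    """Build (but do not send) a search request for one cluster."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/" + index + "/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def merge_hits(responses):
    """Combine hits from several clusters, tag each with its DC, sort by score."""
    merged = []
    for dc, response in responses.items():
        for hit in response["hits"]["hits"]:
            merged.append(dict(hit, _dc=dc))
    return sorted(merged, key=lambda h: h.get("_score") or 0.0, reverse=True)


def search(index, query, dc=None, timeout=10):
    """Query one DC's cluster, or every cluster when dc is None."""
    responses = {}
    for name, base_url in targets_for(dc).items():
        request = build_search_request(base_url, index, query)
        with urllib.request.urlopen(request, timeout=timeout) as reply:
            responses[name] = json.load(reply)
    return merge_hits(responses)
```

For example, `search("logs", {"match_all": {}}, dc="dc1")` would touch only the dc1 cluster. Note that relevance scores computed by separate clusters are not strictly comparable, so for log data it may make more sense to sort the merged hits by a timestamp field instead.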

In reply to Deepak Jha (dkjhanitt@gmail.com), 12 November 2013 16:30.

