Concurrent update issue


(vreal) #1

Hi,

I'm currently running 2 Elasticsearch servers in one cluster, and I've started a web server that also has a client node in the cluster.
Then I started a MapReduce job that inserts a couple of thousand records into the cluster. The process often got connection timeout errors
and failed. To fix it, we had to restart the Elasticsearch servers.

I wonder if someone can share their experience using Elasticsearch in a batch update process. Are there any limits? How can we avoid these problems?

By the way, our process creates a Node Client when it starts and closes it at the end of the process, while our web server keeps its client open all the time. Is that the correct usage?

Regards,
Bruce Zhou


(David Pilato) #2

Is your batch node a "client only" node? It must not handle data.

David :wink:
Twitter : @dadoonet / @elasticsearchfr

(In reply to Ye Zhou, zhouy.vreal@gmail.com, 18 June 2012 at 06:00.)


(vreal) #3

Yes, I use client-only nodes, but the MapReduce process creates many clients. The process succeeded after I restarted Elasticsearch on each server and reran it. But there seems to be a reliability problem in the cluster if I want to run the process every day: after a couple of runs, client nodes have trouble joining the cluster because they can't connect to the servers.

I'm running the cluster on two small Amazon EC2 instances, with about 10 indices and about 15,000 records in 55 shards. Is there any configuration I need to set?

Regards,
Bruce Zhou

(In reply to David Pilato, 18/06/2012 at 12:56 PM.)


(David Pilato) #4

There's something I don't understand. How many nodes and how many clients do you start?
Do you have 2 nodes and 1 batch?

In your batch, you should have only one node (only one client).

Is that your setup?


(In reply to Ye Zhou, zhouy.vreal@gmail.com, 18 June 2012 at 07:24.)


(vreal) #5

I have 2 servers and 1 batch, but the batch is a Hadoop MapReduce job that starts many clients. We run 6 map tasks on 2 instances, so it starts 6 clients.

(In reply to David Pilato, 18/06/2012 at 1:57 PM.)


(andym) #6

Ye,
You can overwhelm your servers if you open too many connections or
insert documents at too high a rate.

You might monitor the ES servers with something like bigdesk, find
the optimal insertion rate, and then make sure your clients don't
exceed that rate.
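One common way to keep a client under a chosen insertion rate is a token bucket. The sketch below is an illustration, not something proposed in the thread: `rate` and `capacity` are placeholders you would set from your own measurements (for example with bigdesk), and the `clock` and `sleep` parameters exist only so the bucket can be tested without real waiting. It is single-threaded by design.

```python
import time


class TokenBucket:
    """Client-side limiter: on average at most `rate` operations per second,
    with short bursts of up to `capacity` operations. Not thread-safe."""

    def __init__(self, rate, capacity, clock=time.monotonic, sleep=time.sleep):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum tokens that can accumulate
        self.tokens = float(capacity)    # start full, allowing an initial burst
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def _refill(self):
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def acquire(self):
        """Block until one token is available, then consume it."""
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Wait roughly as long as it takes for the missing part of a token to accrue.
            self.sleep((1 - self.tokens) / self.rate)
```

A mapper would create one bucket and call `bucket.acquire()` before each insert. If a measured cluster-wide budget is shared by 6 mappers, each bucket's `rate` would be about one sixth of it.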

Additionally, you might want to throttle the document-insertion code on
your client and back off (wait and retry) if ES starts returning
errors.
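The "wait and retry" back-off is often implemented with exponentially growing delays plus random jitter, so that many clients do not retry in lockstep. A minimal sketch, assuming a hypothetical `TransientError` that stands in for whatever timeout or rejection error your real client raises; the delay values are illustrative, not recommendations from the thread.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or rejection raised by the real client."""


def with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0,
                 sleep=time.sleep, jitter=random.random):
    """Call operation(); on TransientError wait up to base_delay * 2**attempt
    seconds (capped at max_delay) and try again, up to max_attempts times."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            delay = min(max_delay, base_delay * (2 ** attempt))
            # "Full jitter": wait a random fraction of the delay.
            sleep(delay * jitter())
```

Combined with a rate limiter, this keeps a struggling cluster from being hit by a burst of immediate retries from every mapper at once.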

Also, 55 shards seems like too many for 3 small EC2 instances.

(In reply to Ye Zhou, zhouy.vr...@gmail.com, Jun 18 at 12:18 am.)
