Index data in MapReduce

vreal · March 27, 2012, 2:50am

Hi,

I'm currently writing a batch process that uses Amazon MapReduce; the results will be stored in a database and indexed into an Elasticsearch cluster. The total amount of data will be less than 20k records. I've started 2 servers in the cluster with 2 cores and 3.75G memory (Amazon m1.medium instance), and I've also increased the max open files limit to 32000.

But during the process I always get the "too many open files" error, and Elasticsearch stops serving for quite a while until I log in to the server and restart the Elasticsearch daemon.

Does Elasticsearch support massive indexing operations? And if not, what can I do to meet this requirement?

Can anyone help me on this issue? Thanks.

Regards,
Ye Zhou

vineeth_mohan · March 27, 2012, 3:09am

Increasing the number of open files at the OS level is highly recommended for
Elasticsearch.
So please go ahead and do that, and then start Elasticsearch.

Thanks
Vineeth


vreal · March 27, 2012, 3:31am

Hi,

Elasticsearch is running as the root user.
And my settings in /etc/security/limits.conf are:
root soft nofile 102400
root hard nofile 102400

Then I rebooted the instance.

Is that setting all right?

Regards,
Ye Zhou


otisg · March 27, 2012, 6:47pm

Hi,

It's hard to tell whether those particular numbers are good for you or not.
Use the `lsof` command as root (or with sudo) to check how many files ES is
using. You may also want to run ES as a non-root user.
Running ulimit -a will show you whether your limits.conf numbers are really
in effect.
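
As a rough illustration of the same check done per process rather than per shell, here is a minimal Python sketch. It assumes a Linux host, where /proc/<pid>/limits lists the limits a running process actually has and /proc/<pid>/fd holds one entry per open descriptor. The function names are made up for this example, and the ES process ID has to be supplied by whoever runs it.

```python
import os


def parse_max_open_files(limits_text):
    """Return the (soft, hard) "Max open files" values from /proc/<pid>/limits text.

    Values are returned as strings because a limit may read "unlimited".
    Returns None if the row is not present.
    """
    label = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(label):].split()[:2]
            return soft, hard
    return None


def describe_process(pid):
    """Return ((soft, hard), open_fd_count) for a running process (Linux only).

    Reading another user's /proc/<pid>/fd usually requires root.
    """
    with open(f"/proc/{pid}/limits") as f:
        limits = parse_max_open_files(f.read())
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return limits, open_fds
```

Calling describe_process with the PID of the ES Java process (looked up separately, for example with ps) shows whether the limits.conf values reached that process and how close the descriptor count is to the soft limit.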

Otis


kimchy · March 28, 2012, 10:30am

Can you make sure that the file limit configuration is actually applied?
You can use the nodes info API to see the max open file limit that the
process actually runs with.
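
A minimal Python sketch of reading that value from a nodes info response follows. It assumes the response is JSON shaped like nodes -> <node id> -> process -> max_file_descriptors; both the URL path and the exact placement of that field depend on the Elasticsearch version, so treat them as assumptions and check the documentation for the version in use. The helper names are hypothetical.

```python
import json
import urllib.request


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def max_open_files_by_node(nodes_info):
    """Map each node's name (or ID if unnamed) to its reported max_file_descriptors.

    Assumes the shape nodes -> <id> -> process -> max_file_descriptors.
    A node that does not report the field maps to None.
    """
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        process = node.get("process", {})
        result[node.get("name", node_id)] = process.get("max_file_descriptors")
    return result
```

For example, max_open_files_by_node(fetch_json(url)) with url pointing at the cluster's nodes info endpoint (something like http://localhost:9200/_nodes/process, which is an assumption here) gives one number per node. If a node still reports a low default instead of the value from limits.conf, the setting was not applied to the ES process.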
