High CPU Usage - indexing 100 docs per minute

Hi,

We have an ES node on an EC2 x-large instance and are facing high CPU usage
problems.

The server has 4 CPUs and 16 GB RAM, but only 4g is assigned to ES.

The data is being pushed into ES via the MongoDB River (ES 0.90.5, Mongo 2.4.6,
river version 1.7.1). The node has default settings and 41 shards (1 for the
river, plus 5 shards per index across 8 indices), but only two of the indices
have a high data load; the others are as low as 1K docs.

About 100 docs are indexed per minute, which may be new docs or updates to
existing ones. The indexing rate will increase in the future, so I am not
sure how it would even work. The system stops responding the whole time
data is being indexed; I can't even run a simple '{"match_all"}' query on
any index on that node.
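
For reference, '{"match_all"}' on its own is not valid JSON; the search API expects the query wrapped in a "query" object with an empty body for match_all. A minimal sketch in Python follows. The node address and index name used in the usage note are hypothetical placeholders, not values from this thread, and the timeout is only there so that an unresponsive node fails fast instead of hanging the client.

```python
import json
import urllib.request


def match_all_body(size=1):
    """Build a minimal search body: match every document, return `size` hits."""
    return {"query": {"match_all": {}}, "size": size}


def search_url(base_url, index):
    """Compose the _search endpoint for a single index."""
    return base_url.rstrip("/") + "/" + index + "/_search"


def run_match_all(base_url, index, timeout=5.0):
    """POST the match_all query and return the parsed JSON response.

    Raises an exception (URLError or a socket timeout) if the node does not
    answer within `timeout` seconds.
    """
    data = json.dumps(match_all_body()).encode("utf-8")
    req = urllib.request.Request(
        search_url(base_url, index),
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would look like run_match_all("http://localhost:9200", "myindex"), with both arguments replaced by your own node address and index name.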

The gist with the output of "http:///_nodes/hot_threads":

Please help

Thanks,
Arun

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Can someone please suggest a solution to this problem? We are trying to
push our product to production in a few weeks, and this issue would be a
blocker.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/High-CPU-Usage-indexing-100-docs-per-minute-tp4043147p4043153.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

Someone kindly help.

I was running a fuzzy query to find similar docs for each document. This was
causing the CPU to go as high as 99%.

Any ideas on how to implement it so that it requires less CPU?

Thanks,
Arun
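
One option that may be worth testing is the more_like_this query in place of a fuzzy query. This is a different technique from the one described above, and nothing in this thread confirms it is cheaper on this data; the idea is that it selects a capped number of representative terms from the input text and searches for those exact terms, rather than expanding every term into its fuzzy variants. A minimal sketch of building such a request body in Python, where the field names "title" and "body" in the usage note are hypothetical and the numeric defaults are illustrative, not tuned values:

```python
def more_like_this_body(text, fields, max_query_terms=12, min_term_freq=1, size=5):
    """Build a search body that looks for documents similar to `text`.

    `max_query_terms` caps how many terms are selected from `text`, which is
    the main knob for bounding the cost of each similarity lookup.
    """
    return {
        "query": {
            "more_like_this": {
                "fields": list(fields),      # fields to compare against
                "like_text": text,           # text of the source document
                "min_term_freq": min_term_freq,
                "max_query_terms": max_query_terms,
            }
        },
        "size": size,                        # only fetch the top few matches
    }
```

For example, more_like_this_body(doc_text, ["title", "body"]) returns a dict that can be serialized with json.dumps and posted to the index's _search endpoint. Running the lookup once per document at index time, and not on every update, would also reduce the load, if the application allows it.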

On Thursday, 24 October 2013 16:21:40 UTC-5, Arun Meena wrote:

Hi,

We have an ES node on an EC2 x-large instance and are facing high CPU usage
problems.

The server has 4 CPUs and 16 GB RAM, but only 1g is assigned to ES.
(The problem is that I am not able to increase MAX_MEM for ES; I tried
'export ES_HEAP_SIZE="2g"', but it didn't work.)
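
A note on the heap setting: an exported variable is only visible to processes started from that same shell, so a node started some other way (for example by a service script) may never see ES_HEAP_SIZE. One way to check what the running JVM actually received is to read the maximum heap from the nodes info API response. The sketch below assumes the parsed response has a "nodes" object whose entries carry jvm.mem.heap_max_in_bytes; that field path is an assumption and should be verified against the response from your own version.

```python
def heap_max_by_node(nodes_info):
    """Map node id -> maximum heap in bytes from a parsed nodes-info response.

    Nodes without the assumed jvm.mem.heap_max_in_bytes field are skipped.
    """
    result = {}
    for node_id, node in nodes_info.get("nodes", {}).items():
        heap = node.get("jvm", {}).get("mem", {}).get("heap_max_in_bytes")
        if heap is not None:
            result[node_id] = heap
    return result


def to_gib(num_bytes):
    """Convert a byte count to GiB for easier reading."""
    return num_bytes / (1024 ** 3)
```

If the reported value stays near 1 GiB after a restart with ES_HEAP_SIZE set to 2g, the variable is most likely not reaching the process that launches the node.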

The data is being pushed into ES via the MongoDB River (ES 0.90.5, Mongo 2.4.6,
river version 1.7.1). The node has default settings and 41 shards (1 for the
river, plus 5 shards per index across 8 indices), but only two of the indices
have a high data load; the others are as low as 1K docs.

About 100 docs are indexed per minute, which may be new docs or updates to
existing ones. The indexing rate will increase in the future, so I am not
sure how it would even work. The system stops responding the whole time
data is being indexed; I can't even run a simple '{"match_all"}' query on
any index on that node.

The gist with the output of "http:///_nodes/hot_threads":
1. hot-thread output 2. node status "max_file_descriptors" : 65536 Server has 4CPU, 16GB RAM but only 1g is assigned to ES so far (GitHub gist)

Please help

Thanks,
Arun
