High CPU Usage - indexing 100 docs per minute

arunmeena · October 24, 2013, 9:21pm

Hi,

We have an ES node on an EC2 x-large instance and are facing high CPU usage
problems.

The server has 4 CPUs and 16 GB RAM, but only 4g is assigned to ES.

The data is pushed into ES via the MongoDB River (ES 0.90.5, Mongo 2.4.6,
river version 1.7.1). The node has default settings and 41 shards (1 for
the river, plus 5 shards per index across 8 indices), but only two of the
indices carry a high data load; the others are as low as 1K docs.
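As a quick sanity check on those shard numbers, here is a minimal sketch in Python. It assumes every index, including the river index, keeps the default of one replica per primary; that is an assumption, not something stated in the post. Under it, the count matches the `_status` excerpt in the gist (82 shards total, 41 successful), which would be consistent with replicas having no second node to be allocated on.

```python
# Shard arithmetic for the single node described in the post.
# ASSUMPTION: one replica per primary (the default), which the post does not confirm.
RIVER_PRIMARIES = 1        # the _river index, per the post
PRIMARIES_PER_INDEX = 5    # "5 shards per index"
DATA_INDICES = 8           # "8 index in total"
REPLICAS_PER_PRIMARY = 1   # assumed default


def shard_counts(river, per_index, indices, replicas):
    """Return (primary shards, primaries plus replicas)."""
    primaries = river + per_index * indices
    total = primaries * (1 + replicas)
    return primaries, total


primaries, total = shard_counts(
    RIVER_PRIMARIES, PRIMARIES_PER_INDEX, DATA_INDICES, REPLICAS_PER_PRIMARY
)
print("primaries:", primaries)
print("primaries + replicas:", total)
```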

100 docs are indexed per minute, which may be new docs or updates to
existing ones; the indexing rate will increase in future, so I am not sure
how this would even work. The system stops responding the whole time data
is being indexed; I cannot even run a simple match_all query on any index
on that node.
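For reference, a simple match_all search of the kind mentioned above could be built like this. It is only a sketch: the node address `http://localhost:9200` and the index name `my_index` are hypothetical placeholders, and the request is constructed but not sent.

```python
import json
import urllib.request

# Hypothetical node address and index name; substitute your own.
ES_URL = "http://localhost:9200"
INDEX = "my_index"


def build_match_all_request(base_url, index, size=1):
    """Build (but do not send) a match_all search request."""
    body = {"query": {"match_all": {}}, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_match_all_request(ES_URL, INDEX)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=10)
```

Sending it with a short timeout is one way to tell a slow node from one that does not answer at all.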

The gist with the output of "http:///_nodes/hot_threads":

https://gist.github.com/arunmeena/7145098

_status

{
  "ok" : true,
  "_shards" : {
    "total" : 82,
    "successful" : 41,
    "failed" : 0
  },
  "indices" : {
    "_river" : {
      "index" : {

hot-thread-1


68.0% CPU Usage by Thread 'elasticsearch[Centurious][search][T#4]'
  2/10 snapshots sharing following 30 elements
    sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
    org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:176)
    org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:272)
    org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:138)
    org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:113)
    org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum$Frame.loadBlock(BlockTreeTermsReader.java:2375)
    org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2148)

hot-thread-2


55.0% CPU Usage by Thread 'elasticsearch[Centurious][search][T#11]'
  5/10 snapshots sharing following 24 elements
    org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader$SegmentTermsEnum.next(BlockTreeTermsReader.java:2148)
    org.apache.lucene.index.MultiTermsEnum.pushTop(MultiTermsEnum.java:292)
    org.apache.lucene.index.MultiTermsEnum.next(MultiTermsEnum.java:318)
    org.apache.lucene.index.FilteredTermsEnum.next(FilteredTermsEnum.java:233)
    org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:247)
    org.apache.lucene.sandbox.queries.FuzzyLikeThisQuery.addTerms(FuzzyLikeThisQuery.java:224)
    org.apache.lucene.sandbox.queries.FuzzyLikeThisQuery.rewrite(FuzzyLikeThisQuery.java:269)
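
To see at a glance which thread pool is burning CPU, the header lines of a hot-threads dump can be summarized with a few lines of Python. This is a sketch that assumes the header shape shown in the excerpts above (`NN.N% CPU Usage by Thread '...'`); the live output of other Elasticsearch versions may be formatted differently, and the `summarize` helper is hypothetical, not part of any Elasticsearch client.

```python
import re

# Header and stack lines in the shape shown in this thread's gist excerpts.
SAMPLE = """\
68.0% CPU Usage by Thread 'elasticsearch[Centurious][search][T#4]'
  2/10 snapshots sharing following 30 elements
    sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:610)
55.0% CPU Usage by Thread 'elasticsearch[Centurious][search][T#11]'
  5/10 snapshots sharing following 24 elements
    org.apache.lucene.search.FuzzyTermsEnum.next(FuzzyTermsEnum.java:247)
"""

# "<percent>% ... thread '<name>'" at the start of a line.
HEADER = re.compile(
    r"^\s*(?P<cpu>\d+(?:\.\d+)?)%.*?thread '(?P<name>[^']+)'", re.IGNORECASE
)
# Thread names end in "[<pool>][T#<n>]".
POOL = re.compile(r"\]\[(?P<pool>[^\]]+)\]\[T#\d+\]$")


def summarize(text):
    """Return (cpu_percent, pool, thread_name) tuples, busiest first."""
    rows = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if not m:
            continue  # skip snapshot counts and stack frames
        pool = POOL.search(m.group("name"))
        rows.append(
            (float(m.group("cpu")), pool.group("pool") if pool else "?", m.group("name"))
        )
    return sorted(rows, reverse=True)


rows = summarize(SAMPLE)
totals = {}
for cpu, pool, _name in rows:
    totals[pool] = totals.get(pool, 0.0) + cpu

for cpu, pool, name in rows:
    print(f"{cpu:5.1f}%  {pool:8s}  {name}")
print(totals)
```

In the two excerpts shown here, both hot threads belong to the search pool rather than to indexing, and the second trace passes through `FuzzyLikeThisQuery`.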

Please help

Thanks,
Arun

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

arunmeena · October 25, 2013, 1:35am

Can someone please suggest a solution to this problem? We are trying to
push our product to production in a few weeks, and this issue would be a
blocker.

--
View this message in context: http://elasticsearch-users.115913.n3.nabble.com/High-CPU-Usage-indexing-100-docs-per-minute-tp4043147p4043153.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.

arunmeena · October 25, 2013, 1:36am

Can someone please suggest a solution to this problem? We are trying to
push our product to production in a few weeks, and this issue would be a
blocker.

arunmeena · October 25, 2013, 12:53pm

Could someone kindly help?

arunmeena · October 25, 2013, 5:48pm

I was running a fuzzy query to find similar docs for each document. This
was causing the CPU to go as high as 99%.

Any ideas on how to implement this so that it requires less CPU?
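
The query itself is not shown in this thread, but the hot-threads excerpt above names `FuzzyLikeThisQuery`, so it was presumably a `fuzzy_like_this` query. As a hedged sketch only, a request body that bounds the fuzzy term expansion might be built like this. The field names and sample text are hypothetical; `max_query_terms` and `prefix_length` are documented `fuzzy_like_this` parameters, but the values here are untested guesses, and whether they lower CPU enough for this workload is not verified.

```python
import json


def build_flt_query(like_text, fields, max_query_terms=10, prefix_length=2, size=5):
    """Build a fuzzy_like_this search body with bounded term expansion."""
    return {
        "size": size,
        "query": {
            "fuzzy_like_this": {
                "fields": fields,        # hypothetical field names
                "like_text": like_text,
                # Cap on how many terms the rewritten query may contain.
                "max_query_terms": max_query_terms,
                # Candidate terms must share this many leading characters,
                # which should narrow the term-dictionary scan.
                "prefix_length": prefix_length,
            }
        },
    }


body = build_flt_query("sample document text", ["title", "description"])
print(json.dumps(body, indent=2))
```

Restricting `fields` to the one or two that matter, and running the similarity lookup less often than once per indexed document, are other options worth measuring.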

Thanks,
Arun

On Thursday, 24 October 2013 16:21:40 UTC-5, Arun Meena wrote:

Hi,

We have an ES node on an EC2 x-large instance and are facing high CPU usage
problems.

The server has 4 CPUs and 16 GB RAM, but only 1g is assigned to ES.
(The problem is that I am not able to increase MAX_MEM for ES; I tried
"export ES_HEAP_SIZE="2g"", but it didn't work.)

The data is pushed into ES via the MongoDB River (ES 0.90.5, Mongo 2.4.6,
river version 1.7.1). The node has default settings and 41 shards (1 for
the river, plus 5 shards per index across 8 indices), but only two of the
indices carry a high data load; the others are as low as 1K docs.

100 docs are indexed per minute, which may be new docs or updates to
existing ones; the indexing rate will increase in future, so I am not sure
how this would even work. The system stops responding the whole time data
is being indexed; I cannot even run a simple match_all query on any index
on that node.

The gist with the output of "http:///_nodes/hot_threads":
1. hot-thread output 2. node status "max_file_descriptors" : 65536 Server has 4CPU, 16GB RAM but only 1g is assigned to ES so far · GitHub

Please help

Thanks,
Arun
