ES slowness and calls taking a long time to return


(T Vinod Gupta) #1

Hi,
I have a single-node ES (version 0.18.7) setup. Unfortunately, I didn't
change the default config much, so it has 5 shards, and now I have quite a
bit of production data stored on it (12GB). What we are seeing is reduced
throughput over time, and search times sometimes as high as a few minutes. I'm
looking for some help on how to bring the situation under control, as we are
constantly indexing data and also serving realtime customer requests.

Questions:

  1. Is it possible to reduce the number of shards from 5 to 2 somehow? Does
    that work once the system is already in place?

  2. I read somewhere that it could be due to thread pool pressure, but node
    stats ( curl -XGET
    'http://localhost:9200/_cluster/nodes/stats?pretty=true' ) is not giving
    thread pool information. How do I go about identifying the root cause?

  3. My throughput is around 300-400 index calls per second. How do I make it
    higher?

  4. If I were to optimize so that my gets and search calls are faster, is
    that possible? It can be at the expense of slower index calls.

This is on a dual-core machine (an EC2 m1.large instance), and I gave ES 4GB
of RAM. Has any benchmarking been done on EC2 instances?

Let me know if any further info is needed.

Thanks


(Radu Gheorghe) #2

Hi,

On Saturday, July 7, 2012 3:05:03 AM UTC+3, T Vinod Gupta wrote:

Hi,
I have a single-node ES (version 0.18.7) setup. Unfortunately, I didn't
change the default config much, so it has 5 shards, and now I have quite a
bit of production data stored on it (12GB). What we are seeing is reduced
throughput over time, and search times sometimes as high as a few minutes. I'm
looking for some help on how to bring the situation under control, as we are
constantly indexing data and also serving realtime customer requests.

Questions:

  1. Is it possible to reduce the number of shards from 5 to 2 somehow? Does
    that work once the system is already in place?

  2. I read somewhere that it could be due to thread pool pressure, but node
    stats ( curl -XGET
    'http://localhost:9200/_cluster/nodes/stats?pretty=true' ) is not giving
    thread pool information. How do I go about identifying the root cause?

I would start by looking at BigDesk and in the logs.
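As a starting point, the node stats can be pulled with curl; a minimal sketch, assuming a single local node on the default port 9200 (the endpoint shown is the 0.18-era one; newer versions moved it to /_nodes/stats):

```shell
# Fetch per-node stats (JVM heap, cache sizes, indexing/search counters, etc.)
# from a local Elasticsearch 0.18.x node and pretty-print the JSON.
curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'
```

Watching how the JVM heap and cache numbers move over time, alongside BigDesk's charts, usually narrows down whether the pressure is memory, disk, or CPU.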

  3. My throughput is around 300-400 index calls per second. How do I make it
    higher?

It depends a lot on what your data looks like. But increasing the
refresh_interval should generally help.
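Raising the refresh interval can be done on a live index through the index settings API; a sketch, where `myindex` is a placeholder index name:

```shell
# Refresh the index less often (default is 1s) so Lucene spends less time
# reopening searchers while you are indexing heavily. "myindex" is a
# placeholder; substitute your own index name.
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '
{
  "index": { "refresh_interval": "60s" }
}'
```

The trade-off is that newly indexed documents take up to that long to become visible to searches.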

  4. If I were to optimize so that my gets and search calls are faster, is
    that possible? It can be at the expense of slower index calls.

What do your data and searches look like?

If your storage is slow, you might benefit from compressing your
_source. I would also try upgrading ES to a newer version. I find it faster,
although I don't have a clear benchmark to show that. Please note that
upgrading needs some care. Quote:
Upgrade Notes:

  • Upgrading from 0.18 requires issuing a full flush of all the indices
    in the cluster (curl host:9200/_flush) before shutting down the cluster,
    with no indexing operations happening after the flush.
  • The local gateway state structure has changed, a backup of the state
    files is created when upgrading, they can then be used to downgrade back to
    0.18. Don’t downgrade without using them.
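The pre-upgrade flush mentioned in the notes above is a single call; a sketch of the sequence, assuming a local node:

```shell
# Stop all indexing first. Then flush every index so the transaction logs
# are committed to Lucene segments before shutting the node down for the
# upgrade; no indexing may happen after this point.
curl -XPOST 'http://localhost:9200/_flush'
```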



(T Vinod Gupta) #3

Thanks Radu.
I increased the refresh interval to 60s, but that didn't help. I see a bunch
of error messages in the elasticsearch.log file that look like the one below.
Could that be the reason for slow search? Now I see slowness even when there
is not much indexing happening. These messages occur two or three times a
minute.

[2012-07-09 00:00:55,238][WARN ][index.merge.scheduler ] [] [facebook][3] failed to merge
java.io.IOException: Input/output error: NIOFSIndexInput(path="/media/ephemeral0/ES_data/elasticsearch/nodes/0/indices/facebook/3/index/_qicw.fdt")
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:180)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:155)
    at org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:110)
    at org.apache.lucene.store.DataOutput.copyBytes(DataOutput.java:123)
    at org.apache.lucene.index.FieldsWriter.addRawDocuments(FieldsWriter.java:216)
    at org.apache.lucene.index.SegmentMerger.copyFieldsWithDeletions(SegmentMerger.java:301)
    at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:248)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:108)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4295)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3940)
    at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:388)
    at org.apache.lucene.index.TrackingConcurrentMergeScheduler.doMerge(TrackingConcurrentMergeScheduler.java:88)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:456)
Caused by: java.io.IOException: Input/output error
    at sun.nio.ch.FileDispatcher.pread0(Native Method)
    at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:49)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:248)
    at sun.nio.ch.IOUtil.read(IOUtil.java:224)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:663)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:162)
    ... 12 more

Also, I was not able to install BigDesk, due to the error below.

sudo bin/plugin -install lukas-vlcek/bigdesk/1.0.0
-> Installing lukas-vlcek/bigdesk/1.0.0...
Trying https://github.com/downloads/lukas-vlcek/bigdesk/bigdesk-1.0.0.zip...
Trying https://github.com/lukas-vlcek/bigdesk/zipball/v1.0.0...
Failed to install lukas-vlcek/bigdesk/1.0.0, reason: failed to download

I would really appreciate any help I can get here.

Thanks



(Radu Gheorghe) #4

On Monday, July 9, 2012 5:19:06 AM UTC+3, T Vinod Gupta wrote:

Thanks Radu.
I increased the refresh interval to 60s, but that didn't help. I see a bunch
of error messages in the elasticsearch.log file that look like the one below.
Could that be the reason for slow search? Now I see slowness even when there
is not much indexing happening. These messages occur two or three times a
minute.

I don't know what that error means beyond what the text says (a read
error), and I don't know exactly how it would impact performance. It must
have some impact; I just don't know how significant it is.

[2012-07-09 00:00:55,238][WARN ][index.merge.scheduler ] [] [facebook][3] failed to merge
java.io.IOException: Input/output error: NIOFSIndexInput(path="/media/ephemeral0/ES_data/elasticsearch/nodes/0/indices/facebook/3/index/_qicw.fdt")
[...]

Also, I was not able to install BigDesk, due to the error below.

sudo bin/plugin -install lukas-vlcek/bigdesk/1.0.0
-> Installing lukas-vlcek/bigdesk/1.0.0...
Trying https://github.com/downloads/lukas-vlcek/bigdesk/bigdesk-1.0.0.zip...
Trying https://github.com/lukas-vlcek/bigdesk/zipball/v1.0.0...
Failed to install lukas-vlcek/bigdesk/1.0.0, reason: failed to download

Do you use a proxy or something?

Anyway, I think you can just download it from here:
https://github.com/lukas-vlcek/bigdesk/zipball/0.18.x

then extract it and open index.html from the lukas-vlcek... directory.
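The manual install described above can be done in a couple of commands; a sketch, using the zipball URL given above (the extracted directory name will vary with the commit hash):

```shell
# Download the 0.18.x-compatible BigDesk branch directly from GitHub,
# bypassing the failing plugin installer, then unpack it.
wget -O bigdesk.zip https://github.com/lukas-vlcek/bigdesk/zipball/0.18.x
unzip bigdesk.zip -d bigdesk
# Then open bigdesk/lukas-vlcek-bigdesk-*/index.html in a browser and
# point it at http://localhost:9200.
```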



(Jörg Prante) #5

Hi,

Errors like the one below originate outside of ES, but they should be taken
as a serious hint that the system cannot read or write via NIO because of
disk errors, file-system errors, running short on resources, etc. So I'd
watch for system-level messages (syslog, disk damage, disk full, and so on).
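A few quick checks along those lines (a sketch; the data volume path shown in the log was /media/ephemeral0, so that is the mount to inspect on this box):

```shell
# Look for kernel-level I/O errors that would explain the EIO Lucene hit.
# grep exits non-zero when nothing matches, so tolerate an empty result.
dmesg | grep -iE 'i/o error|sector' | tail -n 20 || true

# Check free space and inode usage; a full disk or full inode table on the
# data volume will also surface as I/O failures during segment merges.
df -h
df -i
```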

Best regards,

Jörg

Quote:
[...]
Caused by: java.io.IOException: Input/output error
    at sun.nio.ch.FileDispatcher.pread0(Native Method)
    at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:49)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:248)
    at sun.nio.ch.IOUtil.read(IOUtil.java:224)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:663)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:162)

