High disk utilization

Hello,
I have constant, close to 100% disk utilization on a server which is running
Elasticsearch.
What could be done to improve this?

Thank you

Hi,

How much RAM do you have? What -Xmx are you using? How big is your index
in terms of # of docs and on disk? Let the OS have some more memory; don't be
greedy with -Xmx.

Otis

Performance Monitoring for Elasticsearch - Sematext Monitoring | Infrastructure Monitoring Service

On Saturday, May 12, 2012 12:13:38 AM UTC-4, Eugene Strokin wrote:

> Hello,
> I have constant, close to 100%, disc utilization on server which is
> running Elasticsearch.
> What could be done to improve this?
>
> Thank you

Total RAM is 8 GB. Currently Xmx is 2 GB. I have another application
which takes 2 GB as well, but it uses almost no disk resources.
My index's total size is about 1 GB on disk, with around 1 million small
documents (3-7 fields; the largest field is a string of up to 256
characters) of 5 different types.
Usage of ES is queue extensive. I have around a hundred searches per second
(very fast searches, not complex), and 1-5 index operations per second.
The whole index is on one physical box.
I was expecting that the index could be almost totally cached in memory (I
guess I could give ES more Xmx if needed) and that it would write newly
indexed documents to disk periodically (I think the default period is 3
seconds, so it should write something to disk every 3 seconds).
But it looks like it is constantly doing something. Disk utilization is
close to 100% all the time.

Thank you

On Saturday, May 12, 2012 1:41:16 AM UTC-4, Otis Gospodnetic wrote:

> Hi,
>
> How much RAM do you have? What -Xmx are you using? How big is your index
> in terms of # of docs and on disk? Let OS have some more memory, don't be
> greedy with -Xmx.
>
> Otis


Hi Eugene,

On Saturday, May 12, 2012 1:01:42 PM UTC-4, Eugene Strokin wrote:

> Total RAM is 8 GB. Currently Xmx is 2 GB. I have another application
> which takes 2 GB as well, but it uses almost no disk resources.
>
> My index's total size is about 1 GB on disk, with around 1 million small
> documents (3-7 fields; the largest field is a string of up to 256
> characters) of 5 different types.

Sounds nice and small, good! :)
This index should all be cached by the OS.

> Usage of ES is queue extensive. I have around a hundred searches per second
> (very fast searches, not complex), and 1-5 index operations per second.

Queue extensive? What do you mean by that?
Neither number is huge, that's good, too.

> The whole index is on one physical box.

1 box, interesting.
Do you know how many shards your index has? Since only 1 box is in the
game you shouldn't shard at all. Default is 5.
And replicas are turned off, I imagine?
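
For reference, the number of primary shards and replicas is chosen when an index is created. Below is a minimal sketch, assuming a node listening on localhost:9200 and a hypothetical index name `myindex` (neither appears in this thread). The helper only builds the settings body; the curl call is shown as a comment.

```sh
#!/bin/sh
# index_settings SHARDS REPLICAS
# Print a JSON settings body for the Elasticsearch create-index API.
index_settings() {
    printf '{"settings":{"number_of_shards":%d,"number_of_replicas":%d}}\n' "$1" "$2"
}

# Usage sketch (host and index name are assumptions, not from this thread;
# recent Elasticsearch versions also require the Content-Type header):
#   curl -XPUT 'http://localhost:9200/myindex' \
#        -H 'Content-Type: application/json' \
#        -d "$(index_settings 1 0)"
```

With a single box, `index_settings 1 0` gives one shard and no replicas; whether that is the right trade-off depends on how much growth you expect.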

> I was expecting that the index could be almost totally cached in memory (I
> guess I could give ES more Xmx if needed) and that it would write newly
> indexed documents to disk periodically (I think the default period is 3
> seconds, so it should write something to disk every 3 seconds).
> But it looks like it is constantly doing something. Disk utilization is
> close to 100% all the time.

Well, can you turn off indexing only and see if that makes a difference?
And if not, then turn off searching for a bit and see if that gets rid of
the 100% disk utilization?

And can you show us how you've determined your disk is 100% utilized?

iostat -xdm 1 ?

Also, man pidstat.
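
As a rough illustration of what a utilization figure means: on Linux, the percentage is essentially the share of a sampling interval during which the device had I/O in flight. The sketch below computes that from the cumulative "milliseconds spent doing I/O" counter in /proc/diskstats (13th column). The function names and the device name `sda` are assumptions for illustration only; in practice `iostat -xdm 1` shows per-device figures and `pidstat -d 1` shows per-process I/O.

```sh
#!/bin/sh
# util_pct BEFORE_MS AFTER_MS INTERVAL_MS
# Percentage of the interval during which the device was busy with I/O.
util_pct() {
    echo $(( ($2 - $1) * 100 / $3 ))
}

# io_ticks DEVICE
# Read the cumulative "ms spent doing I/O" counter (13th column of
# /proc/diskstats) for one device. Linux only.
io_ticks() {
    awk -v dev="$1" '$3 == dev { print $13 }' /proc/diskstats
}

# Usage sketch ("sda" is an assumed device name):
#   before=$(io_ticks sda); sleep 1; after=$(io_ticks sda)
#   util_pct "$before" "$after" 1000
```

A device that was busy for 950 ms out of a 1000 ms interval comes out at 95. Note that this says the disk is busy, not which process is keeping it busy.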

Otis

Search Analytics - Cloud Monitoring Tools & Services | Sematext
Scalable Performance Monitoring - Sematext Monitoring | Infrastructure Monitoring Service


Otis, thank you for the reply.
I have been investigating the problem for the last several days, and
apparently it wasn't an ES problem at all.
I'll describe what happened, just in case someone else runs into the same
situation:
On the same box we store the files that those 1M documents link to,
so we have around 1M files on the same hard drive.
Linux has a process, updatedb, which scans all files periodically. That
process was saturating the hard drive, not ES.
I'm not Linux savvy, so it took me some time to figure out the right
command:
dstat -tc --top-io-adv --top-bio-adv --disk-util --disk-tps
Once I reconfigured the cron job which runs the process to exclude the
folder with those 1M files, everything started working like clockwork, nice
and fast.
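
For anyone in the same situation, here is one possible way to exclude a directory, sketched under the assumption of an mlocate-style updatedb that reads a `PRUNEPATHS="..."` line from /etc/updatedb.conf. This is not necessarily the change made above; the helper name and the directory `/data/files` are hypothetical.

```sh
#!/bin/sh
# add_prunepath CONF DIR
# Append DIR to the PRUNEPATHS="..." line of an updatedb.conf-style file,
# unless it is already listed. Edits CONF in place and keeps a .bak copy.
add_prunepath() {
    conf=$1
    dir=$2
    # Already listed (preceded and followed by a quote or a space)?
    if grep -q "^PRUNEPATHS=.*[\" ]${dir}[\" ]" "$conf"; then
        return 0
    fi
    # Insert DIR just before the closing quote of the PRUNEPATHS line.
    sed -i.bak "s|^\(PRUNEPATHS=\".*\)\"|\1 ${dir}\"|" "$conf"
}

# Usage sketch (run as root; the directory is an assumed example):
#   add_prunepath /etc/updatedb.conf /data/files
```

The helper is deliberately simple: it assumes DIR contains no `|` characters and that PRUNEPATHS sits on a single line.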

But I'd like to ask you a few more questions, if you don't mind.
You said:

> This index should all be cached by the OS.

You mean Linux would just keep all the files ES needs in memory automatically?
Without any special configuration?

Also, I still have shards, 5 of them, just because we expect the system to
grow at least 100 times. Once we start to see performance problems, we
will add more servers to the ES cluster.
That's also why we don't have replication yet. We didn't switch it off; we
just don't have a second server to replicate to.
You also said:

> Queue extensive? What do you mean by that?
> Neither number is huge, that's good, too.

I was thinking the usage is already extensive because, assuming that each
search takes 10 ms on average, 100 searches per second would already take up
the whole second.
I guess I calculated it wrong: the server has multiple cores, so it
can support more searches.
If you have any statistics on how many searches a single box can
handle, please share them. It would be very helpful to me, and probably to
others, for planning when and how to extend an ES cluster.

Thank you.
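
The back-of-the-envelope estimate above can be written out as follows. This is only a rough upper bound, assuming each search occupies one core for its whole duration and ignoring I/O waits, garbage collection, and concurrent indexing. The 4-core figure is an assumption; the thread only says "multiple cores".

```sh
#!/bin/sh
# max_qps CORES LATENCY_MS
# Upper bound on searches per second if each search keeps one core busy
# for LATENCY_MS milliseconds and searches run fully in parallel.
max_qps() {
    echo $(( $1 * 1000 / $2 ))
}

max_qps 1 10   # one core, 10 ms per search: prints 100
max_qps 4 10   # four cores (assumed), 10 ms per search: prints 400
```

So 100 searches per second at 10 ms each would saturate a single core, but not a multi-core box. Real capacity is best measured with a load test against your own queries.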

On Sunday, May 13, 2012 9:29:08 PM UTC-4, Otis Gospodnetic wrote:

[...]

Hi Eugene,

Re your question about Linux and caching - yes, no special config needed.
Just don't steal all its RAM.
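
To put numbers on that, here is a rough sketch using the figures from this thread (8 GB RAM, 2 GB Xmx, 2 GB for the other application, about 1 GB of index). It is only an approximation: a JVM uses somewhat more memory than its -Xmx value, and the OS itself needs some RAM. The function name is made up for illustration.

```sh
#!/bin/sh
# cache_headroom_mb TOTAL_MB HEAP_MB OTHER_MB
# RAM left for the OS, including its page cache, after the JVM heap and
# other applications have taken their share.
cache_headroom_mb() {
    echo $(( $1 - $2 - $3 ))
}

# Figures from this thread: 8 GB total, 2 GB Xmx, 2 GB other application.
cache_headroom_mb 8192 2048 2048   # prints 4096
```

Roughly 4 GB of headroom against a 1 GB index leaves plenty of room for the page cache to hold the index files. On a live box, `free -m` shows how much memory is currently used for buffers and cache.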

Otis


On Monday, May 14, 2012 12:32:12 AM UTC-4, Eugene Strokin wrote:

[...]