Dealing with OS page cache evictions?


(Andrew White) #1
One of the biggest performance problems we have with ES is page cache
evictions. Index warmers are great for getting the cache primed, but it
doesn't take much for the cache to be thrashed. Some common causes for us
are logging, backups, and HDFS tasks. Relying completely on the OS page
cache has always seemed odd to me, since it's almost completely out of the
application's control, very inconsistent across platforms, and offers very
few configuration options.

What are our options for ensuring that performance doesn't degrade because
of disk IO caused by other applications?

Some thoughts I've had...

1) Call posix_fadvise with POSIX_FADV_WILLNEED after warmers are called.
2) Shell out to vmtouch after warmers.
3) Configure ES/Lucene to use a heap-based cache (does this even exist?)

I don't know if any of these are possible and I certainly don't know how to
do them out of the box.

Any advice?

Thanks,
Andrew White

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/96bd85ce-6325-4a02-ae88-e7b9ae4ddf7c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

(Itamar Syn-Hershko) #2
Simple: never share an Elasticsearch data node with other applications
that have intensive IO, CPU, or memory consumption.

Also, Elasticsearch's recommended configuration calls for setting the heap
size to 50% of the machine's available memory and disabling swapping. So
for most common use cases, if you do your sizing right, you shouldn't be
hitting the disk that much (except for _source loading, highlighting, or
if you are using doc values).
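For reference, the ES 1.x-era settings this advice refers to look roughly like the following; the values are illustrative, not a sizing recommendation:

```yaml
# elasticsearch.yml -- illustrative values for an ES 1.x data node
bootstrap.mlockall: true    # mlock() the JVM heap so it can never be swapped out
                            # (requires a sufficient 'memlock' ulimit)

# Heap size is set via the environment, not elasticsearch.yml, e.g. in
# /etc/default/elasticsearch:
#   ES_HEAP_SIZE=16g        # ~50% of a 32 GB machine; the other half is
#                           # deliberately left to the OS page cache
```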

--

Itamar Syn-Hershko
http://code972.com | @synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member


(Andrew White) #3
Sure, dedicated hardware is a solution, but it's rather unreasonable in
certain use cases, especially when software solutions could exist. We
already have a cluster with plenty of RAM and CPU to spare, and the ES
shards + routing ensure each request is handled by very few nodes. The
setup of the cluster also allows for very fast distributed indexing and
scanning.

Trying to justify all-new hardware to fix erratically slow searches on
existing hardware that is already correctly sized is a hard sell,
especially when we're talking about indexing billions of documents with
heavy aggregations.

So assuming we have correctly sized hardware, how do we solve this? A
software cache at the Lucene/ES level would clearly solve it; does that
exist? Is OS page cache locking possible (like vmtouch does)?
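On the locking question: yes, in principle. vmtouch works by mmap()ing the files and calling mlock(2) on the mappings (`vmtouch -dl` locks and stays resident as a daemon). Python's stdlib doesn't expose mlock, but it can at least fault pages in and keep them warm. A sketch of that weaker half (Linux, Python 3.8+, not production code):

```python
import mmap
import os

def touch_into_cache(path):
    """Fault a file's pages into the OS page cache via mmap + madvise.

    This primes the cache (like vmtouch -t) but does NOT pin anything;
    true locking (vmtouch -l) needs mlock(2), which the Python stdlib
    does not expose -- that part would need ctypes or a C helper.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return
        with mmap.mmap(fd, size, prot=mmap.PROT_READ) as mm:
            mm.madvise(mmap.MADV_WILLNEED)      # hint: prefetch the whole file
            for off in range(0, size, mmap.PAGESIZE):
                mm[off]                         # touch each page so it is resident
    finally:
        os.close(fd)
```

Note that even mlock()ed pages count against RLIMIT_MEMLOCK, so pinning large Lucene segments means raising the `memlock` ulimit, the same prerequisite as `bootstrap.mlockall`.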


(jamesma) #4
Have you looked into the filter cache or the query cache in ES? You could
increase the filter cache from its 10% default and enable the query cache
to see if that helps.
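For concreteness, the knobs being suggested here (under their ES 1.4-era names) would be set along these lines; treat the percentages as placeholders, not tuning advice:

```yaml
# elasticsearch.yml (node-level) -- ES 1.4-era setting names, illustrative sizes
indices.cache.filter.size: 20%   # filter (bitset) cache, default 10% of heap
indices.cache.query.size: 2%     # shard query cache, default 1% of heap

# The shard query cache must also be enabled per index, e.g. as an index
# setting at creation time:
#   index.cache.query.enable: true
```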


(Andrew White) #5
According to the docs:

"For now, the query cache will only cache the results of search requests
where ?search_type=count, so it will not cache hits"

And the filter cache is just a bitset, not the expensive IO seeks involved
in source fetching. The issue we have is with fetching the docs themselves
from disk. From what I can see, ES simply does not cache docs.



(system) #6