Warmers and IO

Hello all,

I'm wondering if there are any pointers you could give me for search
queries that return large result sets. The number of hits could be anywhere
from 10,000 to 2,000,000. As indicated by the slowlog, the time-consuming
part is fetching the data. I presume this fetching phase also includes
sorting the data?

The initial query invocation may take upwards of 5 minutes. If I then run
the same query again, it returns in < 500 ms.

Would warmers be an appropriate solution here?

Thanks


Hi,

Yes - if you know you will be running a query X that is expensive, loads
some data that will be reused by other queries, or initializes some data
structures, then using such a query for warming up is a good idea.
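
For example, here is a minimal sketch of registering such a query as an
index warmer via the _warmer API (the index, warmer and sort field names
below are made up):

# hypothetical index, warmer and field names
curl -XPUT 'localhost:9200/my_index/_warmer/expensive_query_warmer' -d '{
  "query": { "match_all": {} },
  "sort": [ "created_at" ]
}'

Elasticsearch then runs this search whenever a new searcher is opened
(e.g. after a refresh or merge), before it is exposed to real requests, so
the relevant data structures are already loaded.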

Otis

ELASTICSEARCH Performance Monitoring - http://sematext.com/spm/index.html

Thanks Otis

Unfortunately, we do not know the queries beforehand. BTW, they are text
queries.

We had to act fast. It may not be pretty, but this is the current solution
we're using to "force" the data into the file system cache:

find /var/lib/elasticsearch/data -type f -exec cat {} \; > /dev/null

We've yet to see any queries taking > 500 ms - it seems to be working well.

Any thoughts on this approach?

That's just part of it. You also need to build field and filter caches
inside ES.

The queries themselves are not cached, but there are bound to be filters
that you use regularly which you can know in advance, e.g.

{ "term":  { "status": "active" }}
{ "range": { "date": { "from": "2013-01-01" }}}

Also, any fields that you use in:

  • sorting
  • scripts
  • facets

need to be loaded into the field cache, so include those in your warmers
as well.
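
Putting those together, a rough sketch of a warmer that exercises both the
filter cache and the field cache (the index name, warmer name, date value
and facet below are only illustrative):

# hypothetical index/warmer names; filters taken from the examples above
curl -XPUT 'localhost:9200/my_index/_warmer/filters_and_fields' -d '{
  "query": {
    "filtered": {
      "query":  { "match_all": {} },
      "filter": {
        "and": [
          { "term":  { "status": "active" } },
          { "range": { "date": { "from": "2013-01-01" } } }
        ]
      }
    }
  },
  "sort":   [ "date" ],
  "facets": { "statuses": { "terms": { "field": "status" } } }
}'

Running the filters populates the filter cache, and the sort and facet pull
the "date" and "status" fields into the field cache.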

clint

Hi Justin,

What Clinton said. Plus, this is a good old trick, but if your index is
larger than your RAM then it's only partially effective.

Otis
Solr & Elasticsearch Support

A recommended method is to set

index.store.type: mmapfs

Your "find loop" loads everything from the data folder, but only once
for the "cat" process, and even the files that may be not used by your
elasticsearch workload. In contrast, mmapfs reads and keeps just the
relevant files in memory and continue to let the OS VM manage the cache.
Together with bootstrap.mlockall: true, such a page cache will stay
perfectly in RAM until eviction (mostly the exit of elasticsearch
process, assuming enough RAM), while your "find loop" loaded files will
only be loaded in RAM once before they are evicted, so the find command
would have to be repeated over and over again during the lifetime of the
ES process. And this would boggle down your overall system performance.
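
For reference, a minimal elasticsearch.yml sketch with these two settings
(just a sketch; mlockall also needs a sufficiently high memlock ulimit to
actually lock the memory):

# elasticsearch.yml
index.store.type: mmapfs     # memory-map index files via mmap
bootstrap.mlockall: true     # try to lock the ES process memory so it is not swapped out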

If you want another "no need to think" solution, you could also use ZFS
with L2ARC (adaptive replacement cache; intro:
http://dtrace.org/blogs/brendan/2008/07/22/zfs-l2arc/), where you don't
have to tinker with resources like RAM, FS cache, disks, SSDs and so on -
ZFS manages them for you.

Best regards,

Jörg
