ES performance questions

Hi,

Some co-workers and I have been testing out ES recently. I wanted
to ask about some observations we've made.

Test server: 2x core i5, 4 GB RAM, 5400 RPM hard disk, Mac OS X Lion

We attempted to insert 2 million documents with a name, description,
and score. The index size was about 3 gigs.

Doing a search with a query on description for two common words
(each common word appears in about 10% of documents) takes about 6-7
seconds to return 500 results, ordered by score. It takes about 2
seconds to return the top 50 results.

Given the specs of the computer running the search, this doesn't seem
terrible. But, when running a search, we notice that ElasticSearch is
using just a few % CPU time, and less than 150 MB of RAM, even though
much more is available.

That behavior makes me think the query latency is mainly time to read
from the hard disk. But, I'm curious why ES isn't trying to use more
RAM to make the query faster.

I'm wondering if this all sounds normal, and whether there's anything
we can do to optimize this particular type of search. We changed the
index mapping to store the name, description, and score. In this case,
we don't care about the total number of matches found, if that makes a
difference.

Thanks,

Chris

Hey Chris,

What do your java command line options look like? Also, did you check to
see how much I/O it was doing at the time? Did you vary the search, or were
you searching for similar/the same terms?

Patrick

patrick eefy net

(In reply to Chris Scribner's message of Mon, Dec 12, 2011 at 8:55 PM.)

We tried passing -Xmx2g. We didn't check I/O stats on the process.

We varied the search terms each query. Searching for the same search
terms is very quick as expected (but not a particularly important use
case for us).

Chris

(In reply to Patrick's message of Dec 12, 9:34 pm.)

Did you let the caches warm up before doing timings? As with all perf
tests, you should always let both the JVM (HotSpot) and the ES caches
warm up first.

Another option is to add more boxes to distribute the queries. If disk
I/O is your bottleneck, sharing the load will speed things up. In
particular, if your data is ~3 GB but your memory is only 2 GB, not all
of your data may fit in memory. If every query requires disk access,
then you need more RAM, either on that one machine or by adding more
machines.
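
A minimal sketch of the warm-up advice above, assuming a hypothetical
`run_query` callable that issues one search. The first `warmup` runs are
executed but their timings are discarded, so JIT compilation and cold
caches do not skew the numbers. `fake_search` is only a stand-in so the
sketch runs on its own; it is not part of the original test setup.

```python
import statistics
import time


def time_queries(run_query, queries, warmup=20):
    """Run every query once; return timings for all but the first `warmup` runs."""
    timings = []
    for i, query in enumerate(queries):
        start = time.perf_counter()
        run_query(query)
        elapsed = time.perf_counter() - start
        if i >= warmup:  # warm-up runs are executed but not recorded
            timings.append(elapsed)
    return timings


def fake_search(query):
    # Hypothetical stand-in for a real search call (e.g. an HTTP request to ES).
    return sum(len(word) for word in query.split())


if __name__ == "__main__":
    # Vary the terms per query, as in the thread, so repeated-query caching
    # does not dominate the measurement.
    queries = ["word%d other%d" % (i, i) for i in range(100)]
    measured = time_queries(fake_search, queries, warmup=20)
    print("measured runs:", len(measured))
    print("median seconds:", statistics.median(measured))
```

Reporting the median rather than the mean keeps a single slow outlier
from distorting the result.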

Paul.

(In reply to Chris Scribner's message of Mon, Dec 12, 2011 at 6:53 PM.)

--

Paul Loy
paul@keteracel.com
http://uk.linkedin.com/in/paulloy

The index size was about 3 gigs.

Hmmh, that should not be too much for the server...

Some questions ;)

Did you verify that ES is really using those 2 gigs? Are you starting
ES via the elasticsearch script? If so, there is a variable for
this. How many shards do you have? Have you changed any other settings,
or are you using different index settings? Is it a simple term query?
Which Java version are you using, and did you use "-server"?

Peter.

We are starting elasticsearch via the shell script, and setting memory
variables (ES_MIN_MEM, ES_MAX_MEM).
Shards: Whatever the default is (1, I think)
Index: We haven't changed any other settings. We ran the tests again
today with the automatically created index, and saw similar
performance.
Query:
{
  "fields": ["goodness"],
  "sort": [{"goodness": {"order": "desc"}}],
  "query": {
    "text": {
      "description": {
        "query": "randomWord1 randomWord2",
        "operator": "and"
      }
    }
  },
  "size": 50
}

Java version: 1.6.0_29. We did not use "-server"
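
For anyone wanting to reproduce the test, here is a rough sketch of
building the query above as a request body in Python. The field names
(`goodness`, `description`) and the `text` query type are taken from the
post as-is; the helper name `build_search_body` is made up for
illustration, and sending the body to a server is left out.

```python
import json


def build_search_body(words, size=50):
    """Build the search body from the post as a JSON-serializable dict."""
    return {
        "fields": ["goodness"],
        "sort": [{"goodness": {"order": "desc"}}],
        "query": {
            "text": {
                "description": {
                    # All words must match because of the "and" operator.
                    "query": " ".join(words),
                    "operator": "and",
                }
            }
        },
        "size": size,
    }


if __name__ == "__main__":
    body = build_search_body(["randomWord1", "randomWord2"])
    print(json.dumps(body, indent=2))
```

Serializing with `json.dumps` guarantees quoted keys, which a strict
JSON parser requires.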

As far as we can tell, ES is not using all the memory allocated. On
the same machine we tested on yesterday, it gets up to a max of about
200-250 MB. Watching the disk I/O, it's only reading about 700 KB/s
from the HD while the queries are running. We tested on a different
machine running Ubuntu, and it got up to about 350 MB. We verified
that it was respecting the memory settings, because when it builds the
index it used up all the RAM allocated.

We built a test script that runs searches repeatedly (100 - 1000 at a
time). Running 100 searches in parallel (using the query above)
takes between 7 and 20 seconds, depending on how common the search
words are. (As expected, the searches run more slowly at first, before
the cache is primed. The numbers above are where it levels out
after a few runs.)
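
A test script along these lines could look roughly like the sketch
below. It is not the script used in the thread: `run_batch` and
`fake_search` are hypothetical names, and `fake_search` just sleeps
briefly in place of a real HTTP search request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(run_search, queries, workers=100):
    """Run all queries concurrently; return (results, total wall-clock seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input queries.
        results = list(pool.map(run_search, queries))
    return results, time.perf_counter() - start


def fake_search(query):
    # Hypothetical stand-in for a real search request.
    time.sleep(0.01)
    return {"query": query, "hits": []}


if __name__ == "__main__":
    queries = ["randomWord%d randomWord%d" % (i, i + 1) for i in range(100)]
    results, seconds = run_batch(fake_search, queries)
    print("ran %d searches in %.2f s" % (len(results), seconds))
```

Threads are enough here because each search spends its time waiting on
the server, not computing in the client.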

Regarding the amount of memory: elasticsearch (and Lucene) will use the
memory it needs, and no more, even if more is available, at least while
a search is executing. It's hard to tell with the machine you have
whether it's really slow or not. If you want to zip your data directory
and put it on Dropbox, I can run the query and check here.

(In reply to Chris Scribner's message of Tue, Dec 13, 2011 at 9:48 PM.)

Shay,

Thanks for the response. We've since done some testing on "production
grade" boxes and the performance is quite acceptable.

The guess we made from this data is that Lucene doesn't attempt to
cache the underlying documents in memory -- it just stores the index
in memory. Does that sound about right?

Thanks,

Chris

It loads part of the terms of the inverted index into memory. You can
control that part using the term index interval and divisor settings.
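
As a rough illustration of how those two knobs interact: Lucene writes
every interval-th term to the term index, and the divisor then
subsamples which of those entries are loaded into RAM. The sketch below
is only back-of-the-envelope arithmetic under that assumption; the
function name is made up, and the defaults used (interval 128,
divisor 1) are assumed to match the Lucene version in use.

```python
def terms_loaded_in_memory(total_terms, interval=128, divisor=1):
    """Approximate number of term index entries held in RAM."""
    # Every `interval`-th term is written to the on-disk term index
    # (ceiling division without importing math).
    indexed = -(-total_terms // interval)
    # Every `divisor`-th entry of that term index is loaded into memory.
    return -(-indexed // divisor)


if __name__ == "__main__":
    print(terms_loaded_in_memory(10_000_000))             # 78125
    print(terms_loaded_in_memory(10_000_000, divisor=4))  # 19532
```

A larger interval or divisor lowers memory use at the cost of more
scanning per term lookup.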

(In reply to Chris Scribner's message of Fri, Dec 16, 2011 at 7:01 PM.)