Elasticsearch - Memory and Query Performance

Hi,

I am evaluating Elasticsearch (ES) for a project and it comes close to the
requirements we are looking at. Since I am new to Elasticsearch, I have a
few questions.

For comparing ES and a few other data stores, I used YCSB
(https://github.com/brianfrankcooper/YCSB) to run some benchmarks of the
scenarios we foresee right now. Our application is relatively write-heavy,
and ES seems to meet our expectations as far as writes are concerned. The
test was performed on a single-node cluster only.

Questions:

  1. I inserted 1 million records in an index with 5 shards. Somehow, I
    didn't see any spike in memory usage. I am not sure if that is something I
    should expect.
  2. Apart from the JVM, what else does ES mainly need memory for? For
    example, I know that increasing the number of shards per index comes at
    the cost of more memory, and I tried this myself. I am asking because I
    want to understand under what circumstances memory will become a
    bottleneck.
  3. Does ES keep indices partially or fully in memory for purposes like
    speeding up search queries (other than when the index itself is an
    in-memory index)?
  4. Even though READS were extremely fast, SEARCHES were very slow. One
    query was taking more than 500 ms, which troubled me. I thought ES could
    easily handle 1 million records and search through them. Is this normal?
    Can I improve this on just one node, or will adding more nodes with
    replicas help?
  5. What should I use for benchmarking? I am not sure YCSB is the right
    tool to benchmark ES, and it does not offer much flexibility.
  6. How should I plan resources in terms of memory and CPU?

Thanks in advance!

  • Vaidik

--

Hi,

In ES, features like sorting and faceting require the most memory.
These features use the field data cache, which sits inside the JVM heap
space, and they aren't used during indexing. ES doesn't load the
index into the JVM heap space. It does rely on the OS filesystem cache
during searching, so in many cases large parts of the index are in
memory (this is why you shouldn't allocate all your available memory
to the ES JVM process).
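
To make the heap versus filesystem cache split concrete, here is a minimal sketch in Python. The 50% heap fraction and the cap of roughly 31 GB are commonly cited rules of thumb, not figures from this thread, and `split_memory` is a hypothetical helper name chosen for illustration.

```python
def split_memory(total_ram_gb, heap_fraction=0.5, heap_cap_gb=31.0):
    """Return (heap_gb, os_cache_gb) for a node with total_ram_gb of RAM.

    heap_fraction and heap_cap_gb are assumed rules of thumb; tune them
    for your own workload.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    # Give the JVM a fraction of RAM, but never more than the cap.
    heap_gb = min(total_ram_gb * heap_fraction, heap_cap_gb)
    # Whatever is left stays available to the OS filesystem cache.
    return heap_gb, total_ram_gb - heap_gb


for ram in (8, 16, 64, 128):
    heap, cache = split_memory(ram)
    print(f"{ram:>4} GB RAM: {heap:g} GB heap, {cache:g} GB left for the OS cache")
```

The point of the sketch is only that the remainder matters: the memory not given to the JVM is what lets the OS keep index files cached.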

Can you share your search request, or elaborate on what types of queries,
facets and features you're using in your search requests? In general,
adding more nodes (and optionally increasing the number of replicas) can
improve query performance. Unfortunately, I'm not familiar with YCSB.

The amount of memory you need depends on the type of search requests
and the index size (number of docs, number of terms, etc.), so it is
difficult to tell. Usually a single node holds multiple indices, each
index has multiple shards and each shard has one or more copies. Under
the hood, a search request is usually translated into several shard-level
search requests, so a node is highly concurrent and having a decent
number of CPUs / CPU cores per node makes sense. I wouldn't invest too
much in super powerful nodes (machines); just add more nodes when you
need to increase search request throughput or improve search performance.
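
The fan-out described above can be sketched with a little arithmetic. This is an illustrative Python model, not ES code: `IndexLayout`, `shard_requests_per_search` and `avg_copies_per_node` are hypothetical names, and the per-node figure assumes every shard copy is allocated and evenly spread across nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexLayout:
    name: str
    primary_shards: int
    replicas: int  # extra copies of each primary shard

    @property
    def total_shard_copies(self):
        # Each primary shard exists once, plus one copy per replica.
        return self.primary_shards * (1 + self.replicas)


def shard_requests_per_search(indices):
    """A search touches one copy of every shard of each targeted index."""
    return sum(ix.primary_shards for ix in indices)


def avg_copies_per_node(indices, node_count):
    """Average shard copies per node, assuming an even spread."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return sum(ix.total_shard_copies for ix in indices) / node_count


layout = IndexLayout(name="bench", primary_shards=5, replicas=1)
print("shard-level requests per search:", shard_requests_per_search([layout]))
for nodes in (2, 5):
    print(f"{nodes} nodes: {avg_copies_per_node([layout], nodes):g} shard copies per node")
```

With the 5-shard index from the original question, every search becomes five shard-level requests; adding nodes spreads the copies serving those requests over more CPUs.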

Martijn


--
Kind regards,

Martijn van Groningen

--

Hello,

On Tuesday, January 1, 2013 8:27:49 AM UTC-5, Vaidik Kapoor wrote:


  1. I inserted 1 million records in an index with 5 shards. Somehow, I
    didn't see any spike in memory usage. I am not sure if that is something I
    should expect.

Indexing is not memory heavy. Search can be, if it involves sorting and
faceting.

  2. Apart from the JVM, what else does ES mainly need memory for? For
    example, I know that increasing the number of shards per index comes at
    the cost of more memory, and I tried this myself. I am asking because I
    want to understand under what circumstances memory will become a
    bottleneck.

Each open index has some memory cost. The OS also wants to cache as much of
the index as possible (that happens outside the JVM and ES).

  3. Does ES keep indices partially or fully in memory for purposes like
    speeding up search queries (other than when the index itself is an
    in-memory index)?

Not explicitly. This is somewhat in your control: if you leave more RAM
to the OS, it will be able to cache more of the index.

  4. Even though READS were extremely fast, SEARCHES were very slow. One
    query was taking more than 500 ms, which troubled me. I thought ES could
    easily handle 1 million records and search through them. Is this normal?
    Can I improve this on just one node, or will adding more nodes with
    replicas help?

If the query was expensive, if it matched lots of docs, or if it hit a part
of the index that was not cached, it will take longer.
Run the same query a few times and you should see latency go down due to
caching.
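
One way to check the warm-up effect is to time the same request several times. Below is a minimal Python sketch using only the standard library; `search_once` and `time_repeated` are hypothetical helper names, and the URL and index name in the usage comment are assumptions for illustration.

```python
import json
import time
import urllib.request


def search_once(url, body):
    """POST a JSON search body to a _search endpoint and parse the reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def time_repeated(run_query, repeats=5):
    """Call run_query() `repeats` times; return each latency in milliseconds."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


# Usage against a running node (assumed URL and index name, not executed here):
# body = {"query": {"match_all": {}}}
# print(time_repeated(lambda: search_once("http://localhost:9200/bench/_search", body)))
```

If the first run is much slower than the later ones, the 500 ms figure is probably a cold-cache number rather than steady-state latency. Note that this measures client-side wall-clock time, including the HTTP round trip.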

  5. What should I use for benchmarking? I am not sure YCSB is the right
    tool to benchmark ES, and it does not offer much flexibility.

Apache JMeter will do.

  6. How should I plan resources in terms of memory and CPU?

Unfortunately, that's a very open question that nobody can answer precisely.

Otis

Solr & Elasticsearch Support
http://sematext.com/


--

Thanks for your reply. :) It was insightful. I have spent some time
watching the videos listed on Elasticsearch's website. The one by Shay and
another from Berlin Buzzwords, probably by a colleague of yours, were
particularly helpful.

I have been able to improve query times with whatever basic knowledge
of ES I have. Having done that, and knowing that organizations
use ES successfully, I am confident that with time I will be able to
improve query performance further.

Thanks for all the help! :)

On Thursday, January 3, 2013 10:18:20 AM UTC+5:30, Otis Gospodnetic wrote:


--