Suddenly slow on EC2

Yesterday, I started loading about 14M records into ElasticSearch
running on 3 small EC2 instances.

Yesterday, I had two machines with 10 threads each loading records, and I
was getting a throughput of about 2.5M records per day. I only queued
up 1M records, so when I came in this morning, the load was done.

I queued up another 500k records this morning. When I checked this
afternoon, the throughput had dropped to 250k per day. Based on my
timings, it was previously taking 250ms to 350ms for ElasticSearch to
take in each record. Now it is taking 3500ms.
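A rough sanity check on these numbers, assuming each of the 20 loader threads sends one record, waits for the response, and only then sends the next (the message doesn't say this explicitly): the best-case rate is threads divided by per-record latency. At 300ms that ceiling is about 5.76M records per day, and at 3500ms about 494k. The observed 2.5M and 250k are each roughly half of those ceilings, so the 10x drop in throughput looks consistent with the roughly 10x jump in latency. The class and method names are hypothetical.

```java
/** Back-of-the-envelope throughput ceiling for a closed-loop loader (hypothetical helper). */
public final class LoaderThroughput {
    private static final double SECONDS_PER_DAY = 24 * 60 * 60; // 86,400

    private LoaderThroughput() {}

    /**
     * Upper bound on records per day when each thread sends one record,
     * waits for the reply, then sends the next: throughput = concurrency / latency.
     */
    public static double maxRecordsPerDay(int threads, double latencyMillis) {
        if (threads <= 0 || latencyMillis <= 0) {
            throw new IllegalArgumentException("threads and latency must be positive");
        }
        return threads * SECONDS_PER_DAY / (latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        int threads = 2 * 10; // two loader machines, 10 threads each
        System.out.printf("300 ms/record:  %,.0f records/day%n", maxRecordsPerDay(threads, 300));
        System.out.printf("3500 ms/record: %,.0f records/day%n", maxRecordsPerDay(threads, 3500));
    }
}
```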

I'm not sure what is going on.

So I have a few questions ...

  1. Besides the REST API docs, is there any other documentation about
    how ES works behind the scenes and how shards, nodes, and replication
    are set up?
  2. How would you recommend that I debug this issue?
  3. What could accidentally make my index go away? Since I already have
    1.1M records indexed, I don't want to do something wrong and make all
    that work disappear.

I would highly recommend that you do not use small instances. You've
probably got yourself a 'noisy neighbour' (another tenant on the same
physical host competing for its CPU, disk or network). You should use the
larger instance types to avoid this.

On Thu, Jul 15, 2010 at 10:55 PM, David Jensen djensen47@gmail.com wrote:

--

Paul Loy
paul@keteracel.com
http://www.keteracel.com/paul

Hi,

There are many reasons why this might happen; one of them is what Paul
suggested. Let me ask a few more questions:

  1. Which version are you using?
  2. How many indices do you create?
  3. Do you use the cloud gateway?

If you know your way around the JVM, then monitoring it with VisualVM,
for example watching memory usage or GC activity, might be a good start.
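A minimal sketch of the kind of data meant here, using the standard java.lang.management MXBeans: it samples heap usage and cumulative GC time for the JVM it runs in. It only watches its own process; to watch an elasticsearch node you would attach VisualVM as suggested, or expose the node's MXBeans over remote JMX (not shown, and an assumption about your setup). The JvmSnapshot name and the 5-second interval are arbitrary choices for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

/** Prints heap usage and cumulative GC time for the current JVM (hypothetical helper). */
public final class JvmSnapshot {
    private JvmSnapshot() {}

    /** Fraction of the maximum heap currently in use, or -1 if the maximum is undefined. */
    public static double heapUsedFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax(); // -1 when the JVM reports no defined maximum
        return max > 0 ? (double) heap.getUsed() / max : -1;
    }

    /** Total milliseconds spent in GC across all collectors since JVM start. */
    public static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector doesn't report time
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Heap use and GC time that keep climbing across samples would hint at a leak.
        for (int i = 0; i < 3; i++) {
            double used = heapUsedFraction();
            String heap = used < 0 ? "n/a" : String.format("%.1f%%", 100 * used);
            System.out.println("heap used: " + heap + ", total GC time: " + totalGcMillis() + " ms");
            Thread.sleep(5000);
        }
    }
}
```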

-shay.banon

On Fri, Jul 16, 2010 at 1:12 AM, Paul Loy keteracel@gmail.com wrote:

Shay,

Here are the answers to your questions:

  1. 0.8.0
  2. I have one index and one type
  3. No cloud gateway ... I'll read the docs on that right now

Thanks,
David

On Jul 15, 3:37 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Some notes on the cloud gateway: it basically provides long-term persistence
using S3. A word of caution: it's a bit shaky in 0.8, and I am working on fixing
it for 0.9 (actually, that's the last issue remaining for 0.9).

The main reason this kind of slowdown might happen (putting aside Amazon quirks)
is some sort of leak in elasticsearch (usually memory). 0.9 is much, much
better than 0.8 here, though you should not see it with such a small-scale
test, so that leads me back to Amazon...

A few more questions:

  1. How many nodes are you running?
  2. How do you index the data? I assume HTTP; do you make sure you use
    keep-alive with it?

-shay.banon

On Fri, Jul 16, 2010 at 1:47 AM, David Jensen djensen47@gmail.com wrote:

  1. I'm running 3 nodes.
  2. I'm indexing the data with the REST API over HTTP, and I'm not using
    keep-alive. I'm using the Java Jersey library, so I'll see if there is a
    keep-alive setting.
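For reference, a minimal sketch of keep-alive from Java. It swaps in the JDK's java.net.http.HttpClient (Java 11+, much newer than this thread) rather than Jersey, whose settings aren't covered here. The idea is to build one client and share it across all loader threads, so its pooled HTTP/1.1 connections are reused instead of a new TCP connection being opened per record. The http://localhost:9200 address, the /{index}/{type}/{id} path (modeled on the twitter/tweet/1 values in the Java API example in this thread), and the KeepAliveIndexer name are all assumptions.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** One shared HttpClient for all loader threads, so connections are pooled and reused. */
public final class KeepAliveIndexer {
    // HttpClient is thread-safe: build it once and share it rather than creating one per request.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    private KeepAliveIndexer() {}

    /** Builds an assumed /{index}/{type}/{id} document URI. */
    public static URI documentUri(String baseUrl, String index, String type, String id) {
        return URI.create(baseUrl + "/" + index + "/" + type + "/" + id);
    }

    /** PUTs one JSON document and returns the HTTP status code. */
    public static int indexDocument(URI uri, String json) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
        // ofString() consumes the whole response body, which lets the client reuse the connection.
        HttpResponse<String> response = CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        URI uri = documentUri("http://localhost:9200", "twitter", "tweet", "1");
        System.out.println(indexDocument(uri, "{\"user\":\"kimchy\"}"));
    }
}
```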

On Jul 15, 3:53 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

If you are using Java, why not use the Java API directly? You get static
typing, discovery, and better performance than pure HTTP.

On Fri, Jul 16, 2010 at 2:01 AM, David Jensen djensen47@gmail.com wrote:

I felt the documentation for the Java API wasn't as clear as the REST
API documentation, which is very good.

Next, I couldn't find a Javadoc to answer my question below... (I know I
could check out the source and generate my own, but I'm lazy.)

Finally, the code examples keep making mysterious method invocations:

```java
import static org.elasticsearch.client.Requests.*;
import static org.elasticsearch.util.xcontent.XContentBuilder.*;

import java.util.Date;

// `client` is an existing Client instance; it is not created in this excerpt.
IndexResponse response = client.index(indexRequest("twitter")
        .type("tweet")
        .id("1")
        .source(jsonBuilder()
                .startObject()
                    .field("user", "kimchy")
                    .field("postDate", new Date())
                    .field("message", "trying out Elastic Search")
                .endObject()
        )).actionGet();
```

Where does the "jsonBuilder()" method come from? Am I missing
something?

There's also the client variable, which isn't defined in this
example, but I imagine it is defined on the Client doc page.

On Jul 15, 4:05 pm, Shay Banon shay.ba...@elasticsearch.com wrote:

Never mind (insert foot into mouth): I found the docs I needed to
answer my question. For the next version of my prototype, I'll try the
Java API. Thanks.

Still, an online Javadoc would be nice for the lazy. :wink:

On Jul 15, 4:39 pm, David Jensen djense...@gmail.com wrote:
