Yesterday, I started loading about 14M records into Elasticsearch running on 3 small EC2 instances.
Yesterday I had two machines, each running 10 threads, loading records, and I was getting a throughput of about 2.5M records per day. I had only queued up 1M records, so when I came in this morning it was done.
I queued up another 500k records this morning; when I checked this afternoon, the throughput had dropped to 250k per day. Based on my timings, it was previously taking 250ms to 350ms for Elasticsearch to ingest a record. Now it is taking 3500ms.
I'm not sure what is going on, so I have a few questions:
Besides the REST API docs, is there any other documentation about how ES works behind the scenes and how shards, nodes, and replication are set up?
How would you recommend that I debug this issue?
How could I accidentally make my index go away? Since I already have 1.1M records indexed, I don't want to do something wrong that makes all that work disappear.
I would highly recommend that you do not use small instances. You've
probably got yourself a 'noisy neighbour'. You should use the larger
instance types to avoid this.
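One quick way to check whether that's what is happening is to look at CPU steal time on the instance: the %st column in top, or the st column in vmstat. If you want to script it, the steal counter is also the 8th cpu field in /proc/stat. Here is a minimal sketch (Linux only, and the counters are cumulative since boot, so treat the percentage as a rough indicator):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class StealCheck {
    public static void main(String[] args) throws IOException {
        // First line of /proc/stat looks like:
        // cpu  user nice system idle iowait irq softirq steal ...
        String[] f = Files.readAllLines(Paths.get("/proc/stat")).get(0).trim().split("\\s+");
        long total = 0;
        for (int i = 1; i < f.length; i++) {
            total += Long.parseLong(f[i]);
        }
        long steal = f.length > 8 ? Long.parseLong(f[8]) : 0;
        System.out.printf("steal: %d ticks (%.1f%% of all CPU time since boot)%n",
                steal, 100.0 * steal / total);
    }
}

A steal percentage that stays high means other tenants on the same physical host are taking your CPU, which is exactly the small-instance problem.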
Some notes on the cloud gateway: it basically provides long-term persistence using S3. A word of caution, though: it's a bit shaky in 0.8. I am working on fixing it for 0.9 (actually, that's the last issue remaining for 0.9).
The main reason this slowdown might happen (putting Amazon quirks aside) is some sort of leak in Elasticsearch (usually memory). 0.9 is much, much better than 0.8 in this respect, though you should not be seeing it in such a small-scale test, so that leads me back to Amazon...
A few more questions:
How many nodes are you running?
How do you index the data? I assume HTTP; do you make sure you use keep-alive with it?
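If you are not sure how many nodes have actually joined the cluster, the cluster health endpoint will tell you. A quick sketch (the host name is just a placeholder, and I'm assuming the health endpoint is available in the version you are running and reachable on port 9200):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterHealth {
    public static void main(String[] args) throws Exception {
        // "ec2-node-1" is a placeholder; point this at any node in the cluster.
        URL url = new URL("http://ec2-node-1:9200/_cluster/health");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // status, number_of_nodes, shard counts, ...
            }
        }
        conn.disconnect();
    }
}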
I'm indexing the data with the REST API over HTTP; I'm not using keep-alive. I'm using the Java Jersey library, so I'll see if there is a keep-alive setting.
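For reference, here is roughly what I'd fall back to if Jersey gets in the way: plain HttpURLConnection, which, as far as I know, keeps connections to the same host alive by default (the http.keepAlive system property defaults to true) as long as the response body is fully read and no Connection: close header is sent. The host and index names below are just placeholders:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class KeepAliveIndexer {
    public static void main(String[] args) throws Exception {
        byte[] doc = "{\"field\":\"value\"}".getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < 3; i++) {
            // PUT each document to /index/type/id on one of the nodes.
            URL url = new URL("http://ec2-node-1:9200/records/record/" + i);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(doc);
            }
            // Drain the response so the connection can go back into the keep-alive pool.
            try (InputStream in = conn.getInputStream()) {
                while (in.read() != -1) { /* discard */ }
            }
        }
    }
}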
Never mind (insert foot into mouth), I found the docs I needed to answer my question. For the next version of my prototype, I'll try the Java API. Thanks.
Still, an online Javadoc would be nice for the lazy.
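In case it helps anyone else reading this, the indexing call I have in mind looks roughly like the sketch below. I'm writing it against the TransportClient-style API from later releases, so the exact class and method names in 0.9 may well differ; treat it as an assumption rather than a recipe:

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class JavaApiIndexer {
    public static void main(String[] args) {
        // "ec2-node-1" and the index/type names are placeholders.
        Client client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("ec2-node-1", 9300));
        client.prepareIndex("records", "record", "1")
              .setSource("{\"field\":\"value\"}")
              .execute()
              .actionGet();
        client.close();
    }
}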