Performance limitation with the ELK stack

Hi Guys,

We use Elasticsearch as the tracking system for our products, to track their
performance dynamically.
This data is searched by a small group of users (10-12) in the company to
measure performance.

In our current environment we hit a limit of 15,000 documents inserted per
second, and we are unable to scale beyond it.

Some information on the current setup and the flow:

- Tracking servers:
8 x Apache servers behind an Amazon ELB, serving empty HTML files so that the
request parameters are captured and written to the Apache access log.
On each server we also have Logstash, configured to read this access log file
and send the data to the Elasticsearch cluster.

- Elasticsearch cluster:
4 x r3.2xlarge (61.0 GB RAM, 8 cores) - one Elasticsearch process each, with a
30 GB heap
1 x r3.4xlarge (122.0 GB RAM, 16 cores) - two Elasticsearch processes, each
with a 30 GB heap

Additional information on the cluster:

Cluster health:

Logstash configuration: https://gist.github.com/hagait/23f4b2bc614a4c4acbb6

Elasticsearch configuration: https://gist.github.com/hagait/ba3684048abe2f9219b8

Thank you for the support!
Regards,
Hagai

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/9c5352ed-010d-4080-9794-b8dc0c2a5370%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

What is the question?...

15k inserts per sec per node is actually quite nice.

Is your index sharded? If you write to one index only, you write to a maximum
of x nodes, where x is the number of shards of that index. Since shards of the
same index can co-exist on one node, check whether your writes are actually
spread across all the nodes.
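To make that fan-out concrete, here is a minimal sketch of how a document is routed to a shard (simplified: real Elasticsearch hashes the `_routing` value, the document id by default, with murmur3; md5 is only a stand-in here):

```python
import hashlib
from collections import Counter

def shard_for(routing: str, number_of_shards: int) -> int:
    """Simplified shard selection: shard = hash(routing) % number_of_shards.
    Elasticsearch uses murmur3 on the _routing value; md5 is a stand-in."""
    h = int(hashlib.md5(routing.encode("utf-8")).hexdigest(), 16)
    return h % number_of_shards

# An index with 15 primary shards spreads writes over at most 15 shards,
# however many nodes the cluster has.
counts = Counter(shard_for(f"doc-{i}", 15) for i in range(10_000))
print(sorted(counts))  # shard ids 0..14, each holding roughly 1/15 of the docs
```

So with 15 shards on 5-6 ES processes, several shards of the same index necessarily share a node, and write throughput is bounded by how evenly the bulk requests land.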

Use local disks - never EBS - and if you really care about write speed, use
SSDs.

Other than that, Mike did an excellent write-up on the subject: "Performance
Considerations for Elasticsearch Indexing" on the Elastic blog
(https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing).

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member


Hi Itamar, thank you for the reply.

This is 15k inserts in total, not per host in the cluster.
Yes, we have 15 shards within one index; the shards are spread equally across
the nodes (automatically, by the Elasticsearch cluster).
We currently use general-purpose (EBS) SSD volumes, not ephemeral storage.

In addition, I see a lot of thread pool bulk rejections on the
Elasticsearch side.
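For what it's worth, those rejections can be watched per node with the `_cat` thread pool API (assuming Elasticsearch 1.x and a node reachable on localhost:9200):

```shell
# Show active bulk threads, queue depth and cumulative rejections per node.
curl -s 'localhost:9200/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected'
```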


I'd recommend you use ephemeral SSDs - a replication factor of 2+ plus proper
use of the snapshot/restore API will give you HA and DR guarantees.

The rejections you are seeing are due to slow I/O, because the disk is not
local. There is a way to configure a bigger queue, but I'd advise against
that; go with a fast local disk instead.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member


Hi,

We were able to identify the bottleneck, and it seems to be the Logstash
service.
The Elasticsearch cluster itself can handle 40,000 documents per second on a
3-node ES cluster, using a Java client we wrote ourselves with the SDK and
bulk inserts.
That client (written for load testing) generates JSON and sends it to
Elasticsearch for further processing.

We ran the same test with Logstash, which reads the JSON from the Apache
access log on a general-purpose SSD, and managed a maximum of 4,000 requests
per second.
With 2 Logstash servers we achieved 8,000 req per second.

Getting rid of the filter section in the Logstash configuration file helped us
reach this number; with filtering we achieved only 1,500-2,000 req per sec.
I also tried moving the log file to ephemeral storage, but didn't get any
improvement.
We don't have any resource problem on the Logstash server (I/O or CPU), so it
looks like a limit in the file input plugin or in Logstash itself.

I tested Logstash performance by creating a huge (2 GB) log file and starting
Logstash to ship its content.
I also tried with smaller files (4-5 MB each), but performance didn't get any
better.

Does it sound reasonable to you guys that I hit a limit of 4,000 req per
second with one Logstash?
If you have any suggestions on how to proceed from here, I'd be more than
happy to hear them.

If we can't get more out of one Logstash, we'll have to develop our own Java
service to do the shipping instead.

Example Apache access log line (already in JSON format):
{ "timestamp":"2015-02-09T10:07:48+0000",
"bq_timestamp":"2015-02-09T10:07:48", "client_ip":"52.2.11.111",
"client_port":"80", "latency_ms":"57", "latency_sec":"0",
"elb_status_code":"200",
"request":"/il.html?e=fpAdOpportunity&w=wfl_dose&vid=1&vname=compName_PM&ecpm=8&adid=1814157&media_file_type=MEDIA_FILE_TYPE&media_file_url=MEDIA_FILE_URL&current_url=%0Ahttp%3A%2F%2Fu-sd.gga.tv%2Fa%2Fh%2FJvf82UX3%2Beff48Z%2fwU20swbapQoWau_%3Fcb%3D5605126933660359000%26pet%3Dpreroll%26pageUrl%3Dhttp%253A%252F%252F3ffese.com%26eov%3Deov%0A%09&current_main_vast_url=MAIN_VAST_URL&error_code=ERROR_CODE&error_message=ERROR_MESSAGE&q9=dsdase.com&apid=dose.com&d=Convert&device=6719&csize=300X250&token=14123669&cb=260174713417&pc=PLAYCOUNT",
"request_path":"/il.html", "referer":"-", "user_agent":"Mozilla/5.0
(redhat-x86_64-linux-gnu) Siege/3.0.8" }
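Since each access-log line is already valid JSON, a home-grown shipper would mostly be `json.loads` plus a bulk sender. A minimal parsing sketch (field names taken from the example above; the long "request" value is elided):

```python
import json

# One access-log line, shortened from the example above.
line = ('{ "timestamp":"2015-02-09T10:07:48+0000", '
        '"bq_timestamp":"2015-02-09T10:07:48", "client_ip":"52.2.11.111", '
        '"client_port":"80", "latency_ms":"57", "latency_sec":"0", '
        '"elb_status_code":"200", "request_path":"/il.html", '
        '"referer":"-" }')

event = json.loads(line)
# Apache writes every value as a string; cast the numeric ones before
# indexing so Elasticsearch maps them as numbers rather than text.
for field in ("client_port", "latency_ms", "latency_sec", "elb_status_code"):
    event[field] = int(event[field])

print(event["request_path"], event["latency_ms"])  # -> /il.html 57
```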

Logstash configuration file (for the testing I ran it as root, without any
limitations):

input {
  file {
    path => "/var/log/httpd/aaa.ddddd.com.logstash-acc.log.[0-9]*"
    codec => json
    type => "tracking"
    discover_interval => 1
    sincedb_path => "/opt/logstash/httpd-sincedb"
    sincedb_write_interval => 1
  }
}

output {
  elasticsearch {
    workers => 1
    host => "aaa.dddd.com"
    index => "%{request_path}-logstash-%{+YYYY-MM-dd}"
    flush_size => 1000
    cluster => "video"
    codec => json
  }
}

Your help is appreciated!
Thanks!


Logstash is CPU-bound, so SSDs won't help - it's a JRuby implementation. See
if you can run multiple Logstash shippers on the same logs. Putting a Redis /
Kafka server in as a middle tier is also common practice. If that is not
feasible, then yes - my advice would be to roll your own.
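The shipper / middle-tier / indexer split could look roughly like this as two Logstash configurations (a sketch: the host names and the "tracking" list key are hypothetical; the `redis` input and output plugins ship with Logstash):

```
# Shipper, on each Apache host: tail the log, push raw events to Redis.
input  { file  { path => "/var/log/httpd/access.log" codec => json } }
output { redis { host => "redis.internal" data_type => "list" key => "tracking" } }

# Indexer(s), scaled horizontally: pop from Redis, bulk into Elasticsearch.
input  { redis { host => "redis.internal" data_type => "list" key => "tracking" codec => json } }
output { elasticsearch { host => "aaa.dddd.com" cluster => "video" } }
```

The queue decouples log shipping from indexing, so you can add indexer processes until the Elasticsearch cluster, not Logstash, is the bottleneck again.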

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member


Hi Itamar,

In my case it doesn't seem to be a CPU bottleneck at all.
The instance is actually doing nothing, and that is what I don't understand.
I tried tweaking some of the configurations, like workers, but going above 4
doesn't seem to improve anything.

Maybe I am missing something?
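For reference, the tweak described above, applied to the elasticsearch output from the earlier configuration (a sketch; `workers` and `flush_size` are the options already present there, the values are illustrative):

```
output {
  elasticsearch {
    workers => 4          # parallel bulk senders instead of the single default
    host => "aaa.dddd.com"
    index => "%{request_path}-logstash-%{+YYYY-MM-dd}"
    flush_size => 5000    # larger bulks amortize per-request overhead
    cluster => "video"
    codec => json
  }
}
```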

On Monday, February 9, 2015 at 12:22:28 PM UTC+2, Itamar Syn-Hershko wrote:

Logstash is CPU bound, SSD won't help. It's a JRuby implementation. Try to
see if you can have multiple logstash shippers on the same logs. Having a
redis / kafka server as a middle tier is also a general practice. If that
is not feasible then yes - my advise to you would be to roll your own.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Mon, Feb 9, 2015 at 12:15 PM, Hagai T <haga...@gmail.com <javascript:>>
wrote:

Hi,

We were able to identify the bottleneck which seems to be the Logstash
service.
It seems that the Elasticsearch cluster is able to handle 40,000 per
second documents with a 3 ES servers cluster using Java client that was
written by us using SDK with bulk inserts.
The client (written for load testing) is generating JSON format and send
it to Elasticsearch for further processing.

We run the same test with Logstash which reads the JSON format from
Apache access log on a general purpose SSD and managed to achieve maximum
of 4,000 requests per second.
With 2 Logstash servers we achieved 8,000 req per second.

Getting rid of the filtering section in logstash configuration file
helped us get to this number. with filtering we achieved only 1,5000-2,000
req per sec.
I also tried to move the log file to Ephemeral storage but didn't get any
improvement.
We don't have any resources problem in the Logstash server (I/O / CPU) so
it seems like a limit in the file input module or either logstash itself.

I was able to test Logstash performance by creating huge log file 2GB and
starting Logstash to send it's content.
I also did tried with smaller files (4-5MB each) but performance didn't
get any better.

Does it sound reasonable for you guys that I got to a limit of 4,000 req
per second with one Logstash?
If you have any suggestions of how to proceed from here I will be more
than happy to hear that.

If we can't get more from one Logstash, we'll have to develop our own
Java service to do that instead.

Apache Access log file output example - (already in JSON Format):
{ "timestamp":"2015-02-09T10:07:48+0000",
"bq_timestamp":"2015-02-09T10:07:48", "client_ip":"52.2.11.111",
"client_port":"80", "latency_ms":"57", "latency_sec":"0",
"elb_status_code":"200",
"request":"/il.html?e=fpAdOpportunity&w=wfl_dose&vid=1&vname=compName_PM&ecpm=8&adid=1814157&media_file_type=MEDIA_FILE_TYPE&media_file_url=MEDIA_FILE_URL&current_url=%0Ahttp%3A%2F%
2Fu-sd.gga.tv
%2Fa%2Fh%2FJvf82UX3%2Beff48Z%2fwU20swbapQoWau_%3Fcb%3D5605126933660359000%26pet%3Dpreroll%26pageUrl%3Dhttp%253A%252F%252F3ffese.com%26eov%3Deov%0A%09&current_main_vast_url=MAIN_VAST_URL&error_code=ERROR_CODE&error_message=ERROR_MESSAGE&q9=
dsdase.com&apid=dose.com&d=Convert&device=6719&csize=300X250&token=14123669&cb=260174713417&pc=PLAYCOUNT",
"request_path":"/il.html", "referer":"-", "user_agent":"Mozilla/5.0
(redhat-x86_64-linux-gnu) Siege/3.0.8" }

Logstash configuration file (for the testing I ran it as root, without
any limitations):

input {
  file {
    path => "/var/log/httpd/aaa.ddddd.com.logstash-acc.log.[0-9]*"
    codec => json
    type => "tracking"
    discover_interval => 1
    sincedb_path => "/opt/logstash/httpd-sincedb"
    sincedb_write_interval => 1
  }
}

output {
  elasticsearch {
    workers => 1
    host => "aaa.dddd.com"
    index => "%{request_path}-logstash-%{+YYYY-MM-dd}"
    flush_size => 1000
    cluster => "video"
    codec => json
  }
}
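One variation I still plan to test is raising `workers` and `flush_size` on the elasticsearch output, so bulk requests are built and sent in parallel rather than by a single thread. A sketch only, with the same host and cluster as above; the values are guesses, not yet tested against this workload:

```
output {
  elasticsearch {
    workers => 4          # parallel output threads instead of 1
    flush_size => 5000    # larger bulk batches per request
    host => "aaa.dddd.com"
    cluster => "video"
    index => "%{request_path}-logstash-%{+YYYY-MM-dd}"
    codec => json
  }
}
```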

Your help is appreciated!
Thanks!

On Thursday, February 5, 2015 at 1:56:34 PM UTC+2, Itamar Syn-Hershko
wrote:

I'd recommend you use ephemeral SSDs - a replication factor of 2+ plus
proper use of the snapshot/restore API will give you HA and DR guarantees.
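For reference, registering a repository and taking a snapshot is just two REST calls, roughly like this (the repository name and bucket are placeholders, and the s3 repository type requires the AWS cloud plugin to be installed):

```
PUT /_snapshot/my_backup
{
  "type": "s3",
  "settings": { "bucket": "my-es-snapshots", "region": "us-east-1" }
}

PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=false
```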

The rejections you are seeing are due to slow I/O operations, because
the disk is not local. There is a way to configure a bigger queue, but I'd
advise against that; go with a fast local disk instead.

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Feb 5, 2015 at 1:51 PM, Hagai T haga...@gmail.com wrote:

Hi Itamar, thank you for the reply.

This is 15k inserts in total, not per host in the cluster.
Yes, we have 15 shards within one index; the shards are spread across the
nodes equally (automatically, by the Elasticsearch cluster).
We currently use general purpose SSD and not ephemeral storage.

In addition, I see a lot of bulk thread pool rejections on the
Elasticsearch side.

On Thursday, February 5, 2015 at 1:33:05 PM UTC+2, Itamar Syn-Hershko
wrote:

What is the question?...

15k inserts per sec per node is actually quite nice.

Is your index sharded? If you write to one index only, you write to a
maximum of x nodes, where x is the number of shards of that index. Since
shards of the same index can co-exist on one node, check whether you are
actually spreading writes across nodes.
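To make that concrete: each document is routed to exactly one shard of the index, so writes fan out to at most number_of_shards distinct shards no matter how many nodes you add. A toy illustration in Python - Elasticsearch hashes the routing value (the document id by default); crc32 here is only a stand-in for that hash:

```python
# Toy model of document-to-shard routing: hash the doc id onto one shard.
# Elasticsearch uses its own hash of the routing value; crc32 is a stand-in.
import zlib

def shard_for(doc_id: str, number_of_shards: int) -> int:
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards

# However many documents we index, they land on at most 15 distinct shards:
shards = {shard_for("doc-%d" % i, 15) for i in range(10000)}
```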

Use local disks - never EBS, and if you really care about writing
speeds use SSDs.

Other than that, Mike did an excellent write up on the subject:
https://www.elastic.co/blog/performance-considerations-elasticsearch-indexing

--

Itamar Syn-Hershko
http://code972.com | @synhershko https://twitter.com/synhershko
Freelance Developer & Consultant
Lucene.NET committer and PMC member

On Thu, Feb 5, 2015 at 1:26 PM, Hagai T haga...@gmail.com wrote:




--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b571469f-6725-48f6-ae63-ce8918599d22%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.