Try upgrading the output plugin and see if that helps.
How can I check its current version?
https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html#listing-plugins will show you how.
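For reference, the listing command on a default package install looks something like this (the /usr/share/logstash path is an assumption; adjust it to your install):
/usr/share/logstash/bin/logstash-plugin list --verbose logstash-output-elasticsearch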
logstash-output-elasticsearch (5.4.0)
Updated logstash-output-elasticsearch from 5.4.0 to 6.2.4.
And now I can't get Logstash to start!
Gem::LoadError: You have already activated json 1.8.6, but your Gemfile requires json 1.8.3. Prepending `bundle exec` to your command may solve this.
setup at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/bundler-1.9.10/lib/bundler/runtime.rb:34
each at org/jruby/RubyArray.java:1613
each at /usr/share/logstash/vendor/jruby/lib/ruby/1.9/forwardable.rb:201
setup at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/bundler-1.9.10/lib/bundler/runtime.rb:19
setup at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/bundler-1.9.10/lib/bundler.rb:122
setup! at /usr/share/logstash/lib/bootstrap/bundler.rb:64
execute at /usr/share/logstash/lib/pluginmanager/list.rb:17
run at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67
execute at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/subcommand/execution.rb:11
run at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67
run at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132
(root) at /usr/share/logstash/lib/pluginmanager/main.rb:44
OK,
I found an old topic from someone on 2.2 who had a similar problem.
This definitely needs to be better documented. If you need to update the gem (whatever that is), run this command:
sudo /usr/share/logstash/bin/logstash-plugin update
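If you only want to touch the one plugin rather than every gem, the same tool also takes a plugin name (same assumed install path):
sudo /usr/share/logstash/bin/logstash-plugin update logstash-output-elasticsearch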
Ok, STILL having problems!
The logstash-output-elasticsearch plugin has been updated to 6.2.4 on all Logstash shippers.
I have added TWO more data nodes, bringing the total to 6 data nodes, and patched the cluster up to version 5.2.0. Here are the latest health stats:
{
"cluster_name" : "CLUSTER",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 10,
"number_of_data_nodes" : 6,
"active_primary_shards" : 13,
"active_shards" : 14,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
master name version node.role disk.avail heap.max ram.max ram.current cpu uptime jdk
- DATANODE-01 5.2.0 di 1.8tb 29.9gb 39.2gb 38.1gb 23 9.1h 1.8.0_121
- DATANODE-02 5.2.0 di 1.8tb 29.9gb 39.2gb 39gb 20 9.1h 1.8.0_121
- DATANODE-03 5.2.0 di 1.8tb 29.9gb 39.2gb 38gb 23 9.1h 1.8.0_121
- DATANODE-04 5.2.0 di 1.8tb 29.9gb 39.2gb 38.7gb 23 9.1h 1.8.0_121
- DATANODE-05 5.2.0 di 1.8tb 29.9gb 39.2gb 38.2gb 19 9h 1.8.0_121
- DATANODE-06 5.2.0 di 1.8tb 29.9gb 39.2gb 39gb 23 9h 1.8.0_121
master name indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
- DATANODE-01 9927300 2 0 0
- DATANODE-02 10010815 0 0 0
- DATANODE-03 10030249 1 0 0
- DATANODE-04 10038413 2 0 0
- DATANODE-05 9910437 1 0 0
- DATANODE-06 10022208 1 0 0
master name merges.total merges.current merges.total_size merges.current_size
- DATANODE-01 2600 0 59.4gb 0b
- DATANODE-02 2474 2 59.7gb 5gb
- DATANODE-03 2632 0 60.3gb 0b
- DATANODE-04 2631 1 59.8gb 4.9gb
- DATANODE-05 2503 1 59gb 4.9gb
- DATANODE-06 2591 2 59.3gb 5gb
master name search.fetch_total search.fetch_current search.query_total search.query_current
- DATANODE-01 79 0 926 0
- DATANODE-02 81 0 926 0
- DATANODE-03 82 0 926 0
- DATANODE-04 80 0 926 0
- DATANODE-05 6518 0 7362 0
- DATANODE-06 6523 0 7371 0
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-2017.01.31 YaJbI3DbTHCyazLZ2mXsVw 6 0 2724607 0 2gb 2gb
green open logstash-2017.02.01 UXJXqMxlQeWPhjdlWImVhA 6 0 57241514 0 101.9gb 101.9gb
index shard prirep state docs store node
logstash-2017.01.31 3 p STARTED 456419 352.9mb DATANODE-03
logstash-2017.01.31 5 p STARTED 454425 350mb DATANODE-06
logstash-2017.01.31 4 p STARTED 451017 347mb DATANODE-05
logstash-2017.01.31 1 p STARTED 451950 347mb DATANODE-01
logstash-2017.01.31 2 p STARTED 454766 349.3mb DATANODE-02
logstash-2017.01.31 0 p STARTED 456030 352.1mb DATANODE-04
logstash-2017.02.01 3 p STARTED 9581297 16.4gb DATANODE-03
logstash-2017.02.01 5 p STARTED 9574838 18.4gb DATANODE-06
logstash-2017.02.01 4 p STARTED 9468455 16.2gb DATANODE-05
logstash-2017.02.01 1 p STARTED 9481363 16.4gb DATANODE-01
logstash-2017.02.01 2 p STARTED 9564405 17.7gb DATANODE-02
logstash-2017.02.01 0 p STARTED 9590903 16.5gb DATANODE-04
shards disk.indices disk.used disk.avail disk.total disk.percent node
2 16.8gb 119.7gb 1.8tb 1.9tb 5 DATANODE-01
2 18.1gb 121.1gb 1.8tb 1.9tb 6 DATANODE-02
2 16.8gb 119.8gb 1.8tb 1.9tb 5 DATANODE-03
2 16.9gb 119.4gb 1.8tb 1.9tb 5 DATANODE-04
3 16.6gb 119.5gb 1.8tb 1.9tb 5 DATANODE-05
3 18.8gb 121.8gb 1.8tb 1.9tb 6 DATANODE-06
This is what I am seeing from the Logstash shippers:
[2017-02-01T10:17:50,450][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@2c4dede1 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@62dfb351[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 1783100]]"})
I also bumped ALL data nodes to 40 GB of RAM and set the Elasticsearch heap in jvm.options to 30 GB.
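For anyone double-checking the heap values, this is the kind of quick check I use (assuming the default package config path, /etc/elasticsearch/jvm.options):
grep -E '^-Xm[sx]' /etc/elasticsearch/jvm.options
Both -Xms and -Xmx should show the same value; staying at or below about 30 GB also keeps compressed object pointers enabled.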
Hi @kopacko,
I'm also experiencing this issue in my production environment. While doing some research, I came across this link: https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/515.
My guess is that the issue is related to the version of Logstash (5.x) we are using, because I didn't have it when I was on Logstash 2.x. Unfortunately, I cannot downgrade, since my whole infrastructure is only compatible with Logstash 5.0.
With this issue, I cannot guarantee that all the data sent to Logstash is actually indexed in Elasticsearch.
On top of that, my Logstash is far too slow: the stats report that it is taking 17881576 milliseconds (~298 minutes) to process 109129 events.
Any clues, please (apart from downgrading Logstash)?
I just came across this link: https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#_use_multiple_workers_threads_to_send_data_to_elasticsearch. I will be testing its recommendations.
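Concretely, I understand those recommendations map to the pipeline worker and batch settings on Logstash 5.x, so I plan to experiment along these lines (paths and numbers are placeholders for my setup, not recommendations):
/usr/share/logstash/bin/logstash -w 8 -b 500 -f /etc/logstash/conf.d/
The same values can also be set permanently as pipeline.workers and pipeline.batch.size in logstash.yml.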
My error messages are related to Elasticsearch 429s, which means ES is rejecting the logs coming from my Logstash shippers.
I built an http_poller input to monitor the clusters and am getting some useful logs about the rejections.
But, like I said, my issue is not Logstash itself, but the ES 429s.
Currently, my refresh interval is set to 5 seconds.
So I changed it to 10 seconds based on the link you posted, but I don't know how to get that change to show up when I pull the settings.
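For reference, this is roughly how I have been applying and checking it (a sketch; logstash-* stands in for my actual index names):
curl -XPUT 'http://localhost:9200/logstash-*/_settings' -H 'Content-Type: application/json' -d '{"index":{"refresh_interval":"10s"}}'
curl -XGET 'http://localhost:9200/logstash-*/_settings/index.refresh_interval?pretty'
If the 10s value only went into an index template, I gather it would only show up on indices created after the change, which might be why I don't see it yet.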
I am not sure how to verify whether the change took effect, but I changed the indices.memory.index_buffer_size setting in my elasticsearch.yml file and restarted the Elasticsearch service on all 6 data nodes. As you can see below (sorted by rejected; the verification check I plan to run is sketched after the table), I am STILL having issues.
node_name name type queue_size min max active rejected completed
DATANODE-06 bulk fixed 50 4 4 1 306 44926
DATANODE-03 bulk fixed 50 4 4 1 399 45149
DATANODE-01 bulk fixed 50 4 4 3 423 62188
DATANODE-05 bulk fixed 50 4 4 0 472 44975
DATANODE-04 bulk fixed 50 4 4 2 725 44321
DATANODE-02 bulk fixed 50 4 4 1 2455 50117
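To verify the index_buffer_size change mentioned above, the check I plan to try (assuming the nodes info API on localhost is the right place to look) is:
curl -s -XGET 'http://localhost:9200/_nodes/settings?pretty' | grep index_buffer_size
If the restarts picked the setting up, it should appear once per node in the output.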
Hey @kopacko,
Indeed, the error was on the ES side for me too. Is reducing the number of shards per index an option? In my case, I just reduced the shard count and everything was fine. (No Logstash downgrade was needed.)
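In my case that just meant lowering number_of_shards in the template my daily Logstash indices are created from; a rough sketch of the kind of change I mean (template name, pattern, and shard count are examples, not my exact config):
curl -XPUT 'http://localhost:9200/_template/logstash_shards' -H 'Content-Type: application/json' -d '{
  "template": "logstash-*",
  "order": 1,
  "settings": { "index.number_of_shards": 3 }
}'
Existing indices keep their shard count, so this only takes effect on indices created after the change.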
Hi,
I have been following your thread because I have the same issue. When I upgraded, I was warned that my number of shards was greater than 1000. Would this affect ingestion? Is that what you are getting at?
Would you mind sharing the http_poller config that produces the stats you show in the thread below?
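In the meantime I'm counting my total shards with something like this, just to see how far over 1000 I am (a generic check, assuming the API is on localhost):
curl -s 'http://localhost:9200/_cat/shards?h=index' | wc -l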
Yeah, that is a tonne of shards and will have a huge impact on performance.
I did that as well.
I also rebuilt my entire cluster from scratch and dumped my old data.
Here are the current stats on my cluster (as of a few minutes ago):
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "CLUSTER",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 10,
"number_of_data_nodes" : 6,
"active_primary_shards" : 31,
"active_shards" : 31,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
$ curl -XGET 'http://localhost:9200/_cat/count?v'
count
416827938
$ curl -XGET 'http://localhost:9200/_cat/nodes?v&s=name&h=master,name,version,node.role,disk.avail,heap.max,ram.max,ram.current,cpu,uptime,jdk'
master name version role disk.avail heap.max ram.max ram.current cpu
- DATANODE-01 5.2.0 di 1.7tb 29.9gb 39.2gb 39gb 25
- DATANODE-02 5.2.0 di 1.7tb 29.9gb 39.2gb 39gb 29
- DATANODE-03 5.2.0 di 1.7tb 29.9gb 39.2gb 38.9gb 42
- DATANODE-04 5.2.0 di 1.7tb 29.9gb 39.2gb 35.2gb 24
- DATANODE-05 5.2.0 di 1.7tb 29.9gb 39.2gb 35.5gb 21
- DATANODE-06 5.2.0 di 1.7tb 29.9gb 39.2gb 38.9gb 29
- KIBANANODE-01 5.2.0 - 38.8gb 11.9gb 15.6gb 14.5gb 0
- MASTERNODE-01 5.2.0 m 39.3gb 15.9gb 23.5gb 18.2gb 0
- MASTERNODE-02 5.2.0 m 39.3gb 15.9gb 23.5gb 18.1gb 0
* MASTERNODE-03 5.2.0 m 39.3gb 15.9gb 23.5gb 18.2gb 0
$ curl -XGET 'http://localhost:9200/_cat/nodes?v&s=name&h=master,name,indexing.index_total,indexing.index_current,indexing.index_failed,indexing.delete_total'
master name indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
- DATANODE-01 25912059 0 0 0
- DATANODE-02 25291723 0 0 0
- DATANODE-03 25810299 0 0 0
- DATANODE-04 25801970 0 0 0
- DATANODE-05 25841652 0 0 0
- DATANODE-06 25733829 0 0 0
$ curl -XGET 'http://localhost:9200/_cat/indices?v&s=index'
health status index pri rep docs.count docs.deleted store.size pri.store.size
green open logstash-2017.01.30 6 0 3716 0 4.1mb 4.1mb
green open logstash-2017.01.31 6 0 2782506 0 2.1gb 2.1gb
green open logstash-2017.02.01 6 0 144779115 0 240.5gb 240.5gb
green open logstash-2017.02.02 6 0 140247069 0 239.6gb 239.6gb
green open logstash-2017.02.03 6 0 129062868 0 251gb 251gb
$ curl -XGET 'http://localhost:9200/_cat/shards?v&s=index'
index shard prirep state docs store node
logstash-2017.01.30 5 p STARTED 640 589.3kb DATANODE-05
logstash-2017.01.30 4 p STARTED 606 643.5kb DATANODE-03
logstash-2017.01.30 3 p STARTED 679 830.9kb DATANODE-06
logstash-2017.01.30 1 p STARTED 623 689.5kb DATANODE-01
logstash-2017.01.30 2 p STARTED 546 861.2kb DATANODE-02
logstash-2017.01.30 0 p STARTED 622 655.6kb DATANODE-04
logstash-2017.01.31 5 p STARTED 464104 360.5mb DATANODE-06
logstash-2017.01.31 4 p STARTED 460865 357.6mb DATANODE-05
logstash-2017.01.31 3 p STARTED 466080 362.7mb DATANODE-03
logstash-2017.01.31 1 p STARTED 461635 357.2mb DATANODE-01
logstash-2017.01.31 2 p STARTED 463858 358.5mb DATANODE-02
logstash-2017.01.31 0 p STARTED 465964 361.9mb DATANODE-04
logstash-2017.02.01 5 p STARTED 24152653 40gb DATANODE-06
logstash-2017.02.01 4 p STARTED 24048034 39.9gb DATANODE-05
logstash-2017.02.01 3 p STARTED 24157379 40.1gb DATANODE-03
logstash-2017.02.01 1 p STARTED 24055200 40gb DATANODE-01
logstash-2017.02.01 2 p STARTED 24165058 40gb DATANODE-02
logstash-2017.02.01 0 p STARTED 24200791 40.2gb DATANODE-04
logstash-2017.02.02 5 p STARTED 23556488 40.3gb DATANODE-06
logstash-2017.02.02 4 p STARTED 23627977 40.3gb DATANODE-05
logstash-2017.02.02 3 p STARTED 23618891 40.3gb DATANODE-03
logstash-2017.02.02 1 p STARTED 23614856 40.1gb DATANODE-01
logstash-2017.02.02 2 p STARTED 22224017 38gb DATANODE-02
logstash-2017.02.02 0 p STARTED 23604840 40.3gb DATANODE-04
logstash-2017.02.03 5 p STARTED 21488905 43.2gb DATANODE-06
logstash-2017.02.03 4 p STARTED 21522607 41.5gb DATANODE-03
logstash-2017.02.03 3 p STARTED 21412625 43.2gb DATANODE-02
logstash-2017.02.03 1 p STARTED 21540043 39.9gb DATANODE-04
logstash-2017.02.03 2 p STARTED 21556369 42.9gb DATANODE-01
logstash-2017.02.03 0 p STARTED 21550709 40.3gb DATANODE-05
$ curl -XGET 'http://localhost:9200/_cat/thread_pool/bulk?v&s=rejected&h=node_name,name,size,active,queue,rejected'
node_name name size active queue rejected
DATANODE-05 bulk 4 1 0 27383
DATANODE-01 bulk 4 2 0 28703
DATANODE-04 bulk 4 0 0 35137
DATANODE-03 bulk 4 1 0 36731
DATANODE-06 bulk 4 4 33 51714
DATANODE-02 bulk 4 0 0 170626