Logstash reporting error sending to Elasticsearch

Try upgrading the output plugin and see if that helps.

How can I check its current version?

https://www.elastic.co/guide/en/logstash/current/working-with-plugins.html#listing-plugins will show you
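
For example, on a package install (same path as the plugin commands used later in this thread), something like this should print the installed version:

sudo /usr/share/logstash/bin/logstash-plugin list --verbose logstash-output-elasticsearch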

logstash-output-elasticsearch (5.4.0)

Updated logstash-output-elasticsearch from 5.4.0 to 6.2.4.

And now I can't get Logstash to start!

Gem::LoadError: You have already activated json 1.8.6, but your Gemfile requires json 1.8.3. Prepending `bundle exec` to your command may solve this.
    setup at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/bundler-1.9.10/lib/bundler/runtime.rb:34
     each at org/jruby/RubyArray.java:1613
     each at /usr/share/logstash/vendor/jruby/lib/ruby/1.9/forwardable.rb:201
    setup at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/bundler-1.9.10/lib/bundler/runtime.rb:19
    setup at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/bundler-1.9.10/lib/bundler.rb:122
   setup! at /usr/share/logstash/lib/bootstrap/bundler.rb:64
  execute at /usr/share/logstash/lib/pluginmanager/list.rb:17
      run at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67
  execute at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/subcommand/execution.rb:11
      run at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:67
      run at /usr/share/logstash/vendor/bundle/jruby/1.9/gems/clamp-0.6.5/lib/clamp/command.rb:132
   (root) at /usr/share/logstash/lib/pluginmanager/main.rb:44

Ok,
Found an old topic from someone on 2.2 who had a similar problem.

This definitely needs to be better documented. If you need to update the GEM, whatever that is, run:

sudo /usr/share/logstash/bin/logstash-plugin update
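
A single plugin can also be updated by name if you don't want to update everything at once (same install path assumed):

sudo /usr/share/logstash/bin/logstash-plugin update logstash-output-elasticsearch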

Ok, STILL having problems!

The logstash-output-elasticsearch plugin has been updated to 6.2.4 on all Logstash shippers.

I have added TWO more data nodes, bringing the total to 6 data nodes, as well as patched up to version 5.2.0. Here are the latest health stats:

{
  "cluster_name" : "CLUSTER",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 10,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 13,
  "active_shards" : 14,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
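
For reference, the health block above and the node stats below come from the cluster health and _cat/nodes APIs; the same commands appear later in this thread, e.g.:

$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
$ curl -XGET 'http://localhost:9200/_cat/nodes?v&s=name&h=master,name,version,node.role,disk.avail,heap.max,ram.max,ram.current,cpu,uptime,jdk'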

master name            version node.role disk.avail heap.max ram.max ram.current cpu uptime jdk
-      DATANODE-01     5.2.0   di             1.8tb   29.9gb  39.2gb      38.1gb  23   9.1h 1.8.0_121
-      DATANODE-02     5.2.0   di             1.8tb   29.9gb  39.2gb        39gb  20   9.1h 1.8.0_121
-      DATANODE-03     5.2.0   di             1.8tb   29.9gb  39.2gb        38gb  23   9.1h 1.8.0_121
-      DATANODE-04     5.2.0   di             1.8tb   29.9gb  39.2gb      38.7gb  23   9.1h 1.8.0_121
-      DATANODE-05     5.2.0   di             1.8tb   29.9gb  39.2gb      38.2gb  19     9h 1.8.0_121
-      DATANODE-06     5.2.0   di             1.8tb   29.9gb  39.2gb        39gb  23     9h 1.8.0_121
master name            indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
-      DATANODE-01     9927300              2                      0                     0
-      DATANODE-02     10010815             0                      0                     0
-      DATANODE-03     10030249             1                      0                     0
-      DATANODE-04     10038413             2                      0                     0
-      DATANODE-05     9910437              1                      0                     0
-      DATANODE-06     10022208             1                      0                     0
master name            merges.total merges.current merges.total_size merges.current_size
-      DATANODE-01     2600         0              59.4gb            0b
-      DATANODE-02     2474         2              59.7gb            5gb
-      DATANODE-03     2632         0              60.3gb            0b
-      DATANODE-04     2631         1              59.8gb            4.9gb
-      DATANODE-05     2503         1              59gb              4.9gb
-      DATANODE-06     2591         2              59.3gb            5gb
master name            search.fetch_total search.fetch_current search.query_total search.query_current
-      DATANODE-01     79                 0                    926                0
-      DATANODE-02     81                 0                    926                0
-      DATANODE-03     82                 0                    926                0
-      DATANODE-04     80                 0                    926                0
-      DATANODE-05     6518               0                    7362               0
-      DATANODE-06     6523               0                    7371               0

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2017.01.31 YaJbI3DbTHCyazLZ2mXsVw   6   0    2724607            0        2gb            2gb
green  open   logstash-2017.02.01 UXJXqMxlQeWPhjdlWImVhA   6   0   57241514            0    101.9gb        101.9gb

index               shard prirep state      docs   store  node
logstash-2017.01.31 3     p      STARTED  456419 352.9mb  DATANODE-03
logstash-2017.01.31 5     p      STARTED  454425   350mb  DATANODE-06
logstash-2017.01.31 4     p      STARTED  451017   347mb  DATANODE-05
logstash-2017.01.31 1     p      STARTED  451950   347mb  DATANODE-01
logstash-2017.01.31 2     p      STARTED  454766 349.3mb  DATANODE-02
logstash-2017.01.31 0     p      STARTED  456030 352.1mb  DATANODE-04
logstash-2017.02.01 3     p      STARTED 9581297  16.4gb  DATANODE-03
logstash-2017.02.01 5     p      STARTED 9574838  18.4gb  DATANODE-06
logstash-2017.02.01 4     p      STARTED 9468455  16.2gb  DATANODE-05
logstash-2017.02.01 1     p      STARTED 9481363  16.4gb  DATANODE-01
logstash-2017.02.01 2     p      STARTED 9564405  17.7gb  DATANODE-02
logstash-2017.02.01 0     p      STARTED 9590903  16.5gb  DATANODE-04

shards disk.indices disk.used disk.avail disk.total disk.percent   node
     2       16.8gb   119.7gb      1.8tb      1.9tb            5   DATANODE-01
     2       18.1gb   121.1gb      1.8tb      1.9tb            6   DATANODE-02
     2       16.8gb   119.8gb      1.8tb      1.9tb            5   DATANODE-03
     2       16.9gb   119.4gb      1.8tb      1.9tb            5   DATANODE-04
     3       16.6gb   119.5gb      1.8tb      1.9tb            5   DATANODE-05
     3       18.8gb   121.8gb      1.8tb      1.9tb            6   DATANODE-06

This is what I am seeing from the Logstash shippers:

[2017-02-01T10:17:50,450][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 429 ({"type"=>"es_rejected_execution_exception", "reason"=>"rejected execution of org.elasticsearch.transport.TransportService$7@2c4dede1 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@62dfb351[Running, pool size = 4, active threads = 4, queued tasks = 50, completed tasks = 1783100]]"})
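
That 429 means the node's bulk thread pool queue (capacity 50 here) is full and Elasticsearch is pushing back on the bulk requests. One way to see how often each node is rejecting them is the _cat/thread_pool API (the same command used further down in this thread):

$ curl -XGET 'http://localhost:9200/_cat/thread_pool/bulk?v&s=rejected&h=node_name,name,size,active,queue,rejected'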

I also bumped ALL data nodes to 40 GB of RAM and set the Elasticsearch heap to 30 GB in jvm.options.
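
For reference, the heap is set in jvm.options (typically /etc/elasticsearch/jvm.options on a package install); the usual guidance is to keep it around or below ~30 GB so the JVM can keep using compressed object pointers, which lines up with the 29.9gb heap.max shown above:

# /etc/elasticsearch/jvm.options (assumed location)
-Xms30g
-Xmx30g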

Hi @kopacko,
I'm also experiencing this issue in my production environment. While doing some research, I came across this link: https://github.com/logstash-plugins/logstash-output-elasticsearch/issues/515.

My guess is that this issue is related to the version of Logstash (5.x) that we are using, because when I was using Logstash 2.x I didn't have this issue. I cannot downgrade my Logstash, though, because my whole infrastructure is only compatible with Logstash 5.0.

With this issue, I cannot guarantee that all the data sent to Logstash is actually indexed in Elasticsearch.

On top of that, my Logstash is far too slow: the stats report that it is taking 17881576 milliseconds (~298 minutes) to process 109129 events.

Any clues, please (apart from downgrading Logstash)?

I just came across this link https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html#_use_multiple_workers_threads_to_send_data_to_elasticsearch and will be testing as it recommends.
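
On the Logstash side, the knobs that come into play for that are the pipeline settings in logstash.yml; a sketch with purely illustrative values:

# logstash.yml (Logstash 5.x) -- values are examples, not recommendations
pipeline.workers: 4       # worker threads running the filter and output stages in parallel
pipeline.batch.size: 250  # events each worker collects before flushing to the outputs

Note that more workers and bigger batches mean more concurrent and larger bulk requests, which can increase the pressure on Elasticsearch if it is already the bottleneck.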

My error messages are related to Elasticsearch 429s, which means ES is rejecting the logs coming from my Logstash shippers.

I built an http_poller input to monitor the clusters and am getting some good data on the rejections.
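
For anyone interested, a minimal sketch of what that kind of monitoring pipeline can look like (URLs, interval and index name below are just examples):

input {
  http_poller {
    # poll the bulk thread pool stats from the cluster being monitored
    urls => {
      bulk_thread_pool => "http://localhost:9200/_cat/thread_pool/bulk?format=json&h=node_name,active,queue,rejected,completed"
    }
    schedule => { every => "30s" }
    request_timeout => 60
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "cluster-monitoring-%{+YYYY.MM.dd}"
  }
}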

But, like I said, my issue is not Logstash, but ES 429s.

Currently, my refresh interval is set to 5 seconds.

So I changed it to 10 seconds, based on the link you posted, but I don't know how to get that change to show up when I pull the settings.
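
As far as I can tell, an index-level setting like refresh_interval can be applied to existing indices and read back through the index settings API (if it was only changed in the index template, indices created before the change won't pick it up):

$ curl -XPUT 'http://localhost:9200/logstash-*/_settings' -d '{ "index": { "refresh_interval": "10s" } }'
$ curl -XGET 'http://localhost:9200/logstash-2017.02.01/_settings?pretty'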

I am not sure how to verify whether the change took effect, but I changed the indices.memory.index_buffer_size setting in my elasticsearch.yml file and restarted the Elasticsearch service on all 6 data nodes. As you can see below (sorted by rejected), I am STILL having issues.

node_name     name      type    queue_size min max active rejected completed
DATANODE-06   bulk      fixed           50   4   4      1      306     44926
DATANODE-03   bulk      fixed           50   4   4      1      399     45149
DATANODE-01   bulk      fixed           50   4   4      3      423     62188
DATANODE-05   bulk      fixed           50   4   4      0      472     44975
DATANODE-04   bulk      fixed           50   4   4      2      725     44321
DATANODE-02   bulk      fixed           50   4   4      1     2455     50117
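
For what it's worth, settings coming from elasticsearch.yml (such as indices.memory.index_buffer_size) should at least be visible per node through the nodes info API, which is one way to confirm the restart picked them up (filter_path just trims the output):

$ curl -XGET 'http://localhost:9200/_nodes/settings?pretty&filter_path=nodes.*.name,nodes.*.settings.indices'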

Hey @kopacko,

Indeed, the error was on the ES side in my case too. Is reducing the number of shards per index an option? In my case I just reduced the shard count and everything was fine. (No downgrade was needed on Logstash.)
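
In case it helps, for daily logstash-* indices the shard count is normally reduced via the index template so that newly created indices pick it up; existing indices keep their old shard count. A sketch (template name and shard count are just examples):

$ curl -XPUT 'http://localhost:9200/_template/logstash_shards' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "index.number_of_shards": 3
  }
}'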

Hi,

I have been following your thread because I have the same issue. When I upgraded, I was warned that my number of shards was greater than 1000. Would this affect the ingestion? Is that what you are getting at?

Would you mind sharing your http_poller configuration along with the stats you show further down in the thread?

Yeah, that is a tonne of shards and will have a huge impact on performance.

I did that as well.

I also rebuilt my entire cluster from scratch and dumped my old data.

Here are the current stats on my cluster (as of a few minutes ago):

$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "CLUSTER",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 10,
  "number_of_data_nodes" : 6,
  "active_primary_shards" : 31,
  "active_shards" : 31,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

$ curl -XGET 'http://localhost:9200/_cat/count?v'
count
416827938

$ curl -XGET 'http://localhost:9200/_cat/nodes?v&s=name&h=master,name,version,node.role,disk.avail,heap.max,ram.max,ram.current,cpu,uptime,jdk'

master name           version role disk.avail heap.max ram.max ram.current cpu
-      DATANODE-01    5.2.0   di    1.7tb     29.9gb   39.2gb  39gb        25
-      DATANODE-02    5.2.0   di    1.7tb     29.9gb   39.2gb  39gb        29
-      DATANODE-03    5.2.0   di    1.7tb     29.9gb   39.2gb  38.9gb      42
-      DATANODE-04    5.2.0   di    1.7tb     29.9gb   39.2gb  35.2gb      24
-      DATANODE-05    5.2.0   di    1.7tb     29.9gb   39.2gb  35.5gb      21
-      DATANODE-06    5.2.0   di    1.7tb     29.9gb   39.2gb  38.9gb      29
-      KIBANANODE-01  5.2.0   -    38.8gb     11.9gb   15.6gb  14.5gb      0
-      MASTERNODE-01  5.2.0   m    39.3gb     15.9gb   23.5gb  18.2gb      0
-      MASTERNODE-02  5.2.0   m    39.3gb     15.9gb   23.5gb  18.1gb      0
*      MASTERNODE-03  5.2.0   m    39.3gb     15.9gb   23.5gb  18.2gb      0

$ curl -XGET 'http://localhost:9200/_cat/nodes?v&s=name&h=master,name,indexing.index_total,indexing.index_current,indexing.index_failed,indexing.delete_total'
master name            indexing.index_total indexing.index_current indexing.index_failed indexing.delete_total
-      DATANODE-01     25912059                      0                     0                     0
-      DATANODE-02     25291723                      0                     0                     0
-      DATANODE-03     25810299                      0                     0                     0
-      DATANODE-04     25801970                      0                     0                     0
-      DATANODE-05     25841652                      0                     0                     0
-      DATANODE-06     25733829                      0                     0                     0

$ curl -XGET 'http://localhost:9200/_cat/indices?v&s=index'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2017.01.30 6   0         3716            0      4.1mb          4.1mb
green  open   logstash-2017.01.31 6   0      2782506            0      2.1gb          2.1gb
green  open   logstash-2017.02.01 6   0    144779115            0    240.5gb        240.5gb
green  open   logstash-2017.02.02 6   0    140247069            0    239.6gb        239.6gb
green  open   logstash-2017.02.03 6   0    129062868            0      251gb          251gb

$ curl -XGET 'http://localhost:9200/_cat/shards?v&s=index'
index               shard prirep state       docs   store node
logstash-2017.01.30 5     p      STARTED      640 589.3kb DATANODE-05
logstash-2017.01.30 4     p      STARTED      606 643.5kb DATANODE-03
logstash-2017.01.30 3     p      STARTED      679 830.9kb DATANODE-06
logstash-2017.01.30 1     p      STARTED      623 689.5kb DATANODE-01
logstash-2017.01.30 2     p      STARTED      546 861.2kb DATANODE-02
logstash-2017.01.30 0     p      STARTED      622 655.6kb DATANODE-04
logstash-2017.01.31 5     p      STARTED   464104 360.5mb DATANODE-06
logstash-2017.01.31 4     p      STARTED   460865 357.6mb DATANODE-05
logstash-2017.01.31 3     p      STARTED   466080 362.7mb DATANODE-03
logstash-2017.01.31 1     p      STARTED   461635 357.2mb DATANODE-01
logstash-2017.01.31 2     p      STARTED   463858 358.5mb DATANODE-02
logstash-2017.01.31 0     p      STARTED   465964 361.9mb DATANODE-04
logstash-2017.02.01 5     p      STARTED 24152653    40gb DATANODE-06
logstash-2017.02.01 4     p      STARTED 24048034  39.9gb DATANODE-05
logstash-2017.02.01 3     p      STARTED 24157379  40.1gb DATANODE-03
logstash-2017.02.01 1     p      STARTED 24055200    40gb DATANODE-01
logstash-2017.02.01 2     p      STARTED 24165058    40gb DATANODE-02
logstash-2017.02.01 0     p      STARTED 24200791  40.2gb DATANODE-04
logstash-2017.02.02 5     p      STARTED 23556488  40.3gb DATANODE-06
logstash-2017.02.02 4     p      STARTED 23627977  40.3gb DATANODE-05
logstash-2017.02.02 3     p      STARTED 23618891  40.3gb DATANODE-03
logstash-2017.02.02 1     p      STARTED 23614856  40.1gb DATANODE-01
logstash-2017.02.02 2     p      STARTED 22224017    38gb DATANODE-02
logstash-2017.02.02 0     p      STARTED 23604840  40.3gb DATANODE-04
logstash-2017.02.03 5     p      STARTED 21488905  43.2gb DATANODE-06
logstash-2017.02.03 4     p      STARTED 21522607  41.5gb DATANODE-03
logstash-2017.02.03 3     p      STARTED 21412625  43.2gb DATANODE-02
logstash-2017.02.03 1     p      STARTED 21540043  39.9gb DATANODE-04
logstash-2017.02.03 2     p      STARTED 21556369  42.9gb DATANODE-01
logstash-2017.02.03 0     p      STARTED 21550709  40.3gb DATANODE-05

$ curl -XGET 'http://localhost:9200/_cat/thread_pool/bulk?v&s=rejected&h=node_name,name,size,active,queue,rejected'
node_name       name size active queue rejected
DATANODE-05     bulk    4      1     0    27383
DATANODE-01     bulk    4      2     0    28703
DATANODE-04     bulk    4      0     0    35137
DATANODE-03     bulk    4      1     0    36731
DATANODE-06     bulk    4      4    33    51714
DATANODE-02     bulk    4      0     0   170626