What's limiting my Elasticsearch?

Karol_Stojek · February 22, 2016, 1:35pm

Hello!

I'm trying to make my Elasticsearch work faster.
I have a Logstash instance that processes 17k events/s when using the null output.
With Elasticsearch as the output I only get 6k events/s.

Elasticsearch 1.7.1 currently runs as a single node on a separate server.
It has 16 cores, 64 GB of RAM and an SSD disk.

I can see that the CPU utilisation is quite low - around 40%.
The IO is also nothing for SSD - 20MB/s of writes.
The heap space is set to 30GB, so it shouldn't be a problem too.

Does anyone have an idea how I can check what's limiting my Elasticsearch?
I've tried setting more workers on the Logstash elasticsearch output - no significant change observed.

Christian_Dahlqvist · February 22, 2016, 2:55pm

Have you followed the following optimization guidelines, especially around segments and merging? What does your Elasticsearch configuration look like?

Karol_Stojek · February 22, 2016, 3:29pm

Ooops... looks like I missed the
index.store.throttle.type: none
switch... It would make perfect sense, as I could see that the IO was never above 20MB/s... and that's the default value of throttling.

I'll test it and let you know.
Thanks!
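
For reference, a minimal sketch of how the same change could be made at runtime instead of in elasticsearch.yml. It assumes Elasticsearch 1.x, where store throttling is also exposed as the cluster-level settings indices.store.throttle.type and indices.store.throttle.max_bytes_per_sec; the helper name throttle_settings_body is made up for this illustration. It only builds the JSON body that would be sent with PUT /_cluster/settings.

```python
import json


def throttle_settings_body(throttle_type="none", max_bytes_per_sec=None):
    """Build the JSON body for PUT /_cluster/settings (assumes ES 1.x throttling settings)."""
    settings = {"indices.store.throttle.type": throttle_type}
    if max_bytes_per_sec is not None:
        # Only relevant when throttling stays enabled, e.g. type "merge".
        settings["indices.store.throttle.max_bytes_per_sec"] = max_bytes_per_sec
    # "transient" means the change is lost on a full cluster restart.
    return json.dumps({"transient": settings}, sort_keys=True)


if __name__ == "__main__":
    print(throttle_settings_body())
```

A transient setting is handy for a test like this one because it disappears on restart, so the node falls back to whatever is in the config file.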

Karol_Stojek · February 23, 2016, 1:16pm

OK... looks like setting index.store.throttle.type to none helped a little.

But Elasticsearch still isn't keeping up with the load generated by Logstash.
Logstash generates around 17k events/s.
I'm sending them in http mode to a single Elasticsearch node using 4 workers.

The Elasticsearch configuration is as follows:

cluster.name: MD-test

index.number_of_shards: 1
index.number_of_replicas: 0

path.logs: /opt/MD/logs/monitoring

discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["127.0.0.1", "10.141.51.19:9300"]
#10.141.51.19:9300 is unavailable for now 

index.store.throttle.type: none

I get an overall throughput of 8.5k - 9k events/s.

Mem: 65972480k total, 38879532k used, 27092948k free, 182728k buffers
The CPU usage is around 25%
The disk IO is:

writes: 20-40MB/s
reads: 0
cancelled writes: 10-20 MB/s

What's limiting me now? How can I check it?
I didn't install Marvel, as this machine doesn't have internet access, but I can work around that if needed.

rusty · February 23, 2016, 5:31pm

Hi, try increasing the number of shards to 4 (the number of workers).

A single ES node can easily handle 20-25k events/sec, but it depends on many factors. Maybe you have a lot of fields, or some analyzers or doc values in your mapping?

Karol_Stojek · February 24, 2016, 1:32pm

Hello Rusty,

Increasing the number of shards didn't help much.
Before, I could get 10,000 events per second. Now it's 11,500.

The cpu usage is around 30-40%.

I was thinking about increasing the number of indexing threads.
According to the documentation it should default to the number of CPU cores, but in the node stats I can see:

  "thread_pool" : {
    "index" : {
      "threads" : 0,
      "queue" : 0,
      "active" : 0,
      "rejected" : 0,
      "largest" : 0,
      "completed" : 0
    },

Looks strange...

rusty · February 24, 2016, 2:23pm

Would you share your index mapping?

As for threads, look at thread_pool.bulk. You can monitor thread pool activity with this command:

watch -d -n 5 "curl -s localhost:9200/_cat/thread_pool | sort"

You can also play with the processors setting.

Karol_Stojek · February 24, 2016, 2:34pm

Sure, that's my mapping:

{
  "template" : "logstash-*",
  "settings" : {
    "index.refresh_interval" : "5s"
  },
  "mappings" : {
    "_default_" : {
       "_all" : {"enabled" : true, "omit_norms" : true},
       "dynamic_templates" : [ {
         "message_field" : {
           "match" : "message",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "string", "index" : "analyzed", "omit_norms" : true
           }
         }
       }, {
         "string_fields" : {
           "match" : "*",
           "match_mapping_type" : "string",
           "mapping" : {
             "type" : "string", "index" : "not_analyzed", "omit_norms" : true, "store" : true
           }
         }
       } ],
       "properties" : {
         "@version": { "type": "string", "index": "not_analyzed" },
         "geoip"  : {
           "type" : "object",
             "dynamic": true,
             "properties" : {
               "location" : { "type" : "geo_point" }
             }
         }
       }
    }
  }
}

Karol_Stojek · February 24, 2016, 4:03pm

host            ip           bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected 
XXXXX2BPR110V02 10.141.51.15           0          0             0            0           0              0             0            0               0

bulk.active varies from 0 to 16
bulk.queue is usually 0, but sometimes has some greater value (1, 1, 12)
The rest is 0.
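
A rough sketch of how output like the above could be checked automatically instead of by eye. It assumes the text comes from _cat/thread_pool with the header row included (as pasted here); the function names are made up for this example. It flags any queue or rejected counter above zero, which is usually the sign that Elasticsearch is receiving more than it can absorb.

```python
SAMPLE = """\
host            ip           bulk.active bulk.queue bulk.rejected index.active index.queue index.rejected search.active search.queue search.rejected
XXXXX2BPR110V02 10.141.51.15           0          0             0            0           0              0             0            0               0
"""


def parse_cat_thread_pool(text):
    """Turn whitespace-separated _cat output (with header row) into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def saturated_pools(rows):
    """Return (host, column, value) for every non-zero queue or rejected counter."""
    hits = []
    for row in rows:
        for column, value in row.items():
            if column.endswith((".queue", ".rejected")) and int(value) > 0:
                hits.append((row["host"], column, int(value)))
    return hits


if __name__ == "__main__":
    rows = parse_cat_thread_pool(SAMPLE)
    print(saturated_pools(rows) or "no queued or rejected requests")
```

With the numbers reported here (queue mostly 0, nothing rejected) the check stays quiet, which would suggest the node is not being overfed rather than falling behind.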

rusty · February 24, 2016, 4:40pm

Firstly, disable _all if you don't need it. What about _source, are you using it or not? (If not, try disabling it.)
Try increasing "index.refresh_interval" from "5s" to "30s".

For settings add:

"settings": {
  "index.refresh_interval": "30s",
  "index.codec.bloom.load": "false"
},

For not_analyzed strings try the following (do you really need store: true and the full index_options, as in your mapping template?):

"some_string_field": {
  "norms": {
    "enabled": false
  },
  "index": "not_analyzed",
  "omit_norms": true,
  "store": false,
  "type": "string",
  "index_options": "docs"
},
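
The suggestions above can be turned into a quick check against a template file. This is only a sketch: the function audit_template and the 30-second threshold are assumptions made for the example, and the sample dict is a shortened copy of the template posted earlier in the thread. It reports a short refresh interval, an enabled _all field, and dynamic templates that store fields.

```python
TEMPLATE = {
    "template": "logstash-*",
    "settings": {"index.refresh_interval": "5s"},
    "mappings": {
        "_default_": {
            "_all": {"enabled": True, "omit_norms": True},
            "dynamic_templates": [
                {"message_field": {"match": "message",
                                   "mapping": {"type": "string", "index": "analyzed"}}},
                {"string_fields": {"match": "*",
                                   "mapping": {"type": "string", "index": "not_analyzed",
                                               "store": True}}},
            ],
        }
    },
}


def audit_template(template, min_refresh_seconds=30):
    """List template options that tend to cost indexing throughput."""
    findings = []
    # Assumes the ES default of 1s when the setting is absent.
    interval = template.get("settings", {}).get("index.refresh_interval", "1s")
    # Only plain second values like "5s" are checked; other units are skipped.
    if interval.endswith("s") and interval[:-1].isdigit():
        if int(interval[:-1]) < min_refresh_seconds:
            findings.append("refresh_interval %s is below %ds" % (interval, min_refresh_seconds))
    for type_name, mapping in template.get("mappings", {}).items():
        if mapping.get("_all", {}).get("enabled", True):
            findings.append("%s: _all is enabled" % type_name)
        for entry in mapping.get("dynamic_templates", []):
            for name, rule in entry.items():
                if rule.get("mapping", {}).get("store") is True:
                    findings.append("%s: dynamic template %s stores fields" % (type_name, name))
    return findings


if __name__ == "__main__":
    for finding in audit_template(TEMPLATE):
        print(finding)
```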

Karol_Stojek · February 24, 2016, 8:35pm

I've changed the refresh_interval and removed the "store" option and the _all field.
By the way, it turned out that somehow the mapping hadn't been applied, so I had the default configuration - all strings analyzed etc.
Even after applying all these changes I couldn't see any difference in performance.
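
A sketch of how one could verify that a template really took effect, assuming the ES 1.x response shape of GET /<index>/_mapping (index name, then "mappings", then type name, then "properties"). The index name, type name and field names in the sample are hypothetical, as is the function name. It lists every string field that is still analyzed, which is the default when "index" is not set.

```python
def analyzed_string_fields(mapping_response):
    """Return (index, type, field_path) for string fields that are analyzed."""
    found = []

    def walk(index, type_name, properties, prefix):
        for field, spec in sorted(properties.items()):
            path = prefix + field
            # A string field without an explicit "index" option is analyzed by default.
            if spec.get("type") == "string" and spec.get("index", "analyzed") == "analyzed":
                found.append((index, type_name, path))
            if "properties" in spec:  # object field: descend into sub-fields
                walk(index, type_name, spec["properties"], path + ".")

    for index, body in sorted(mapping_response.items()):
        for type_name, type_mapping in sorted(body.get("mappings", {}).items()):
            walk(index, type_name, type_mapping.get("properties", {}), "")
    return found


if __name__ == "__main__":
    # Hypothetical response, shortened for the example.
    response = {
        "logstash-2016.02.24": {
            "mappings": {
                "logs": {
                    "properties": {
                        "message": {"type": "string"},
                        "host": {"type": "string", "index": "not_analyzed"},
                        "geoip": {"properties": {"country": {"type": "string"}}},
                    }
                }
            }
        }
    }
    for hit in analyzed_string_fields(response):
        print(hit)
```

With the template from this thread, only message should show up in the output; anything else means the dynamic template for strings was not applied to that index.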

I've also performed some experiments with Logstash instances...
I was able to parse 7000 entries per second using one Logstash & one Elasticsearch.
I was able to parse 11000 entries per second using two Logstashes on a single server & one Elasticsearch.
I was able to parse 14000 entries per second using three Logstashes on a single server & one Elasticsearch.

I couldn't see much difference between 1 and 2 Logstashes on a server when using the null output.
But now it looks like it makes a difference for Elasticsearch.
So I assume it's either a Logstash issue (the Elasticsearch output limiting Logstash performance) or Elasticsearch working better when more clients call it (but I've tried tuning the "workers" switch on the Logstash elasticsearch output - no improvement observed).

I'll try to add more Logstashes tomorrow to see if Elasticsearch will handle more load.

rusty · February 25, 2016, 6:59am

The template is only applied to new indices, so you need to recreate the old index (delete it and reload your test data).

Karol_Stojek · February 25, 2016, 1:14pm

Yes, I know... I removed everything, and the mapping changes got applied.
I'm waiting for additional servers to check if putting more pressure on Elasticsearch will make it sweat a little.

Karol_Stojek · March 4, 2016, 2:30pm

Hmm... I've tried with 2 more elasticsearches on different servers...
The usage of Elasticsearch is still 25 - 30% and it's handling around 14000 entries per second.

I have no idea what's limiting the throughput...

rusty · March 7, 2016, 7:22am

Have you tried starting two independent Logstash instances at once? Maybe you are limited by the Logstash output plugin, not by Elasticsearch itself?

A data rate of 17k/sec is about 1 468 800 000 documents a day. That's a lot, but it can be handled by one ES instance. If you are planning to keep this data for a long period, though, one instance is not sufficient.
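
The daily figure is just the event rate multiplied by the number of seconds in a day; a tiny sketch of the arithmetic (the function name is made up for the example):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def docs_per_day(events_per_second):
    """Documents indexed per day at a constant event rate."""
    return events_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    print(docs_per_day(17000))  # 1468800000
```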

Christian_Dahlqvist · March 7, 2016, 8:05am

Which version of Logstash are you using? What does your configuration look like? Have you tried using more than one Logstash instance on different hosts to ensure Logstash is not the bottleneck? Is CPU saturated on the Logstash host during indexing?
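
One way to take Logstash out of the picture entirely would be to send bulk requests to Elasticsearch directly from a small script and see how many documents per second the node accepts. Below is a sketch of the request-building part only, assuming the standard _bulk format (one action line followed by one source line per document, each terminated by a newline). The index name "loadtest", the type name "logs" and the function name are hypothetical; the resulting string would be POSTed to /_bulk on the node.

```python
import json


def bulk_body(docs, index, doc_type="logs"):
    """Build a newline-delimited _bulk request body that indexes every doc."""
    lines = []
    for doc in docs:
        # Action line: index into the given index/type and let ES generate the id.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    sample = [{"message": "POST /isAlive HTTP/1.0", "status": 204}]
    print(bulk_body(sample, "loadtest"), end="")
```

If a script like this, run from a few parallel clients, pushes the node well past 14k documents per second, the limit is more likely on the Logstash side than in Elasticsearch.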

Karol_Stojek · March 7, 2016, 11:12am

Yes, that's what I've been waiting for - more servers to run Logstashes on... I've reached 14k per second using 3 Logstashes and adding more didn't make any difference.

I know one instance for elasticsearch is not enough - I just run on 1 to ease the maintenance and tests.

rusty · March 7, 2016, 11:27am

Can you share some sample data from your index and the actual index mapping?

Karol_Stojek · March 7, 2016, 11:47am

I use Logstash 1.5.6.

the configuration is:

input {
  beats {
    port=>5000
  }
  beats {
    port=>5002
  }
}
filter {
  ...
}

output {
  if "metric" in [tags] {
    file {
      path => "/opt/logs/monitoring/metric-even-15.log"
    }
  } else {
    elasticsearch { 
      host => "10.141.51.15"
      port => "9200"
      protocol => "http"
      template => "/opt/monitoring/config/elasticsearch-template.json"
      template_overwrite => true  
      workers => 2
    }
  }
}

The CPU usage on the Logstash host is high, but it surely has some spare cycles, so it's not a Logstash limitation.

Sample data:

10.141.96.3 [07/Mar/2016:06:35:16 +0000] "POST /isAlive HTTP/1.0" 204 - 1 

20160307 16:52:48.437 744454d7-d659-480f-b3e0-76b8f57cdfe7 10.135.9.27 10.141.96.42 POST /session/invalidate null {"sso-token":"xxxxx","Content-Type":"application/json",,"lb-cookie":"341",,,"x-forwarded-for":"10.135.9.27",}

that's the majority of log entries.
