Poor Query Performance Results

Hi,

I need some advice.

I have built a new ELK environment at my company with 5 nodes (VMs): one is a coordinator node running Kibana, and the other four are data nodes, with 6 shards per index.

Between the coordinator node and the data nodes I have configured stunnel.
When I send a query to Elasticsearch with curl, the first response takes 47s; running the same query again is faster and stable at 8-10s.
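For reference, a simple way to compare the cold vs. warm timings is curl's `%{time_total}` write-out; a minimal sketch below (the host and index are placeholders, not my actual setup):

```shell
# Print only the total request time (in seconds) for a URL.
time_query() {
  curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

# Placeholder host/index -- adjust to your cluster. For a query with a body,
# add: -H 'Content-Type: application/json' -d '{"query": ...}'
# time_query 'http://localhost:9200/myindex/_search?q=*'
```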

My host configuration is:

4 CPUs with 43 GB of memory, and JVM Xms and Xmx both set to 16 GB.
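For reference, a 16 GB heap like that is typically set with lines like these in jvm.options (assuming the standard Elasticsearch packaging; same value for Xms and Xmx, as recommended):

```
-Xms16g
-Xmx16g
```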

Any idea why?

Here is my coordinator configuration:

cluster.name: elkcentral
node.name: elkcoordinator
path.data: /data
path.logs: /logs

transport.bind_host: local
transport.publish_port: 9900
http.bind_host: global
http.port: 9200
node.master: false
node.data: false
node.ingest: false
action.destructive_requires_name: true

processors: 4
thread_pool.search.size: 6

network.tcp.keep_alive: true
transport.ping_schedule: 5s

http.cors.enabled: true
http.cors.allow-origin: "*"

discovery.zen.ping.unicast.hosts: ["localhost:9900", "localhost:9901", "localhost:9902", "localhost:9903","localhost:9904"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 4

path.repo: ["/elk/snapshots"]
http.type: ssl_netty4

Are you running all 6 nodes as VMs on a single host with only 4 CPUs and 43 GB of RAM in total?

Or does each VM have 4 CPUs and 43 GB? If that's the case, what are the total CPU cores and RAM of the host?

Could you provide the output of the cluster stats API so we can get a better understanding of what the cluster looks like? You mention using stunnel between the coordinating node and the rest of the nodes. Could you please elaborate on how the cluster is deployed and what load it is under?

Hi @thiago,

Thanks for the reply.
To answer your question: I have 5 hosts, not 6;
each host has 41 GB of memory and 4 CPUs.

Total cores: 5 hosts × 4 cores = 20 CPUs total
Total memory: 5 hosts × 41 GB = 205 GB total

That should not be a problem then. Can you please attach what Christian has requested?

Hi @Christian_Dahlqvist ,

The connection is secured with stunnel.
Here is an example of the stunnel config :slight_smile:

[es-http-local-server]
client = no
accept = 19200
connect = localhost:9200
CAfile = /etc/ssl/certs/elk_certificate.pem
verify = 2

[es-transport-co-ord-client]
client = yes
accept = localhost:9900
connect = elk:19300

[es-transport-node01-client]
client = yes
accept = localhost:9901
connect = elkdp01:19300

[es-transport-node02-client]
client = yes
accept = localhost:9902
connect = elkdp02:19300
....

The coordinator node listens for network connections from rsyslog, Logstash, and Kibana.
Most of the servers send their data via the rsyslog service to our rsyslog server, which forwards it to the coordinator.
All queries go directly to the coordinator node.

Here is the output from my Elasticsearch cluster:

{
"_nodes" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"cluster_name" : "elk_centralized_logger",
"timestamp" : 1525097008481,
"status" : "green",
"indices" : {
"count" : 136,
"shards" : {
"total" : 1280,
"primaries" : 640,
"replication" : 1.0,
"index" : {
"shards" : {
"min" : 2,
"max" : 12,
"avg" : 9.411764705882353
},
"primaries" : {
"min" : 1,
"max" : 6,
"avg" : 4.705882352941177
},
"replication" : {
"min" : 1.0,
"max" : 1.0,
"avg" : 1.0
}
}
},
"docs" : {
"count" : 1143442501,
"deleted" : 59054
},
"store" : {
"size" : "720.3gb",
"size_in_bytes" : 773488810938,
"throttle_time" : "0s",
"throttle_time_in_millis" : 0
},
"fielddata" : {
"memory_size" : "0b",
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"query_cache" : {
"memory_size" : "938.4mb",
"memory_size_in_bytes" : 984046371,
"total_count" : 421545,
"hit_count" : 95625,
"miss_count" : 325920,
"cache_size" : 35898,
"cache_count" : 36867,
"evictions" : 969
},
"completion" : {
"size" : "0b",
"size_in_bytes" : 0
},
"segments" : {
"count" : 13381,
"memory" : "2.3gb",
"memory_in_bytes" : 2535474517,
"terms_memory" : "1.9gb",
"terms_memory_in_bytes" : 2089938623,
"stored_fields_memory" : "321.9mb",
"stored_fields_memory_in_bytes" : 337610384,
"term_vectors_memory" : "0b",
"term_vectors_memory_in_bytes" : 0,
"norms_memory" : "4.6mb",
"norms_memory_in_bytes" : 4874752,
"points_memory" : "42.1mb",
"points_memory_in_bytes" : 44205474,
"doc_values_memory" : "56.1mb",
"doc_values_memory_in_bytes" : 58845284,
"index_writer_memory" : "5.7mb",
"index_writer_memory_in_bytes" : 6008504,
"version_map_memory" : "903b",
"version_map_memory_in_bytes" : 903,
"fixed_bit_set" : "271.6mb",
"fixed_bit_set_memory_in_bytes" : 284844488,
"max_unsafe_auto_id_timestamp" : 9223372036854775807,
"file_sizes" : { }
}
},
"nodes" : {
"count" : {
"total" : 5,
"data" : 4,
"coordinating_only" : 1,
"master" : 3,
"ingest" : 4
},
"versions" : [
"5.6.5"
],
"os" : {
"available_processors" : 20,
"allocated_processors" : 20,
"names" : [
{
"name" : "Linux",
"count" : 5
}
],
"mem" : {
"total" : "205.5gb",
"total_in_bytes" : 220687183872,
"free" : "4gb",
"free_in_bytes" : 4298481664,
"used" : "201.5gb",
"used_in_bytes" : 216388702208,
"free_percent" : 2,
"used_percent" : 98
}
},
"process" : {
"cpu" : {
"percent" : 17
},
"open_file_descriptors" : {
"min" : 324,
"max" : 1104,
"avg" : 843
}
},
"jvm" : {
"max_uptime" : "39d",
"max_uptime_in_millis" : 3374293868,
"versions" : [
{
"version" : "1.8.0_161",
"vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
"vm_version" : "25.161-b12",
"vm_vendor" : "Oracle Corporation",
"count" : 5
}
],
"mem" : {
"heap_used" : "43.3gb",
"heap_used_in_bytes" : 46522328000,
"heap_max" : "79.8gb",
"heap_max_in_bytes" : 85725020160
},
"threads" : 342
},
"fs" : {
"total" : "787.3gb",
"total_in_bytes" : 845377593344,
"free" : "590.5gb",
"free_in_bytes" : 634127507456,
"available" : "550.5gb",
"available_in_bytes" : 591161057280,
"spins" : "true"
},
"plugins" : [],
"network_types" : {
"transport_types" : {
"netty4" : 5
},
"http_types" : {
"ssl_netty4" : 1,
"netty4" : 4
}
}
}
}

I hope I did not forget anything.
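One thing that stands out in the stats above: 1280 shards on only 4 data nodes is a high shard density, which often contributes to slow queries. A quick back-of-the-envelope check (numbers copied directly from the cluster stats output):

```shell
# Figures taken from the _cluster/stats output above.
total_shards=1280
data_nodes=4
segments=13381

echo "$((total_shards / data_nodes)) shards per data node"   # 320
echo "$((segments / data_nodes)) segments per data node"
```

Several hundred shards per data node is generally considered high for a cluster this size; each shard and segment carries per-query overhead.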

What type of queries are you running that are taking that long?

How much are you indexing into the cluster each day?
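From the stats above, a rough estimate is already possible, assuming the document count accumulated over the full 39-day max uptime (which may not hold if older indices get deleted):

```shell
# Figures taken from the _cluster/stats output above; this is only a rough
# bound, assuming docs accumulated over the full 39-day uptime.
doc_count=1143442501
store_gb=720.3
uptime_days=39

echo "~$((doc_count / uptime_days / 1000000))M docs per day"
awk -v s="$store_gb" -v d="$uptime_days" \
  'BEGIN { printf "~%.1f GB per day (including replicas)\n", s / d }'
```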

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.