Poor Query Performance Results

Hi,

I need some advice.

I have built a new ELK environment at my company with 5 nodes (VMs): one is a coordinator node running Kibana, and the other four are data nodes, with 6 shards per index.

Between the coordinator node and the data nodes I have configured stunnel.
When I send a query to Elasticsearch with curl, the first response takes 47s; running the same query again is faster and stable at 8-10s.
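For reference, a simple way to compare the cold vs. warm timings is curl's `%{time_total}` write-out; a minimal sketch below (the host and index are placeholders, not my actual setup):

```shell
# Print only the total request time (in seconds) for a URL.
time_query() {
  curl -o /dev/null -s -w '%{time_total}\n' "$1"
}

# Placeholder host/index -- adjust to your cluster. For a query with a body,
# add: -H 'Content-Type: application/json' -d '{"query": ...}'
# time_query 'http://localhost:9200/myindex/_search?q=*'
```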

My host configuration is:

4 CPUs with 43 GB of memory, and JVM Xms and Xmx both set to 16 GB.
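For reference, a 16 GB heap like that is typically set with lines like these in jvm.options (assuming the standard Elasticsearch packaging; same value for Xms and Xmx, as recommended):

```
-Xms16g
-Xmx16g
```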

Any idea why?

Here is my coordinator configuration:

cluster.name: elkcentral
node.name: elkcoordinator
path.data: /data
path.logs: /logs

transport.bind_host: local
transport.publish_port: 9900
http.bind_host: global
http.port: 9200
node.master: false
node.data: false
node.ingest: false
action.destructive_requires_name: true

processors: 4
thread_pool.search.size: 6

network.tcp.keep_alive: true
transport.ping_schedule: 5s

http.cors.enabled: true
http.cors.allow-origin: "*"

discovery.zen.ping.unicast.hosts: ["localhost:9900", "localhost:9901", "localhost:9902", "localhost:9903","localhost:9904"]
discovery.zen.minimum_master_nodes: 2
gateway.recover_after_nodes: 4

path.repo: ["/elk/snapshots"]
http.type: ssl_netty4

Are you running all 6 nodes as VMs on a single host with only 4 CPUs and 43 GB of RAM in total?

Or does each VM have 4 CPUs and 43 GB? If that's the case, what are the total CPU cores and RAM of the host?

Could you provide the output of the cluster stats API so we can get a better understanding of what the cluster looks like? You mention using stunnel between the coordinating node and the rest of the nodes. Could you please elaborate on how the cluster is deployed and what load it is under?

Hi @thiago,

Thanks for the reply.
To answer your question: I have 5 hosts, not 6;
each host has 41 GB of memory and 4 CPUs.

Total cores: 5 hosts × 4 cores = 20 CPUs total
Total memory: 5 hosts × 41 GB = 205 GB total

That should not be a problem then. Can you please attach what Christian has requested?

Hi @Christian_Dahlqvist ,

The connection is secured with stunnel.
Here is an example of the stunnel config :slight_smile:

[es-http-local-server]
client = no
accept = 19200
connect = localhost:9200
CAfile = /etc/ssl/certs/elk_certificate.pem
verify = 2

[es-transport-co-ord-client]
client = yes
accept = localhost:9900
connect = elk:19300

[es-transport-node01-client]
client = yes
accept = localhost:9901
connect = elkdp01:19300

[es-transport-node02-client]
client = yes
accept = localhost:9902
connect = elkdp02:19300
....

The coordinator node listens for network connections from rsyslog, Logstash, and Kibana.
Most of the servers send their data via the rsyslog service to our rsyslog server, which forwards it to the coordinator.
All queries go directly to the coordinator node.

Here is the output from my Elasticsearch cluster:

{
"_nodes" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"cluster_name" : "elk_centralized_logger",
"timestamp" : 1525097008481,
"status" : "green",
"indices" : {
"count" : 136,
"shards" : {
"total" : 1280,
"primaries" : 640,
"replication" : 1.0,
"index" : {
"shards" : {
"min" : 2,
"max" : 12,
"avg" : 9.411764705882353
},
"primaries" : {
"min" : 1,
"max" : 6,
"avg" : 4.705882352941177
},
"replication" : {
"min" : 1.0,
"max" : 1.0,
"avg" : 1.0
}
}
},
"docs" : {
"count" : 1143442501,
"deleted" : 59054
},
"store" : {
"size" : "720.3gb",
"size_in_bytes" : 773488810938,
"throttle_time" : "0s",
"throttle_time_in_millis" : 0
},
"fielddata" : {
"memory_size" : "0b",
"memory_size_in_bytes" : 0,
"evictions" : 0
},
"query_cache" : {
"memory_size" : "938.4mb",
"memory_size_in_bytes" : 984046371,
"total_count" : 421545,
"hit_count" : 95625,
"miss_count" : 325920,
"cache_size" : 35898,
"cache_count" : 36867,
"evictions" : 969
},
"completion" : {
"size" : "0b",
"size_in_bytes" : 0
},
"segments" : {
"count" : 13381,
"memory" : "2.3gb",
"memory_in_bytes" : 2535474517,
"terms_memory" : "1.9gb",
"terms_memory_in_bytes" : 2089938623,
"stored_fields_memory" : "321.9mb",
"stored_fields_memory_in_bytes" : 337610384,
"term_vectors_memory" : "0b",
"term_vectors_memory_in_bytes" : 0,
"norms_memory" : "4.6mb",
"norms_memory_in_bytes" : 4874752,
"points_memory" : "42.1mb",
"points_memory_in_bytes" : 44205474,
"doc_values_memory" : "56.1mb",
"doc_values_memory_in_bytes" : 58845284,
"index_writer_memory" : "5.7mb",
"index_writer_memory_in_bytes" : 6008504,
"version_map_memory" : "903b",
"version_map_memory_in_bytes" : 903,
"fixed_bit_set" : "271.6mb",
"fixed_bit_set_memory_in_bytes" : 284844488,
"max_unsafe_auto_id_timestamp" : 9223372036854775807,
"file_sizes" : { }
}
},
"nodes" : {
"count" : {
"total" : 5,
"data" : 4,
"coordinating_only" : 1,
"master" : 3,
"ingest" : 4
},
"versions" : [
"5.6.5"
],
"os" : {
"available_processors" : 20,
"allocated_processors" : 20,
"names" : [
{
"name" : "Linux",
"count" : 5
}
],
"mem" : {
"total" : "205.5gb",
"total_in_bytes" : 220687183872,
"free" : "4gb",
"free_in_bytes" : 4298481664,
"used" : "201.5gb",
"used_in_bytes" : 216388702208,
"free_percent" : 2,
"used_percent" : 98
}
},
"process" : {
"cpu" : {
"percent" : 17
},
"open_file_descriptors" : {
"min" : 324,
"max" : 1104,
"avg" : 843
}
},
"jvm" : {
"max_uptime" : "39d",
"max_uptime_in_millis" : 3374293868,
"versions" : [
{
"version" : "1.8.0_161",
"vm_name" : "Java HotSpot(TM) 64-Bit Server VM",
"vm_version" : "25.161-b12",
"vm_vendor" : "Oracle Corporation",
"count" : 5
}
],
"mem" : {
"heap_used" : "43.3gb",
"heap_used_in_bytes" : 46522328000,
"heap_max" : "79.8gb",
"heap_max_in_bytes" : 85725020160
},
"threads" : 342
},
"fs" : {
"total" : "787.3gb",
"total_in_bytes" : 845377593344,
"free" : "590.5gb",
"free_in_bytes" : 634127507456,
"available" : "550.5gb",
"available_in_bytes" : 591161057280,
"spins" : "true"
},
"plugins" : [],
"network_types" : {
"transport_types" : {
"netty4" : 5
},
"http_types" : {
"ssl_netty4" : 1,
"netty4" : 4
}
}
}
}

I hope I did not forget anything.
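One thing that stands out in the stats above: 1280 shards on only 4 data nodes is a high shard density, which often contributes to slow queries. A quick back-of-the-envelope check (numbers copied directly from the cluster stats output):

```shell
# Figures taken from the _cluster/stats output above.
total_shards=1280
data_nodes=4
segments=13381

echo "$((total_shards / data_nodes)) shards per data node"   # 320
echo "$((segments / data_nodes)) segments per data node"
```

Several hundred shards per data node is generally considered high for a cluster this size; each shard and segment carries per-query overhead.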

What type of queries are you running that are taking that long?

How much are you indexing into the cluster each day?
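From the stats above, a rough estimate is already possible, assuming the document count accumulated over the full 39-day max uptime (which may not hold if older indices get deleted):

```shell
# Figures taken from the _cluster/stats output above; this is only a rough
# bound, assuming docs accumulated over the full 39-day uptime.
doc_count=1143442501
store_gb=720.3
uptime_days=39

echo "~$((doc_count / uptime_days / 1000000))M docs per day"
awk -v s="$store_gb" -v d="$uptime_days" \
  'BEGIN { printf "~%.1f GB per day (including replicas)\n", s / d }'
```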

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.