Hi!
We are running ES 5.6.2 on an 8-core, 64 GB machine (30 GB heap). Storage is on a SAN (yes, I know this is not recommended, but bear with me). For the current tests we use a single data node plus a separate master node.
Yesterday I started some basic ingest performance tests with the settings below (tuning a few of the parameters slightly).
"index.number_of_replicas": 0,
"index.number_of_shards" : 1,
"index.refresh_interval" : "-1",
"index.translog.durability" : "async",
"index.translog.flush_threshold_size": "1gb"
PUT _cluster/settings { "transient" : { "indices.store.throttle.type" : "none" } } (reverted to merge before search)
PUT _cluster/settings { "persistent" : { "indices.store.throttle.max_bytes_per_sec" : "200mb" } }
This is what we are indexing:
"_source": {
"@version": "1",
"@timestamp": "2017-10-18T15:18:24.595Z",
"message": "contents of an access log file",
"kafka": {
"topic": "perf_test",
"consumer_group": "perf_test",
"partition": 1,
"offset": 2165328553,
"key": null
}
}
Mapping:
"mappings" : {
"_default_" : {
"_all" : {"enabled" : false, "norms" : false},
"dynamic_templates" : [ {
"message_field" : {
"path_match" : "message",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text",
"norms" : false
}
}
}, {
"string_fields" : {
"match" : "*",
"match_mapping_type" : "string",
"mapping" : {
"type" : "text", "norms" : false
}
}
} ],
"properties" : {
"@timestamp": { "type": "date", "include_in_all": false },
"@version": { "type": "keyword", "include_in_all": false },
"geoip" : {
"dynamic": true,
"properties" : {
"ip": { "type": "ip" },
"location" : { "type" : "geo_point" },
"latitude" : { "type" : "half_float" },
"longitude" : { "type" : "half_float" }
}
}
}
}
}
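For illustration (this is just my reading of the dynamic templates above, not actual mapping output from the cluster): a string field arriving under the Kafka metadata, e.g. kafka.topic, should be caught by the string_fields template and end up as plain text with norms disabled and no keyword sub-field:

"kafka": {
  "properties": {
    "topic": { "type": "text", "norms": false },
    "consumer_group": { "type": "text", "norms": false }
  }
}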
We had an average throughput of 55,000 docs per second, which I think is quite OK. The index now contains 364 million records spanning two hours. I was not able to track down anything unusual on the I/O side during indexing, so the SAN seems to hold up fine so far.
The problem: After this I went to Kibana just to have a look at the result, but I kept getting timeouts until I increased the timeout considerably. The default Kibana view (Discover page, last 24 hours, query=*) takes over 130 seconds and CPU usage (one core) goes up to 100%. During this time there are no other queries and no indexing is running. If I start a second query in parallel, CPU goes to 200% but the response times ("took" in ms) stay the same.
This is the query that Kibana sends:
{
"version": true,
"size": 500,
"sort": [
{
"@timestamp": {
"order": "desc",
"unmapped_type": "boolean"
}
}
],
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
{
"range": {
"@timestamp": {
"gte": 1508316014801,
"lte": 1508402414801,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
},
"_source": {
"excludes": []
},
"aggs": {
"2": {
"date_histogram": {
"field": "@timestamp",
"interval": "30m",
"time_zone": "Europe/Berlin",
"min_doc_count": 1
}
}
},
"stored_fields": [
"*"
],
"script_fields": {},
"docvalue_fields": [
"@timestamp"
],
"highlight": {
"pre_tags": [
"@kibana-highlighted-field@"
],
"post_tags": [
"@/kibana-highlighted-field@"
],
"fields": {
"*": {
"highlight_query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true,
"all_fields": true
}
},
{
"range": {
"@timestamp": {
"gte": 1508316014801,
"lte": 1508402414801,
"format": "epoch_millis"
}
}
}
],
"must_not": []
}
}
}
},
"fragment_size": 2147483647
}
}
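If it helps to reproduce: the same body can be sent directly to the data node to read the raw "took" value, e.g. like this (the index name perf_test and the file name are just placeholders here):

curl -s -H 'Content-Type: application/json' \
     -XPOST 'http://localhost:9200/perf_test/_search' \
     --data-binary @kibana_query.json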
Remove aggregations: If I remove the aggs section ("2") from the same query, it takes approximately 7 seconds, which is of course much faster but still feels like a long time.
Remove timezone: If I keep the aggregation but remove the time_zone parameter, the query goes from 130 seconds down to 26 seconds. I still think this is way too long, but I am not sure what to expect here.
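To be concrete, the "remove timezone" variant is the same query with time_zone dropped from the aggregation, i.e. roughly:

"aggs": {
  "2": {
    "date_histogram": {
      "field": "@timestamp",
      "interval": "30m",
      "min_doc_count": 1
    }
  }
}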
The question: Is this expected behaviour? I was expecting this to be faster and to consume less CPU. System load is low and I cannot find any I/O issues during search either. Or are 364 million docs simply too many to work with, even with this simple query?
Any suggestions are highly appreciated.
Regards /Johan