Slow search in Kibana / Elasticsearch

When visualizing data, it takes a long time to load even the most recent data. I'm trying to figure out what in our setup is causing this:

  • 3 nodes (1 hot, 2 cold)
  • Hourly indices, 1 shard per index (peak was ~15.2 GB, ~200 million docs per index)
  • Indices move to cold nodes after 4 days. No warm nodes.
  • Heap/RAM/disk:
  • No rollover. The index name is calculated in the app that puts the data in ES.

Any tips?

It sounds like you are indexing a lot of data every day into just a single hot node while you already have a good amount of data in the cluster. If CPU and GC seem fine, I would first look at disk I/O and iowait. Given the indexing volume, it is possible that you do not have enough headroom to support fast queries. Does the latency vary if you only query a time period stored on the cold nodes vs a similar time period residing on the hot node?
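A quick way to compare is sketched below. The index name, the @timestamp field, and localhost:9200 are placeholders (I don't know your naming scheme, and add auth/TLS options if your cluster needs them); the idea is to run the same aggregation against an hourly index that is still on the hot node and against one that is older than 4 days, then compare the "took" values in the two responses:

curl -s -H 'Content-Type: application/json' \
  'localhost:9200/inventory-2022.02.21.10/_search' -d '
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "per_minute": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" }
    }
  }
}'

If the copy on the cold nodes is consistently much slower, that points at the cold nodes' storage; if both are slow, the hot node is the more likely bottleneck.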

The output from the _cluster/stats?pretty&human API might also be helpful.

The output from the _cluster/stats?pretty&human API:

{
	"_nodes": {
		"total": 3,
		"successful": 3,
		"failed": 0
	},
	"cluster_name": "inventory-cluster",
	"cluster_uuid": "N9PS1MEjRmGISyldE_6_8Q",
	"timestamp": 1645524563027,
	"status": "green",
	"indices": {
		"count": 796,
		"shards": {
			"total": 824,
			"primaries": 796,
			"replication": 0.035175879396984924,
			"index": {
				"shards": {
					"min": 1,
					"max": 2,
					"avg": 1.035175879396985
				},
				"primaries": {
					"min": 1,
					"max": 1,
					"avg": 1
				},
				"replication": {
					"min": 0,
					"max": 1,
					"avg": 0.035175879396984924
				}
			}
		},
		"docs": {
			"count": 84453688947,
			"deleted": 9491242
		},
		"store": {
			"size": "5.6tb",
			"size_in_bytes": 6199679418854,
			"total_data_set_size": "5.6tb",
			"total_data_set_size_in_bytes": 6199679418854,
			"reserved": "0b",
			"reserved_in_bytes": 0
		},
		"fielddata": {
			"memory_size": "17.6mb",
			"memory_size_in_bytes": 18495488,
			"evictions": 0
		},
		"query_cache": {
			"memory_size": "326.9mb",
			"memory_size_in_bytes": 342810617,
			"total_count": 61172823,
			"hit_count": 2115707,
			"miss_count": 59057116,
			"cache_size": 24828,
			"cache_count": 74215,
			"evictions": 49387
		},
		"completion": {
			"size": "0b",
			"size_in_bytes": 0
		},
		"segments": {
			"count": 21061,
			"memory": "167.8mb",
			"memory_in_bytes": 175954904,
			"terms_memory": "99.3mb",
			"terms_memory_in_bytes": 104161256,
			"stored_fields_memory": "21.3mb",
			"stored_fields_memory_in_bytes": 22417096,
			"term_vectors_memory": "0b",
			"term_vectors_memory_in_bytes": 0,
			"norms_memory": "8.7mb",
			"norms_memory_in_bytes": 9208896,
			"points_memory": "0b",
			"points_memory_in_bytes": 0,
			"doc_values_memory": "38.3mb",
			"doc_values_memory_in_bytes": 40167656,
			"index_writer_memory": "393.4mb",
			"index_writer_memory_in_bytes": 412588444,
			"version_map_memory": "36.1mb",
			"version_map_memory_in_bytes": 37899248,
			"fixed_bit_set": "11.7mb",
			"fixed_bit_set_memory_in_bytes": 12332168,
			"max_unsafe_auto_id_timestamp": 1645524427496,
			"file_sizes": {}
		},
		"mappings": {
			"field_types": [
				{
					"name": "boolean",
					"count": 47,
					"index_count": 18,
					"script_count": 0
				},
				{
					"name": "date",
					"count": 849,
					"index_count": 780,
					"script_count": 0
				},
				{
					"name": "float",
					"count": 76,
					"index_count": 10,
					"script_count": 0
				},
				{
					"name": "half_float",
					"count": 56,
					"index_count": 14,
					"script_count": 0
				},
				{
					"name": "integer",
					"count": 154,
					"index_count": 7,
					"script_count": 0
				},
				{
					"name": "keyword",
					"count": 5899,
					"index_count": 781,
					"script_count": 0
				},
				{
					"name": "long",
					"count": 2018,
					"index_count": 781,
					"script_count": 0
				},
				{
					"name": "nested",
					"count": 24,
					"index_count": 10,
					"script_count": 0
				},
				{
					"name": "object",
					"count": 821,
					"index_count": 20,
					"script_count": 0
				},
				{
					"name": "text",
					"count": 5365,
					"index_count": 774,
					"script_count": 0
				}
			],
			"runtime_field_types": []
		},
		"analysis": {
			"char_filter_types": [],
			"tokenizer_types": [],
			"filter_types": [],
			"analyzer_types": [],
			"built_in_char_filters": [],
			"built_in_tokenizers": [],
			"built_in_filters": [],
			"built_in_analyzers": []
		},
		"versions": [
			{
				"version": "7.14.1",
				"index_count": 796,
				"primary_shard_count": 796,
				"total_primary_size": "5.6tb",
				"total_primary_bytes": 6178724612548
			}
		]
	},
	"nodes": {
		"count": {
			"total": 3,
			"coordinating_only": 0,
			"data": 3,
			"data_cold": 3,
			"data_content": 3,
			"data_frozen": 3,
			"data_hot": 3,
			"data_warm": 3,
			"ingest": 3,
			"master": 3,
			"ml": 3,
			"remote_cluster_client": 3,
			"transform": 3,
			"voting_only": 0
		},
		"versions": [
			"7.14.1"
		],
		"os": {
			"available_processors": 56,
			"allocated_processors": 56,
			"names": [
				{
					"name": "Linux",
					"count": 3
				}
			],
			"pretty_names": [
				{
					"pretty_name": "Debian GNU/Linux 10 (buster)",
					"count": 3
				}
			],
			"architectures": [
				{
					"arch": "amd64",
					"count": 3
				}
			],
			"mem": {
				"total": "251.5gb",
				"total_in_bytes": 270079356928,
				"free": "10.6gb",
				"free_in_bytes": 11472879616,
				"used": "240.8gb",
				"used_in_bytes": 258606477312,
				"free_percent": 4,
				"used_percent": 96
			}
		},
		"process": {
			"cpu": {
				"percent": 67
			},
			"open_file_descriptors": {
				"min": 1758,
				"max": 4505,
				"avg": 3543
			}
		},
		"jvm": {
			"max_uptime": "66.9d",
			"max_uptime_in_millis": 5783387936,
			"versions": [
				{
					"version": "16.0.2",
					"vm_name": "OpenJDK 64-Bit Server VM",
					"vm_version": "16.0.2+7",
					"vm_vendor": "Eclipse Foundation",
					"bundled_jdk": true,
					"using_bundled_jdk": true,
					"count": 3
				}
			],
			"mem": {
				"heap_used": "107.1gb",
				"heap_used_in_bytes": 115055536344,
				"heap_max": "180gb",
				"heap_max_in_bytes": 193273528320
			},
			"threads": 510
		},
		"fs": {
			"total": "15.2tb",
			"total_in_bytes": 16727295205376,
			"free": "9.5tb",
			"free_in_bytes": 10495911034880,
			"available": "8.7tb",
			"available_in_bytes": 9645852045312
		},
		"plugins": [],
		"network_types": {
			"transport_types": {
				"security4": 3
			},
			"http_types": {
				"security4": 3
			}
		},
		"discovery_types": {
			"zen": 3
		},
		"packaging_types": [
			{
				"flavor": "default",
				"type": "deb",
				"count": 3
			}
		],
		"ingest": {
			"number_of_pipelines": 2,
			"processor_stats": {
				"gsub": {
					"count": 0,
					"failed": 0,
					"current": 0,
					"time": "0s",
					"time_in_millis": 0
				},
				"script": {
					"count": 0,
					"failed": 0,
					"current": 0,
					"time": "0s",
					"time_in_millis": 0
				}
			}
		}
	}
}

iowait% is at 1-3%, but sometimes spikes to 10-20% when running queries over 2 days' worth of data.

Is your heap set to 60GB? That's pretty unusual.

I'm thinking of lowering the heap to 32GB, since that's what's recommended. Someone probably set it that high without knowing about the recommendations.

BTW, is our model of hourly indices with 1 shard bad? I suspect it is far from ideal, but we only have 1 hot node, so would 3+ shards per index actually work any better?

It's not ideal; your shard size seems to average around 7GB, which wastes resources.

Look at using ILM.
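As a starting point, something like the sketch below (the policy name is just a placeholder and the exact phase timings are your call). Rollover on primary shard size is available in your version (7.14), and you could keep your "move to cold after 4 days" rule in the cold phase; how data actually moves to your cold nodes depends on whether you tag them with node attributes or data tier roles, so I have not filled in an allocate action here:

curl -s -X PUT -H 'Content-Type: application/json' \
  'localhost:9200/_ilm/policy/inventory-policy' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "cold": {
        "min_age": "4d",
        "actions": {}
      }
    }
  }
}'

The policy then has to be referenced from your index template via index.lifecycle.name, and since your app currently computes the index name itself, the write path would need to switch to a rollover alias or a data stream.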

I'm thinking of setting up a rollover with a 50GB shard limit. Would that help my slow search problem?

If iowait is high, your problem is that storage is too slow compared to the load you expect the cluster to support. I doubt increasing shard size will resolve that.

But iowait isn't that high. The server has several SSDs.

10-20% sounds kind of high. Just to have a common reference, can you run iostat -x on the nodes while a slow query is running, so we can see the actual full output?
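Something like this on each node while the slow query runs (iostat comes from the sysstat package on Debian; the 5-second interval is just a suggestion):

iostat -x 5

The first report shows averages since boot, so the interesting numbers are in the later intervals, in particular %util and the await columns for the data disks.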

It needs to be lower than 32GB in order for compressed object pointers to be used. Make sure the use of compressed pointers is logged when you start up Elasticsearch.
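On a deb install the change is just a drop-in file, roughly like this (the file name is arbitrary; 30g is an example value that stays safely under the limit):

# /etc/elasticsearch/jvm.options.d/heap.options
# Keep -Xms and -Xmx equal
-Xms30g
-Xmx30g

After a restart you should see a startup log line confirming it, e.g. (log path assumes the default deb layout and your cluster name):

grep "compressed ordinary object pointers" /var/log/elasticsearch/inventory-cluster.log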


Just found out about Index Sorting. Would that be helpful for optimizing search speed (by timestamp)?
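For reference, this is roughly what I'm planning to put in the index template (template name and index pattern are placeholders, and as far as I understand the sort settings only apply to newly created backing indices):

curl -s -X PUT -H 'Content-Type: application/json' \
  'localhost:9200/_index_template/inventory-template' -d '
{
  "index_patterns": ["inventory-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.sort.field": "@timestamp",
      "index.sort.order": "desc"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}'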

Added a data stream + rollover on reaching 50GB + sorting by @timestamp. Still the same search speed =(

I realised I should clarify that I'm talking about Kibana Lens visualization time rather than raw query time. A visualization takes more than 3 minutes when showing 1 day of data. Could this actually be a Kibana issue?
