Slow search in Kibana / Elasticsearch

When visualizing data, it takes a long time to load even the most recent data. I'm trying to figure out what in our setup is causing this:

  • 3 nodes (1 hot, 2 cold)
  • Hourly indices, 1 shard per index (peak was ~15.2 GB, ~200 million docs per index)
  • Indices move to cold nodes after 4 days. No warm nodes.
  • Heap/RAM/disk:
  • No rollover. The index name is calculated in the app that puts the data in ES.

Any tips?

It sounds like you are indexing a lot of data every day into just a single hot node while you already have a good amount of data in the cluster. If CPU and GC seem fine, I would first look at disk I/O and iowait. Given the indexing volume, it is possible that you do not have enough headroom to support fast queries. Does the latency vary if you only query a time period stored on the cold nodes vs a similar time period residing on the hot node?
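A quick way to compare is sketched below. The index name, the @timestamp field, and localhost:9200 are placeholders (I don't know your naming scheme, and add auth/TLS options if your cluster needs them); the idea is to run the same aggregation against an hourly index that is still on the hot node and against one that is older than 4 days, then compare the "took" values in the two responses:

curl -s -H 'Content-Type: application/json' \
  'localhost:9200/inventory-2022.02.21.10/_search' -d '
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "per_minute": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" }
    }
  }
}'

If the copy on the cold nodes is consistently much slower, that points at the cold nodes' storage; if both are slow, the hot node is the more likely bottleneck.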

The output from the _cluster/stats?pretty&human API might also be helpful.

The output from the _cluster/stats?pretty&human API:

{
	"_nodes": {
		"total": 3,
		"successful": 3,
		"failed": 0
	},
	"cluster_name": "inventory-cluster",
	"cluster_uuid": "N9PS1MEjRmGISyldE_6_8Q",
	"timestamp": 1645524563027,
	"status": "green",
	"indices": {
		"count": 796,
		"shards": {
			"total": 824,
			"primaries": 796,
			"replication": 0.035175879396984924,
			"index": {
				"shards": {
					"min": 1,
					"max": 2,
					"avg": 1.035175879396985
				},
				"primaries": {
					"min": 1,
					"max": 1,
					"avg": 1
				},
				"replication": {
					"min": 0,
					"max": 1,
					"avg": 0.035175879396984924
				}
			}
		},
		"docs": {
			"count": 84453688947,
			"deleted": 9491242
		},
		"store": {
			"size": "5.6tb",
			"size_in_bytes": 6199679418854,
			"total_data_set_size": "5.6tb",
			"total_data_set_size_in_bytes": 6199679418854,
			"reserved": "0b",
			"reserved_in_bytes": 0
		},
		"fielddata": {
			"memory_size": "17.6mb",
			"memory_size_in_bytes": 18495488,
			"evictions": 0
		},
		"query_cache": {
			"memory_size": "326.9mb",
			"memory_size_in_bytes": 342810617,
			"total_count": 61172823,
			"hit_count": 2115707,
			"miss_count": 59057116,
			"cache_size": 24828,
			"cache_count": 74215,
			"evictions": 49387
		},
		"completion": {
			"size": "0b",
			"size_in_bytes": 0
		},
		"segments": {
			"count": 21061,
			"memory": "167.8mb",
			"memory_in_bytes": 175954904,
			"terms_memory": "99.3mb",
			"terms_memory_in_bytes": 104161256,
			"stored_fields_memory": "21.3mb",
			"stored_fields_memory_in_bytes": 22417096,
			"term_vectors_memory": "0b",
			"term_vectors_memory_in_bytes": 0,
			"norms_memory": "8.7mb",
			"norms_memory_in_bytes": 9208896,
			"points_memory": "0b",
			"points_memory_in_bytes": 0,
			"doc_values_memory": "38.3mb",
			"doc_values_memory_in_bytes": 40167656,
			"index_writer_memory": "393.4mb",
			"index_writer_memory_in_bytes": 412588444,
			"version_map_memory": "36.1mb",
			"version_map_memory_in_bytes": 37899248,
			"fixed_bit_set": "11.7mb",
			"fixed_bit_set_memory_in_bytes": 12332168,
			"max_unsafe_auto_id_timestamp": 1645524427496,
			"file_sizes": {}
		},
		"mappings": {
			"field_types": [
				{
					"name": "boolean",
					"count": 47,
					"index_count": 18,
					"script_count": 0
				},
				{
					"name": "date",
					"count": 849,
					"index_count": 780,
					"script_count": 0
				},
				{
					"name": "float",
					"count": 76,
					"index_count": 10,
					"script_count": 0
				},
				{
					"name": "half_float",
					"count": 56,
					"index_count": 14,
					"script_count": 0
				},
				{
					"name": "integer",
					"count": 154,
					"index_count": 7,
					"script_count": 0
				},
				{
					"name": "keyword",
					"count": 5899,
					"index_count": 781,
					"script_count": 0
				},
				{
					"name": "long",
					"count": 2018,
					"index_count": 781,
					"script_count": 0
				},
				{
					"name": "nested",
					"count": 24,
					"index_count": 10,
					"script_count": 0
				},
				{
					"name": "object",
					"count": 821,
					"index_count": 20,
					"script_count": 0
				},
				{
					"name": "text",
					"count": 5365,
					"index_count": 774,
					"script_count": 0
				}
			],
			"runtime_field_types": []
		},
		"analysis": {
			"char_filter_types": [],
			"tokenizer_types": [],
			"filter_types": [],
			"analyzer_types": [],
			"built_in_char_filters": [],
			"built_in_tokenizers": [],
			"built_in_filters": [],
			"built_in_analyzers": []
		},
		"versions": [
			{
				"version": "7.14.1",
				"index_count": 796,
				"primary_shard_count": 796,
				"total_primary_size": "5.6tb",
				"total_primary_bytes": 6178724612548
			}
		]
	},
	"nodes": {
		"count": {
			"total": 3,
			"coordinating_only": 0,
			"data": 3,
			"data_cold": 3,
			"data_content": 3,
			"data_frozen": 3,
			"data_hot": 3,
			"data_warm": 3,
			"ingest": 3,
			"master": 3,
			"ml": 3,
			"remote_cluster_client": 3,
			"transform": 3,
			"voting_only": 0
		},
		"versions": [
			"7.14.1"
		],
		"os": {
			"available_processors": 56,
			"allocated_processors": 56,
			"names": [
				{
					"name": "Linux",
					"count": 3
				}
			],
			"pretty_names": [
				{
					"pretty_name": "Debian GNU/Linux 10 (buster)",
					"count": 3
				}
			],
			"architectures": [
				{
					"arch": "amd64",
					"count": 3
				}
			],
			"mem": {
				"total": "251.5gb",
				"total_in_bytes": 270079356928,
				"free": "10.6gb",
				"free_in_bytes": 11472879616,
				"used": "240.8gb",
				"used_in_bytes": 258606477312,
				"free_percent": 4,
				"used_percent": 96
			}
		},
		"process": {
			"cpu": {
				"percent": 67
			},
			"open_file_descriptors": {
				"min": 1758,
				"max": 4505,
				"avg": 3543
			}
		},
		"jvm": {
			"max_uptime": "66.9d",
			"max_uptime_in_millis": 5783387936,
			"versions": [
				{
					"version": "16.0.2",
					"vm_name": "OpenJDK 64-Bit Server VM",
					"vm_version": "16.0.2+7",
					"vm_vendor": "Eclipse Foundation",
					"bundled_jdk": true,
					"using_bundled_jdk": true,
					"count": 3
				}
			],
			"mem": {
				"heap_used": "107.1gb",
				"heap_used_in_bytes": 115055536344,
				"heap_max": "180gb",
				"heap_max_in_bytes": 193273528320
			},
			"threads": 510
		},
		"fs": {
			"total": "15.2tb",
			"total_in_bytes": 16727295205376,
			"free": "9.5tb",
			"free_in_bytes": 10495911034880,
			"available": "8.7tb",
			"available_in_bytes": 9645852045312
		},
		"plugins": [],
		"network_types": {
			"transport_types": {
				"security4": 3
			},
			"http_types": {
				"security4": 3
			}
		},
		"discovery_types": {
			"zen": 3
		},
		"packaging_types": [
			{
				"flavor": "default",
				"type": "deb",
				"count": 3
			}
		],
		"ingest": {
			"number_of_pipelines": 2,
			"processor_stats": {
				"gsub": {
					"count": 0,
					"failed": 0,
					"current": 0,
					"time": "0s",
					"time_in_millis": 0
				},
				"script": {
					"count": 0,
					"failed": 0,
					"current": 0,
					"time": "0s",
					"time_in_millis": 0
				}
			}
		}
	}
}

iowait% is at 1-3%, but sometimes spikes to 10-20% when running queries over 2 days' worth of data.

Is your heap set to 60GB? That's pretty unusual.

I'm thinking of lowering the heap to 32GB, since that's what's recommended. Someone probably set it that high without knowing about the recommendations.

BTW, is our model of hourly indices with 1 shard bad? I suspect it is far from ideal, but we only have 1 hot node, so would 3+ shards per index actually work any better?

It's not ideal; your shard size seems to average around 7GB, which wastes resources.

Look at using ILM.
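As a starting point, something like the sketch below (the policy name is just a placeholder and the exact phase timings are your call). Rollover on primary shard size is available in your version (7.14), and you could keep your "move to cold after 4 days" rule in the cold phase; how data actually moves to your cold nodes depends on whether you tag them with node attributes or data tier roles, so I have not filled in an allocate action here:

curl -s -X PUT -H 'Content-Type: application/json' \
  'localhost:9200/_ilm/policy/inventory-policy' -d '
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "1d"
          }
        }
      },
      "cold": {
        "min_age": "4d",
        "actions": {}
      }
    }
  }
}'

The policy then has to be referenced from your index template via index.lifecycle.name, and since your app currently computes the index name itself, the write path would need to switch to a rollover alias or a data stream.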

I'm thinking of setting up a rollover with a 50GB shard limit. Would that help my slow search problem?

If iowait is high, your problem is that storage is too slow compared to the load you expect the cluster to support. I doubt increasing shard size will resolve that.

But iowait isn't that high. The server has several SSDs.

10-20% sounds kind of high. Just to have a common reference, can you run iostat -x on the nodes while a slow query is running, so we can see the actual full output?
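Something like this on each node while the slow query runs (iostat comes from the sysstat package on Debian; the 5-second interval is just a suggestion):

iostat -x 5

The first report shows averages since boot, so the interesting numbers are in the later intervals, in particular %util and the await columns for the data disks.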

It needs to be lower than 32GB in order for compressed object pointers to be used. Make sure the use of compressed pointers is logged when you start up Elasticsearch.
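On a deb install the change is just a drop-in file, roughly like this (the file name is arbitrary; 30g is an example value that stays safely under the limit):

# /etc/elasticsearch/jvm.options.d/heap.options
# Keep -Xms and -Xmx equal
-Xms30g
-Xmx30g

After a restart you should see a startup log line confirming it, e.g. (log path assumes the default deb layout and your cluster name):

grep "compressed ordinary object pointers" /var/log/elasticsearch/inventory-cluster.log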


Just found out about Index Sorting. Would that be helpful for optimizing search speed (by timestamp)?
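For reference, this is roughly what I'm planning to put in the index template (template name and index pattern are placeholders, and as far as I understand the sort settings only apply to newly created backing indices):

curl -s -X PUT -H 'Content-Type: application/json' \
  'localhost:9200/_index_template/inventory-template' -d '
{
  "index_patterns": ["inventory-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.sort.field": "@timestamp",
      "index.sort.order": "desc"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" }
      }
    }
  }
}'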

Added a data stream + rollover on reaching 50GB + sorting by @timestamp. Still the same search speed =(

I realised I should clarify that I'm talking about Kibana Lens visualization time rather than raw query time. A visualization takes more than 3 minutes when showing 1 day of data. Could this actually be a Kibana issue?
