Slow queries during user peaks

Hi,
We have a project with a lot of users on the website. All site data is stored in Elasticsearch on AWS, on 8 i3.8xlarge.elasticsearch instances. We have one index, and in this index 7 types. The total document count is 80 million. I don't think this number of documents is big for Elasticsearch, but when we hit a user peak, Elasticsearch becomes so slow that we get timeouts from the AWS ELB. CPU on the Elasticsearch instances climbs to 95%. Maybe you can give us suggestions on what we can tweak to get better performance. I checked the query count on each website page; we have 1-3 queries per page.

Thanks in advance for any help.
More info about our cluster:
3 master nodes (1 active, the other 2 standing by in case of failure)
11 data nodes
Default configuration:
1 index / 5 shards
We've changed from 1 to 2 replicas

How large is the index? How large are your documents? What type of queries are you using? How many queries per second are you serving? Which version of Elasticsearch are you using?

Thanks for the quick answer. We are using Elasticsearch version 5.6. I attached a file with a few query examples that we are using (a Lightshot screenshot). Most of our queries are similar to these.

The index size is 63.6gb at the moment.
Documents are not big (about 0.81 KB per document). Each document contains 10-20 attributes, which are integers or keywords.
Actually, I do not know how many queries per second we serve. We cannot enable logging on production, because that would impact performance.
But as I said before, we have 1-3 queries per page load, and the AWS load balancer shows 120,000 queries per minute.
I hope this info is enough to help me.

I use the elasticsearch-php library (GitHub: elastic/elasticsearch-php, branch 5.0).

If I understand this correctly, you have a single index in the cluster with 2 replicas (15 shards in total) across 11 data nodes. This means that each node only has 1 or 2 shards. As your data set is reasonably small, have you tried increasing the replica count further to spread out the load better?

Can you also provide the output of the cluster stats API?
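If enabling logging on production is a concern, you can still get a rough queries-per-second number by sampling the search counters from the nodes stats API twice and dividing the difference by the interval. A minimal sketch, assuming `es-endpoint` stands in for your domain endpoint:

```
# Sample the per-node search counters (no logging involved).
# The relevant field is nodes.<node_id>.indices.search.query_total.
curl -s 'https://es-endpoint/_nodes/stats/indices/search?pretty'

# Wait a known interval, e.g. 60 seconds, then sample again:
# (query_total_after - query_total_before) / 60 ≈ queries per second.
```

The same response also includes query_time_in_millis, which helps show whether the problem is query volume or per-query latency.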

How frequently are you updating or indexing data?

Yes, a single index with 2 replicas on 11 nodes. Every hour we update about 600,000 different documents, and we also get about 200,000-300,000 new documents per hour. We have not tried increasing the replica count further.

Here are the cluster stats:
{"_nodes":{"total":11,"successful":11,"failed":0},"cluster_name":"116743035446:predictor-production","timestamp":1529837959723,"status":"green","indices":{"count":3,"shards":{"total":27,"primaries":11,"replication":1.4545454545454546,"index":{"shards":{"min":2,"max":15,"avg":9.0},"primaries":{"min":1,"max":5,"avg":3.6666666666666665},"replication":{"min":1.0,"max":2.0,"avg":1.3333333333333333}}},"docs":{"count":82749828,"deleted":37320519},"store":{"size":"195.6gb","size_in_bytes":210099207551,"throttle_time":"0s","throttle_time_in_millis":0},"fielddata":{"memory_size":"0b","memory_size_in_bytes":0,"evictions":0},"query_cache":{"memory_size":"1.2gb","memory_size_in_bytes":1367228960,"total_count":466595045,"hit_count":157210692,"miss_count":309384353,"cache_size":472252,"cache_count":3991874,"evictions":3519622},"completion":{"size":"0b","size_in_bytes":0},"segments":{"count":481,"memory":"502.5mb","memory_in_bytes":526915166,"terms_memory":"380.7mb","terms_memory_in_bytes":399237280,"stored_fields_memory":"65.3mb","stored_fields_memory_in_bytes":68494840,"term_vectors_memory":"0b","term_vectors_memory_in_bytes":0,"norms_memory":"30kb","norms_memory_in_bytes":30784,"points_memory":"50.9mb","points_memory_in_bytes":53396362,"doc_values_memory":"5.4mb","doc_values_memory_in_bytes":5755900,"index_writer_memory":"52.5mb","index_writer_memory_in_bytes":55091103,"version_map_memory":"29.6kb","version_map_memory_in_bytes":30400,"fixed_bit_set":"0b","fixed_bit_set_memory_in_bytes":0,"max_unsafe_auto_id_timestamp":1529783290728,"file_sizes":{}}},"nodes":{"count":{"total":11,"data":8,"coordinating_only":0,"master":3,"ingest":8},"versions":["5.5.2"],"os":{"available_processors":524,"allocated_processors":268,"names":[{"count":11}],"mem":{"total":"3.7tb","total_in_bytes":4149041672192,"free":"3.2tb","free_in_bytes":3577610948608,"used":"532.1gb","used_in_bytes":571430723584,"free_percent":86,"used_percent":14}},"process":{"cpu":{"percent":30},"open_file_descriptors":{"min":916,"max":2009,"avg":1705}},"jvm":{"max_uptime":"15.5h","max_uptime_in_millis":55920913,"mem":{"heap_used":"116.7gb","heap_used_in_bytes":125343354344,"heap_max":"256.1gb","heap_max_in_bytes":274993512448},"threads":4352},"fs":{"total":"109.7tb","total_in_bytes":120662917029888,"free":"109.5tb","free_in_bytes":120435399479296,"available":"109.5tb","available_in_bytes":120435214929920},"network_types":{"transport_types":{"netty4":11},"http_types":{"filter-jetty":11}}}}

Typically you increase query throughput in Elasticsearch by scaling out the number of replica shards. The trade-off is naturally that this requires more effort when indexing and updating data. I would recommend slowly increasing the number of replicas to see what effect it has unless you have a separate cluster to run benchmarks on.
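In case it helps, `number_of_replicas` is a dynamic setting, so it can be raised on the live index without reindexing. A minimal sketch (the endpoint and index name are placeholders for yours):

```
# Raise the replica count one step at a time and watch the effect on CPU.
curl -s -X PUT 'https://es-endpoint/my_index/_settings' \
  -H 'Content-Type: application/json' \
  -d '{"index": {"number_of_replicas": 3}}'

# Check that the new shard copies have been allocated across the data nodes.
curl -s 'https://es-endpoint/_cat/shards/my_index?v'
```

Keep in mind that each extra replica is another full copy that has to be written on every indexing and update request, which is the trade-off mentioned above.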

Thanks for the suggestion. Also, if I split my index into smaller ones, will that help?

It is difficult to know whether more, smaller shards will perform better or not, so the best way is probably to test or benchmark. It may make sense to align the number of primary shards with the number of data nodes you have.
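If you do test that, creating the new index with an explicit primary count is straightforward. A sketch with placeholder names, with the shard count matched to the number of data nodes (the cluster stats above report 8):

```
# Create a new index whose primary shard count matches the data node count.
curl -s -X PUT 'https://es-endpoint/my_index_v2' \
  -H 'Content-Type: application/json' \
  -d '{
        "settings": {
          "number_of_shards": 8,
          "number_of_replicas": 2
        }
      }'
```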

We changed the instance type and added more nodes, which seems to have helped. We also got a suggestion from the AWS engineers to reindex the data, so we moved the types to another index with more shards.
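For anyone who ends up in the same situation, the per-type move can be done with the reindex API. A sketch with placeholder index and type names, not our real ones:

```
# Copy all documents of one type from the old index into its own new index.
curl -s -X POST 'https://es-endpoint/_reindex' \
  -H 'Content-Type: application/json' \
  -d '{
        "source": { "index": "old_index", "type": "my_type" },
        "dest":   { "index": "my_type_index" }
      }'
```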
