Improving Speed to Query Millions of Small Documents

ES, Kibana Version 7.3.1

I'm hitting the two-minute timeout in Elasticsearch when trying to query just the past 12 hours of data, and I'm curious what I can do to increase query speed. I have essentially unlimited hardware available for additional data, master, or client nodes; however, with the current setup none of these nodes are being hit very hard, yet the cluster still times out.

We have two clusters, each producing a daily index.
Daily Index Stats:
171M docs
500B/doc
72GB
2 primaries, 2 replicas: 6 shards in total
6 data nodes, 3 masters per cluster

2 client nodes in one cluster, querying both the local and remote clusters.
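For context on how queries fan out: the client nodes use cross-cluster search, so a single dashboard request hits shards in both clusters. A minimal sketch of that request shape (the elastiflow-* pattern and the cluster_b alias are placeholders, not our real names):

GET elastiflow-*,cluster_b:elastiflow-*/_search
{
  "size": 0,
  "query": {
    "range": { "@timestamp": { "gte": "now-12h" } }
  }
}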

Data/Client Node config:
32 cores
31 GB RAM (verified to still have zero-based compressed oops) in Docker
256 GB RAM on the physical server

Master Node config:
32 cores
8 GB RAM

Data/Client/Master Node modified settings:

bootstrap.memory_lock: true
network.host: 0.0.0.0
http.host: localhost
http.max_header_size: 32kB
gateway.recover_after_master_nodes: 2
action.destructive_requires_name: true

indices.query.bool.max_clause_count: 8192
search.max_buckets: 100000

thread_pool.write.queue_size: 2500
thread_pool.search.queue_size: 4000
thread_pool.search.min_queue_size: 4000
thread_pool.search.max_queue_size: 10000
thread_pool.search.target_response_time: 15s

reindex.remote.whitelist: ["*.*.*.*:*"]
script.painless.regex.enabled: false

xpack.ml.enabled: false
xpack.monitoring.collection.enabled: true
xpack.monitoring.elasticsearch.collection.enabled: true
xpack.watcher.enabled: false

Data nodes additionally have node.ingest: true set in order to enable monitoring.

Happy to share additional specific configs.
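One quick check, in case it's useful: whether the search thread pools are actually backing up while a dashboard loads. This uses the standard _cat API; the column list is just the fields of interest:

GET _cat/thread_pool/search?v&h=node_name,active,queue,rejected,completed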

Do I need to add additional client nodes? I can't imagine I'd need more data nodes. I'm okay with queries being somewhat slow - that's expected - but it feels slow enough that something must be configured incorrectly.

What kind of queries are you running? I can see that you have quite a few non-standard settings in your configuration and wonder if that could be related?

How many indices and shards do your queries target? How many concurrent queries do you need to support? What level of concurrent indexing is taking place? Is that 72GB the size of the primary shards for each of the daily indices or all of them?

What is your retention period?

What kind of queries are you running?

I am analyzing NetFlow data using ElastiFlow dashboards in Kibana. Example query.
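(The example query itself isn't reproduced here. As a rough illustration only, the dashboard panels boil down to a date_histogram over @timestamp with nested terms and sum aggregations over the flow fields; the field names below are hypothetical placeholders, not the actual ElastiFlow mapping:)

GET elastiflow-*/_search
{
  "size": 0,
  "query": { "range": { "@timestamp": { "gte": "now-12h" } } },
  "aggs": {
    "over_time": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "30m" },
      "aggs": {
        "top_sources": {
          "terms": { "field": "flow.src_addr", "size": 10 },
          "aggs": { "bytes": { "sum": { "field": "flow.bytes" } } }
        }
      }
    }
  }
}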

I can see that you have quite a few non-standard settings in your configuration and wonder if that could be related?

Unfortunately, we were encountering slow query speeds prior to modifying the thread_pool configs. I'm happy to change them back, but we're currently hitting the 120s timeout, which makes benchmarking extremely difficult.

How many indices and shards do your queries target?

The aforementioned query targets about 14 daily indices across the two clusters, which works out to roughly 281 shards being queried. A typical dashboard runs about 3-5 concurrent queries. We don't expect more than 1 or 2 users to be on the system at a time.
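A quick way to sanity-check that fan-out is to list the shards the pattern resolves to and their sizes on each cluster (elastiflow-* is a placeholder for the actual index pattern):

GET _cat/shards/elastiflow-*?v&h=index,shard,prirep,store,node&s=index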

What level of concurrent indexing is taking place?

Mostly just a single index is being written to, at about 4k events per second.

Is that 72GB the size of the primary shards for each of the daily indices or all of them?

72GB is the size of just the primary shards. I'm considering moving to 3 primary shards / 1 replica due to the size. Thoughts? Do I need to take into account that this spans ~468M docs?
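If we do go to 3 primaries / 1 replica, that change would only apply to newly created daily indices via the index template. A minimal sketch, with the template name and pattern as placeholders:

PUT _template/elastiflow
{
  "index_patterns": ["elastiflow-*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}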

What is your retention period?

Retention period is 90 days; however, we're looking at ways to roll up the data after 1 month, simply due to query time.

What kind of storage are you using? Local SSDs? Have you monitored disk utilisation and iowait when you are querying?

We are using SSDs in a JBOD configuration. I can do some testing to check iowait and disk utilization today.
The more immediate issue we're facing is that we can't run queries for longer than 2 minutes. The team that owns this cluster is pointing to a Kibana timeout (issue cross-posted to the Kibana forums) that they say they can't modify until 7.4. Does that sound correct? I believe I've run queries longer than 2 minutes on previous Elastic stacks without issue.
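For what it's worth, the Kibana-side setting I had in mind is elasticsearch.requestTimeout in kibana.yml (in milliseconds, default 30000), which governs how long Kibana waits on Elasticsearch before giving up. A sketch of what I'd ask them to try, assuming it does apply to our 7.3 setup:

elasticsearch.requestTimeout: 300000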

@Christian_Dahlqvist It's interesting - I'm not seeing any reads on the data nodes or the client node when I run a large query. However, writes (and indexing) are happening just fine:

$ sudo iotop -n 1 -b -o | awk '{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11 $12}'
Total DISK READ : 0.00 B/s | Total DISK WRITE : 3.57 M/s
Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 2.47 M/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND
4746 be/3 root 0.00 B/s 0.00 B/s 0.00 % 2.21 %[jbd2/sdb3-8]
127195 be/4 udocker 0.00 B/s 310.77 K/s 0.00 % 0.44 %java
127196 be/4 udocker 0.00 B/s 1204.23 K/s 0.00 % 0.33 %java
126888 be/4 udocker 0.00 B/s 38.85 K/s 0.00 % 0.22 %java
126891 be/4 udocker 0.00 B/s 38.85 K/s 0.00 % 0.22 %java
126889 be/4 udocker 0.00 B/s 38.85 K/s 0.00 % 0.22 %java
127172 be/4 udocker 0.00 B/s 77.69 K/s 0.00 % 0.14 %java
127199 be/4 udocker 0.00 B/s 38.85 K/s 0.00 % 0.12 %java
125649 be/4 udocker 0.00 B/s 1903.46 K/s 0.00 % 0.00 %nginx:
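For the device-level utilisation and iowait check mentioned above (rather than per-process I/O), something like this on each data node while a heavy query runs, using sysstat's iostat with extended stats at 5-second intervals:

iostat -x 5 3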

Is a large cluster state still a big deal in newer versions of ES? For one cluster the cluster state is 20MB; for the other it's probably double that (I'm still curling it to a file to determine the overall size). Furthermore, when I try to run GET _cluster/state, it breaks the Dev Tools console every time.
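The approach for sizing the cluster state, since the Dev Tools console chokes on the full response, is to curl it to a file and check the size on disk, or to request only a subset of the state metrics (host/port here assume querying a node locally; adjust as needed):

curl -s 'http://localhost:9200/_cluster/state' -o cluster_state.json
du -h cluster_state.json
curl -s 'http://localhost:9200/_cluster/state/version,master_node,nodes?pretty'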
