How can we configure Kibana to search only on specific nodes?

Is there a way to configure Kibana to only query a specific node for all search queries?

Underlying problem:

From time to time we run a huge "cardinality" query (data table visualisation) where the number of buckets is so large that the node runs out of heap space. This affects the insertion of data from the applications as well.

Can we have a set of nodes (say 3 nodes) only for Kibana searching?

That way, one set of nodes would be used only for data insertion while the other set handles searching, so even if a search node fails due to a large query, insertions would not be affected.

Any suggestions on solving the problem?

If you are using time-based indices, you may be able to implement a hot/warm architecture, where a set of nodes is specialised at handling indices that are no longer indexed into. This can take some load and memory pressure off the nodes performing indexing, and may help with your problem. If you are querying the indices being actively indexed into, however, there is as far as I know no way to separate indexing and querying the way you describe.
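As an illustration only (the attribute name, node start-up flags, index name and host are placeholders), hot/warm is typically set up by tagging nodes with a custom attribute and then pinning indices that are no longer written to onto the warm nodes with an allocation filter:

```
# Tag nodes with a custom attribute, e.g. in elasticsearch.yml or at start-up:
#   bin/elasticsearch -Enode.attr.box_type=hot    # nodes handling active indexing
#   bin/elasticsearch -Enode.attr.box_type=warm   # nodes serving older, read-mostly indices
#
# Once a monthly index is no longer written to, move it to the warm nodes:
curl -XPUT 'localhost:9200/myindex-2017.01/_settings' -H 'Content-Type: application/json' -d'
{
  "index.routing.allocation.require.box_type": "warm"
}'
```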

There is a documentation topic on search request "preference": https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-request-preference.html.

It has an `_only_node` option; could this help solve the problem?
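Something like this is what I had in mind (the node ID, index pattern and field name are just placeholders):

```
# Restrict the search to shard copies located on node "xyz" only
curl -XGET 'localhost:9200/myindex-*/_search?preference=_only_node:xyz&pretty' -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "unique_users": {
      "cardinality": { "field": "user_id" }
    }
  }
}'
```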

How many indices do you have? How many of these are you actively indexing into? How many of these are targeted by the cardinality aggregation?

Apart from logstash, monitoring metrics and topbeat, we have around 40 indices being actively indexed into. These are monthly indices.

Basically, a cardinality aggregation can hit any index, depending on the requirement at that point in time. Ad-hoc queries from the reporting team and monitoring by the DevOps team are the source of these requests, and both require current as well as historical data.

Using preference you can try to control the nodes serving the query, but it will still affect nodes performing indexing (any node that holds a primary or replica shard of an index being indexed into), and I am not sure you can alter this through Kibana. I cannot think of any other approach either, but maybe someone else has some ideas?

You may be able to configure stricter circuit breakers to avoid expensive queries taking down nodes, but this will cause those queries to fail.
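For reference, the request and total (parent) breaker limits can be tightened dynamically; the percentages below are purely illustrative, not recommendations:

```
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d'
{
  "transient": {
    "indices.breaker.request.limit": "40%",
    "indices.breaker.total.limit": "60%"
  }
}'
```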

Exactly! We already have strict request and total circuit breakers, and we still hit circuit breaker exceptions, sometimes even during peak insertion hours, and for search queries as well. This is also painful because we tend to lose data on insertion when the circuit breaker trips.

You may need to scale up/out your cluster or make sure that heap memory is used as efficiently as possible. What is the full output of the cluster stats API?
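That is, the output of (assuming a node reachable on localhost:9200):

```
curl -XGET 'localhost:9200/_cluster/stats?human&pretty'
```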

GIST Cluster stats.

You seem to have a lot of small shards in the cluster, which can be very inefficient and increase heap usage. I would recommend that you reduce the number of shards quite significantly in order to make the cluster run better.
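As a sketch of one way to do that (the template name, index pattern and shard counts are placeholders, not sizing recommendations), new monthly indices can be given fewer primary shards via an index template, and existing indices can be consolidated with the _shrink API or by reindexing:

```
# New monthly indices matching the pattern will be created with a single primary shard
curl -XPUT 'localhost:9200/_template/monthly_indices' -H 'Content-Type: application/json' -d'
{
  "template": "myapp-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
```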

Point taken :slight_smile:. But the underlying problem of OOM on cardinality queries, and of queries from untrusted users causing damage, will still be there, right?

If heap is used more efficiently, there should be more of it available to users, potentially leading to fewer OOMs. It will however still be possible for users to run expensive queries, which will cause problems unless the circuit breakers prevent them. I believe there have also been improvements to how circuit breakers work in more recent releases, so it may also be worthwhile upgrading.


I'll work on this. Can you provide an initial reference for it? I'd love to do more testing and research.
