Large Data Set with Low Memory = Frozen Nodes

El_Jeffo · September 3, 2014, 8:32am

So I've collected about 100GB of logstash logs over 3 months.

So there are roughly 100 indices, such as logstash-2014.07.01 and so forth.

I have a cluster of 3 EC2 Instances, 1 CPU w/ 4GB RAM Each. Granted it's
not much.

When I do queries, they're usually fast, until I run one over a large
timespan that spans, say, 10-30 indices. At that point, I'm guessing each
node has loaded so much index and field data that it is nearly impossible
to avoid overrunning the heap or RAM on the cluster. I end up with nodes at
100% CPU and 75% RAM usage.
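A rough back-of-envelope check of the numbers above, as a minimal sketch. The data size, index count, node count and RAM figures are from the post; the 50% heap fraction, the even spread of data across nodes, and the absence of replicas are assumptions. On-disk size is also not the same thing as field data size, so this only gives a sense of scale.

```python
# Figures taken from the post.
TOTAL_DATA_GB = 100      # ~100GB of logstash logs
INDEX_COUNT = 100        # roughly 100 daily indices
NODE_COUNT = 3           # 3 EC2 instances
RAM_PER_NODE_GB = 4      # 4GB RAM each

# ASSUMPTION: half of each node's RAM is given to the JVM heap.
HEAP_FRACTION = 0.5


def sizing(indices_in_query):
    """Estimate how much on-disk data a query touches versus available heap.

    ASSUMPTION: data is spread evenly over indices and nodes, no replicas.
    """
    gb_per_index = TOTAL_DATA_GB / INDEX_COUNT
    heap_per_node_gb = RAM_PER_NODE_GB * HEAP_FRACTION
    queried_gb = gb_per_index * indices_in_query
    return {
        "gb_per_index": gb_per_index,
        "queried_gb_per_node": queried_gb / NODE_COUNT,
        "heap_per_node_gb": heap_per_node_gb,
        "cluster_heap_gb": heap_per_node_gb * NODE_COUNT,
    }


if __name__ == "__main__":
    for span in (3, 10, 30):
        print(span, sizing(span))
```

Under these assumptions, a 30-index query touches several times more data per node than that node has heap, which is consistent with the nodes struggling on wide time ranges.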

I just wanted to check what was possible with tuning:

1. Given limited RAM, is it possible to somehow tune my nodes such that in
the event of a large query requiring too much RAM:
1a) The job gets killed due to a timeout?
1b) Something else saves my node from becoming non-responsive?

2. Is it possible to make some indices work fast, while others are slow?
2a) When I query historical data, I don't need an answer quickly. Just
eventually.
2b) When I query the last 72 hours, I really want an answer quickly, even
if that means killing other jobs.

3. Is it an unavoidable fact that as my data increases, I have no choice
but to either:
3a) Increase cluster RAM to hold every index/field at the same time?
3b) Delete indices until everything fits in RAM?
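For 1a and 2b, one thing that can be tried from the client side is to name only the daily indices a query actually needs and to set the `timeout` field in the search request body. The sketch below only builds the request path and body and sends nothing. The helper names are hypothetical, the `match_all` query is a placeholder, and the daily `logstash-YYYY.MM.DD` naming is taken from the post. Note that the search timeout is best-effort: it bounds how long shards keep collecting hits, and is not a guarantee that the heap is protected.

```python
import json
from datetime import date, timedelta


def logstash_indices(end, days):
    """Daily index names for the `days` days ending at `end`, oldest first."""
    return [
        (end - timedelta(days=offset)).strftime("logstash-%Y.%m.%d")
        for offset in range(days - 1, -1, -1)
    ]


def search_request(end, days, timeout):
    """Build (path, body) for a search limited to the given daily indices.

    `timeout` is the search request body parameter, e.g. "10s".
    The match_all query is only a placeholder for the real query.
    """
    path = "/" + ",".join(logstash_indices(end, days)) + "/_search"
    body = {"timeout": timeout, "query": {"match_all": {}}}
    return path, json.dumps(body)


if __name__ == "__main__":
    # Last 72 hours, with a short timeout (the value is an assumption).
    path, body = search_request(date(2014, 9, 3), 3, "10s")
    print("POST", path)
    print(body)
```

A historical query could use the same helper with a larger `days` value and a longer timeout, so that slow answers are tolerated there but not on recent data.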

As I attempt to open Elasticsearch to more people, they are running queries
in Kibana that span a larger and larger timeframe, which leads to random
frozen nodes.

If there were just some way to prevent frozen nodes (CPU maxed out at 100%
despite RAM usage at, say, 3GB out of 4GB) then I would have a more stable
cluster.

As EC2 does carry a noticeable cost, I was trying to minimize my EC2
requirement. So I'm trying to find ways to selectively reduce performance
where I don't need it.

Any ideas?

Jeff

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/0883a19c-48e5-46d0-9c69-3dc9f411c2e1%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

El_Jeffo · September 3, 2014, 9:02am

Some notes from warkolm via elasticsearch

searchme close index
[17:52] warkolm: Try these urls:
[17:52] ..
Elasticsearch Platform — Find real-time answers at scale | Elastic
[17:52] ..
Elasticsearch Platform — Find real-time answers at scale | Elastic
[17:53] there you go
[17:57] but check out the cat apis
[17:57] if you're new, install monitoring plugins like elastichq
and kopf, they will give you visual insight into things

Thanks warkolm!
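To make the "close index" and cat API hints concrete, here is a minimal sketch that picks the daily indices older than a retention window and builds the matching close requests. It sends nothing. The helper names and the 30-day window are hypothetical, and the `logstash-YYYY.MM.DD` naming is taken from the original post. A closed index stays on disk but cannot be searched until it is reopened with `_open`, so this trades query reach for memory.

```python
from datetime import date, datetime, timedelta

PREFIX = "logstash-"


def indices_to_close(index_names, today, keep_days):
    """Return the daily indices whose date is older than `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue  # ignore anything that is not a daily logstash index
        day = datetime.strptime(name[len(PREFIX):], "%Y.%m.%d").date()
        if day < cutoff:
            old.append(name)
    return old


def close_requests(index_names, today, keep_days):
    """Build (method, path) pairs for the close index API."""
    return [
        ("POST", "/" + name + "/_close")
        for name in indices_to_close(index_names, today, keep_days)
    ]


if __name__ == "__main__":
    # In practice the names could come from GET /_cat/indices?v
    names = ["logstash-2014.07.01", "logstash-2014.09.02"]
    for method, path in close_requests(names, date(2014, 9, 3), 30):
        print(method, path)
```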

