Powerful cluster is not able to handle 1.5 TB of data; how can I optimize?

Hi,

Once again I have an issue with the capacity of the cluster.

I have a cluster of 3 servers, each with 30 GB RAM, 8 CPUs, and a 1 TB disk
attached.

https://lh4.googleusercontent.com/-W1AVatn9Cq0/VBKzYgR3QKI/AAAAAAAAAJc/S3TWMBqqqX0/s1600/ES_cluster.png

It holds 1,323,957,069 docs (1.64 TB); the document distribution is as
follows:

https://lh5.googleusercontent.com/-kjlQG7xBfIw/VBKwCt8sKQI/AAAAAAAAAJQ/s8kuqouFUkQ/s1600/Screen%2BShot%2B2014-09-12%2Bat%2B11.33.49%2BAM.png

All 3 nodes are data nodes.

Indexing throughput is roughly 10-20k documents per minute (it's a
logstash -> elasticsearch setup; we store various logs in the cluster).

My concerns are as follows:

  1. When I load the Kibana index page, the document types panel takes
    about a minute to load. Is that OK?
  2. For the document type user_account, when I try to build a terms panel
    for the field "message.raw" (a string of 20-30 characters), my cluster
    gets stuck. In the logs I find the following:

[2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker] [morbius] New
used memory 6499531395 [6gb] from field [message.raw] would be larger than
configured breaker: 6414558822 [5.9gb], breaking

But despite the breaker, while it tries to compute that terms pie chart,
indexing of incoming documents stops and the queue grows. Eventually I see
heap exceptions, and the only way I have found to recover is to reboot the
cluster.

My question is:

It looks like I have quite powerful servers and a correct configuration
(my ES_HEAP_SIZE is set to 15g), yet they are still unable to process
1.5 TB of information, or do so only very slowly.
Do you have any advice on how to overcome this and make my cluster respond
faster? How should I adjust the infrastructure?

What hardware would I need to handle 1.5 TB in a reasonable amount of
time?

Any thoughts are welcome.

Regards,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

That's a lot of data for 3 nodes!
You really need to adjust your infrastructure: add more nodes, add more
RAM, or alternatively remove some old indexes (delete or close them).
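
For a logstash-style setup with time-based indices, that cleanup could look roughly like this (the host and index names are assumptions; adjust them to your cluster):

```shell
# List all indices with doc counts and sizes, to decide what can go.
curl 'http://localhost:9200/_cat/indices?v'

# Close an old index: the data stays on disk but stops consuming heap.
curl -XPOST 'http://localhost:9200/logstash-2014.06.01/_close'

# Delete an index you no longer need at all.
curl -XDELETE 'http://localhost:9200/logstash-2014.06.01'
```

A closed index can be reopened later with the _open endpoint, which makes closing a safer first step than deleting.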

Which ES and Java versions are you running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Java version is "1.7.0_55"
Elasticsearch is 1.3.1

Well, the cost of the whole setup is the question: currently it's about
$1000 per month on AWS. Do we really need to pay much more than
$1000/month to support 1.5 TB of data?

Could you briefly describe how many nodes you would expect to handle that
much data?

A side question: how do the really big data solutions work when they
search or aggregate over data far larger than 1.5 TB? Or is that, too, a
matter of the size of the architecture?

Regards,


The answer is: it depends on your use case.
But if you are experiencing problems like these, it's usually because the
cluster is at capacity and needs more resources.

You may find it cheaper to move to a larger number of smaller nodes that
you can distribute the load across, as that is where ES excels and also
how many other big data platforms operate.
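
A quick way to see how the load is currently spread across the nodes before deciding on a resize (the _cat APIs exist in ES 1.x; the host is an assumption):

```shell
# Shard counts and disk usage per node.
curl 'http://localhost:9200/_cat/allocation?v'

# Per-shard placement, sizes, and states.
curl 'http://localhost:9200/_cat/shards?v'
```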

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Are you saying that 10 servers with 2 CPUs and 7.5 GB RAM each (20 CPUs
and 75 GB RAM in total) would make a more powerful cluster than the 3
servers with 8 CPUs and 30 GB RAM each (24 CPUs and 90 GB RAM in total)?
Assuming the information is spread equally across them.

By the way, what about shard allocation? Currently I use the default of 5
shards and 1 replica. Could this be a potential target for optimization?
What should the shard scheme look like on a cluster with a larger number
of nodes?

Regards,


As I initially mentioned, it all depends on your use case, but generally
ES scales better horizontally than vertically. If you can, spin up another
cluster alongside the one you have, then replicate the data set and query
usage and compare the performance.

Ideally you should aim for one primary shard per node, but you can
over-allocate if you expect to grow, i.e. create 6 shards if you expect to
grow to 6 servers. This applies on larger clusters as well, up to a point.
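
As a sketch, over-allocation happens at index-creation time, since the primary shard count of an index cannot be changed after it is created (the index name and host are illustrative):

```shell
# Create the next index with 6 primaries so it can later spread across
# up to 6 data nodes; 1 replica doubles that to 12 shards in total.
curl -XPUT 'http://localhost:9200/logstash-2014.09.13' -d '{
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'
```

For a logstash feed, the same settings would more likely be put into an index template so every new daily index picks them up automatically.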

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com


Regarding the shards: if you have 3 nodes and 1 index with 5 shards, you
have a sort of "impedance mismatch", because 5 shards (or 10 with
replicas) do not distribute evenly across 3 nodes.

Rule: use a shard count that is always a multiple of the node count, e.g.
3, 6, 9, 12 ... for 3 nodes.
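
The arithmetic behind that rule, as a small sketch: the total shard count (primaries times one plus replicas) should divide evenly by the number of nodes.

```shell
# 3 primaries with 1 replica = 6 shards total -> 2 per node on 3 nodes.
primaries=3; replicas=1; nodes=3
total=$(( primaries * (replicas + 1) ))
echo "$total total, $(( total / nodes )) per node, remainder $(( total % nodes ))"

# The default 5 primaries with 1 replica gives 10 shards, which leaves a
# remainder on 3 nodes, so some nodes always carry an extra shard.
echo $(( (5 * 2) % 3 ))
```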

Can you tell what the maximum capacity of a single node is for your
installation? Somehow you must have concluded that 3 nodes are
sufficient; how did you reach that conclusion? It does not depend only on
observing index size. You could even run a 1.5 TB index on a single node
if your requirements allow it, given your data patterns and search load,
but not with the ES out-of-the-box settings, which are meant for
development installations.
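
One way to gauge that capacity in practice is to watch fielddata memory per node and per field and compare it against the breaker limit (these stats endpoints exist in ES 1.3; the host is an assumption):

```shell
# Fielddata memory in use per node, in human-readable units.
curl 'http://localhost:9200/_nodes/stats/indices/fielddata?human'

# Per-field breakdown, e.g. to confirm that message.raw dominates.
curl 'http://localhost:9200/_cat/fielddata?v'
```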

Also note that Kibana is great, but I have the impression (I do not use
it) that many queries from the UI are not optimized with regard to filter
caches and tend to waste resources. There is much room for improvement.

Jörg

On Fri, Sep 12, 2014 at 12:26 PM, Mark Walkom markw@campaignmonitor.com
wrote:

As I initially mentioned, it all depends on your use case but generally ES
does scale better horizontally rather than vertically. If you can, spin up
another cluster along side the one you have and then replica the data set
and query usage and compare the performance.

Ideally you should aim for one primary shard per node but you can over
allocate if you expect to grow - ie create 6 shards if you expect to grow
to 6 servers. This applies on larger clusters as well, to a point.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 12 September 2014 19:24, Pavel P pavel@kredito.de wrote:

Do you say, that 10 servers like 2 CPU, 7.5 RAM (so totally 20 CPUs and
75Gb RAM) cluster would be more powerful then the 3 serves of 8 CPU and 30
RAM (in total 24 CPU and 90RAM) ?
Assuming that the information would be spread there equally.

btw, what about the shards allocation. Currently I use the default one 5
shards and 1 replica. Could this be a potential thing to optimisation?
How the shards scheme should look on the cluster with the bigger number
of the nodes?

Regards,

On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:

The answer is it depends on what sort of use case you have.
But if you are experiencing problems like you are then usually it's due
to the cluster being at capacity and needing more resources.

You may find it cheaper to move to more numerous and smaller nodes that
you can distribute the load across, as that is where ES excels and also how
many other big data platforms operate.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 12 September 2014 19:01, Pavel P pa...@kredito.de wrote:

Java version is "1.7.0_55"
Elasticsearch is 1.3.1

Well, the cost of the whole setup is the question.
currently it's something about 1000$ per month on AWS. Do we really
need to pay a lot more then 1000$/month to support the 1.5Tb data?

Could you briefly describe how much nodes do you expect to handle that
much of data?

The side question is, how the the really Big Data solution works, when
they do the search or aggregation from the data which size is far more then
1.5Tb? Or it's as well is the size of the architecture.

Regards,

On Friday, September 12, 2014 11:53:35 AM UTC+3, Mark Walkom wrote:

That's a lot of data for 3 nodes!
You really need to adjust your infrastructure; add more nodes, more
ram, or alternatively remove some old indexes (delete or close).

What ES and java version are you running?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 12 September 2014 18:48, Pavel P pa...@kredito.de wrote:

Hi,

Again I have an issue with the power of the cluster.

I have the cluster from 3 servers, each has 30RAM, 8 CPUs and 1Tb
disk attached.

https://lh4.googleusercontent.com/-W1AVatn9Cq0/VBKzYgR3QKI/AAAAAAAAAJc/S3TWMBqqqX0/s1600/ES_cluster.png

There are 1323957069 docs (1.64TB) there, the documents distribution
is the next:

https://lh5.googleusercontent.com/-kjlQG7xBfIw/VBKwCt8sKQI/AAAAAAAAAJQ/s8kuqouFUkQ/s1600/Screen%2BShot%2B2014-09-12%2Bat%2B11.33.49%2BAM.png

All the 3 nodes are data nodes.

The index throughput is something about 10-20k documents per minute.
(it's the logstash -> elasticsearch setup, we store different logs in the
cluster)

My concerns are the following:

  1. When I load the index page of Kibana, loading the document types
    panel takes about a minute. Is that ok?
  2. For the document type user_account, when I try to build a terms
    panel for the field "message.raw" (a string of 20-30 characters), my
    cluster gets stuck.
    In the logs I find the following:

[2014-09-11 08:03:34,507][ERROR][indices.fielddata.breaker]

[morbius] New used memory 6499531395 [6gb] from field [message.raw] would
be larger than configured breaker: 6414558822 [5.9gb], breaking

But despite the breakers, while the cluster tries to calculate that terms
pie, it stops indexing the incoming documents. The queue grows. Then I
start seeing heap exceptions, and the only way I could resolve them was to
reboot the cluster.
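A mitigation often suggested for this kind of fielddata blow-up (a sketch only, not verified against this cluster) is to map the `not_analyzed` sub-field with `doc_values`, so terms aggregations read from disk instead of filling the heap. The template name below is made up, and the change only affects newly created indices:

```shell
# Hypothetical template: back message.raw with doc_values on new logstash-* indices.
curl -XPUT 'http://localhost:9200/_template/docvalues_example' -d '{
  "template": "logstash-*",
  "mappings": {
    "user_account": {
      "properties": {
        "message": {
          "type": "string",
          "fields": {
            "raw": {
              "type": "string",
              "index": "not_analyzed",
              "doc_values": true
            }
          }
        }
      }
    }
  }
}'
```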

My question is this:

It looks like I have quite powerful servers and a correct
configuration (my ES_HEAP_SIZE is set to 15g), yet they are still
unable to process the 1.5TB of information, or do so only very slowly.
Do you have any advice on how to overcome that and make my cluster
respond faster? How should I adjust the infrastructure?

Which hardware would I need to handle the 1.5TB in a reasonable
amount of time?

Any thoughts are welcome.

Regards,

--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it,
send an email to elasticsearc...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/
msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40goo
glegroups.com
https://groups.google.com/d/msgid/elasticsearch/707ed8a1-8f94-48cc-a78a-0e1f63f32b8d%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.


To Jörg:

  1. How I decided that 3 nodes are enough:
    I started with 2 nodes in the cluster, and that was not able to manage
    the indexing load.
    Then, in this conversation,
    https://groups.google.com/forum/#!topic/elasticsearch/7XHQjAoKPfw,
    you explained to me that a 2-node cluster is not a cluster. So I went
    to 3 nodes.
    The indexing process now goes smoothly and I'm satisfied with it.
  2. Is the maximum capacity the size of the data? The data is spread
    equally; each node holds ~500GB. The shards are not distributed equally,
    of course, because it's hard to split 5 shards between 3 servers.
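The actual shard placement can be inspected with the `_cat` API (available in ES 1.3); for example:

```shell
# One line per shard: index, shard number, primary/replica, size, node.
curl 'http://localhost:9200/_cat/shards?v'

# Per-node shard counts and disk usage.
curl 'http://localhost:9200/_cat/allocation?v'
```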

if your requirements allow that according to the data patterns and the
search load, but not with the ES OOTB settings

what are the ES OOTB (out-of-the-box) settings?

The main purpose of our cluster is to save all the logs from our internal
applications and then allow us to search through them and do some
analytics using Kibana.
The search load is currently close to zero, because as soon as I try to
search it works quite slowly, and when I try to aggregate values my
cluster even goes down.

What is your view on this issue, Jörg: should we go to 10 small servers
rather than 3 big ones?

Regards,

On Friday, September 12, 2014 2:43:15 PM UTC+3, Jörg Prante wrote:

Regarding the shards: if you have 3 nodes and 1 index with 5 shards, you
have a sort of "impedance mismatch", because 5 (or 10 with replicas) shards
do not distribute equally over 3 nodes.

Rule: use a shard count that is always a multiple of the node count, e.g.
3, 6, 9, 12 ... for 3 nodes.
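The arithmetic behind the rule can be sketched in a couple of lines of shell; a remainder of zero means the shards can spread evenly:

```shell
# Total shards = primaries * (1 + replicas); check the remainder per node count.
primaries=6; replicas=1; nodes=3
total=$(( primaries * (1 + replicas) ))
echo $(( total % nodes ))          # prints 0: 12 shards spread evenly over 3 nodes
echo $(( 5 * (1 + 1) % nodes ))    # prints 1: the default 5x1 layout does not
```

Note that `number_of_shards` cannot be changed on an existing index, so this only helps indices created afterwards.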

Can you tell what the maximum capacity of a single node is for your
installation? Somehow you must have concluded that 3 nodes are sufficient;
how did you do that? It does not only depend on observing index size. You
can even run a 1.5TB index on a single node, if your requirements allow
that according to the data patterns and the search load, but not with the
ES OOTB settings, which are meant for development installations.

Also note that Kibana is great but I have the impression (I do not use it)
that many queries from the UI are not optimized regarding filter caches and
tend to waste resources. There is much space left for improvement.

Jörg

On Fri, Sep 12, 2014 at 12:26 PM, Mark Walkom <ma...@campaignmonitor.com>
wrote:

As I initially mentioned, it all depends on your use case, but generally
ES scales better horizontally than vertically. If you can, spin up another
cluster alongside the one you have, replicate the data set and query load,
and compare the performance.

Ideally you should aim for one primary shard per node, but you can
over-allocate if you expect to grow, i.e. create 6 shards if you expect to
grow to 6 servers. This applies on larger clusters as well, to a point.
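With daily Logstash indices, over-allocation would typically be set once in an index template so every new index picks it up; a sketch (the template name is hypothetical):

```shell
# Hypothetical template: every new logstash-* index gets 6 primaries + 1 replica,
# i.e. 12 shards total, which divides evenly across 3 or 6 nodes.
curl -XPUT 'http://localhost:9200/_template/shard_count_example' -d '{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 6,
    "number_of_replicas": 1
  }
}'
```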

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 12 September 2014 19:24, Pavel P pa...@kredito.de wrote:

Are you saying that a cluster of 10 servers with 2 CPUs and 7.5GB RAM each
(20 CPUs and 75GB RAM in total) would be more powerful than the 3 servers
of 8 CPUs and 30GB RAM each (24 CPUs and 90GB RAM in total)?
Assuming the information would be spread equally across them.

By the way, what about the shard allocation? Currently I use the default:
5 shards and 1 replica. Could this be a potential target for optimisation?
How should the shard scheme look on a cluster with a bigger number of
nodes?

Regards,

On Friday, September 12, 2014 12:11:32 PM UTC+3, Mark Walkom wrote:

The answer is it depends on what sort of use case you have.
But if you are experiencing problems like these, it's usually due to the
cluster being at capacity and needing more resources.

You may find it cheaper to move to more numerous and smaller nodes that
you can distribute the load across, as that is where ES excels and also how
many other big data platforms operate.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


Yes, 2 servers are not enough from a fault tolerance perspective.

It is hard to work out why your ES cluster runs slow without more
information. Maybe a few settings changes are all you need, I do not know.
Maybe you can find out from the logs what to do.
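Besides the logs, a few stock APIs give a quick picture of where the time and heap are going; for example:

```shell
# Overall cluster state, plus any relocating or unassigned shards.
curl 'http://localhost:9200/_cluster/health?pretty'

# What the busiest threads on each node are doing right now.
curl 'http://localhost:9200/_nodes/hot_threads'

# Heap, GC and fielddata usage per node.
curl 'http://localhost:9200/_nodes/stats/jvm,indices?pretty'
```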

For sizing an ELK stack, there are many hints on the net, the best are
available from the company.

I hesitate to recommend anything on AWS or elsewhere. Personally I am in
the situation that I use bare-metal servers in my own data center with the
specifications I want.

In the end, it is up to you to decide whether you go the path of "many
servers with less power" or "few servers with much power". This might not
only be a technical issue but also a strategic question.

As Mark said, Elasticsearch was designed to scale out. This means, you can
add servers very easily, and this improves the capacity and power of the
overall system. For many it is enough to add nodes and see the problems go
away, without thinking hard about the reasons.

Jörg


I run 1.3TB of active indices on a single node (64GB RAM with a 12GB heap,
and 15 small disks in a RAID 5), with most of my messages quite small,
which makes it look similar to your case in volume, although I have a
significantly lower indexing rate (about 5K).

I suspect that the single disk on each node may be limiting your capacity
here. Try measuring some disk utilization metrics (read and write rates,
latency, and queue lengths). If those all look fine while you're loading
the dashboard, then I'm incorrect. If they look like the disk is being
thrashed, consider moving to more, smaller nodes, or adding disks to these
nodes to boost the IO rates.
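On Linux, `iostat` from the sysstat package is one way to collect those metrics while the dashboard loads; a sketch:

```shell
# Extended per-device statistics every 5 seconds, sizes in MB.
# Watch the %util, await and avgqu-sz columns: a disk pinned near 100%
# utilization with a growing queue while Kibana loads suggests IO is the
# bottleneck.
iostat -dxm 5
```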

--
The information transmitted in this email is intended only for the
person(s) or entity to which it is addressed and may contain confidential
and/or privileged material. Any review, retransmission, dissemination or
other use of, or taking of any action in reliance upon, this information by
persons or entities other than the intended recipient is prohibited. If you
received this email in error, please contact the sender and permanently
delete the email from any computer.


Thanks Greg,

Your example should be useful.

On Wednesday, September 17, 2014, Greg Murnane wrote:


--

Pavel Polyakov

Software Engineer - PHP

E-mail: pavel@kredito.de
Skype: pavel.polyakov.x1

Kreditech Holding SSL GmbH
Am Sandtorkai 50, 20457 Hamburg, Germany
Office phone: +49 (0)40 - 605905-60
Authorized representatives: Sebastian Diemer, Alexander Graubner-Müller
Company registration: Hamburg HRB122027


facebook.com/kreditech

This e-mail contains confidential and/or legally protected information. If
you are not the intended recipient or if you have received this e-mail by
error please notify the sender immediately and destroy this e-mail. Any
unauthorized review, copying, disclosure or distribution of the material in
this e-mail is strictly forbidden. The contents of this e-mail is legally
binding only if it is confirmed by letter or fax. The sending of e-mails to
us does not have any period-protecting effect. Thank you for your
cooperation.


Hi Pavel,

When you open Kibana and things are slow, what's happening with your server?
Is/are the CPUs maxed out for a minute?
Do you see heavy disk IO?
Swapping?
You can use our SPM http://sematext.com/spm/ to see all this and various
other ES metrics. Show/tell us what you see and people will be able to
provide more specific advice.

How big of a time period do you view in Kibana when this happens?
Do you have time-based indices?

Otis

Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On Wednesday, September 17, 2014 12:25:48 PM UTC-4, Pavel P wrote:

Thanks Greg,

Your example should be useful.

среда, 17 сентября 2014 г. пользователь Greg Murnane написал:

I run 1.3TB of active indices on a single node (64 GB ram with 12GB heap
size, and 15 small disks in a raid 5), with most of my messages quite
small, which makes it looks similar to your case in volume, although I have
a significantly lower (about 5K) indexing rate.

I suspect that the single disk on each node may be limiting your capacity
here. Try measuring some metrics of disk utilization (read and write rates,
latency, and queue lengths). If those all look fine while you're loading
the dashboard, then I'm incorrect. If they look like the disk is being
thrashed, consider moving to more smaller nodes, or adding additional disks
to these nodes to boost the IO rates.



--

Pavel Polyakov

Software Engineer - PHP

E-mail: pa...@kredito.de
Skype: pavel.polyakov.x1

Kreditech Holding SSL GmbH
Am Sandtorkai 50, 20457 Hamburg, Germany
Office phone: +49 (0)40 - 605905-60
Authorized representatives: Sebastian Diemer, Alexander Graubner-Müller
Company registration: Hamburg HRB122027

www.kreditech.com
facebook.com/kreditech

